10  Vector Operations: Sorting, Ordering, Recycling, and Missing Values

NoteWhat This Chapter Covers

This chapter adds four essential tools to your vector vocabulary. You will learn how to sort a vector into order with sort(), and the subtler companion function order() that gives you the permutation you need to reorder several related vectors together. You will see how rev() reverses a vector and how rank() reports each element’s rank. You will meet R’s recycling rule: the silent repetition that happens when two vectors of different lengths meet an arithmetic operator. Finally you will learn how R represents and handles missing values, why NA + 1 is NA, how to detect and count NAs, how to drop them, and how to use the na.rm argument that appears throughout R. By the end of this chapter you will be able to clean, order, and combine vectors without surprises.

flowchart TD
    V["A Vector"] --> SORT["Reorder <br> sort() / rev() / order() / rank()"]
    V --> REC["Combine <br> arithmetic + recycling"]
    V --> NA["Handle Gaps <br> NA / is.na() / na.rm"]
    style V fill:#e3f2fd,stroke:#1976D2
    style SORT fill:#fff3e0,stroke:#F57C00
    style REC fill:#f3e5f5,stroke:#8E24AA
    style NA fill:#fbe9e7,stroke:#D84315


10.1 Sorting a Vector with sort()

NoteAscending by Default, Descending on Request

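sort() returns a new vector with the elements in ascending order and leaves the original untouched. Pass decreasing = TRUE for descending order. Numbers sort numerically, and strings sort alphabetically according to the current locale. By default sort() also drops NA values; na.last = TRUE keeps them at the end.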
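Here is a short illustration using made-up vectors. The names x, fruits and y are illustrative only.

```r
x <- c(3, 1, 4, 1, 5, 9, 2, 6)
sort(x)                      # 1 1 2 3 4 5 6 9
sort(x, decreasing = TRUE)   # 9 6 5 4 3 2 1 1
x                            # unchanged: 3 1 4 1 5 9 2 6

fruits <- c("banana", "apple", "cherry")
sort(fruits)                 # "apple" "banana" "cherry"

# By default sort() silently drops NA; na.last = TRUE keeps it at the end
y <- c(2, NA, 1)
sort(y)                      # 1 2
sort(y, na.last = TRUE)      # 1 2 NA
```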

Noterev() Reverses, Regardless of Order

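rev() reverses whatever order a vector is currently in. It does not sort anything. Combined with sort(), though, it gives the common idiom rev(sort(x)) for “largest first”, which produces the same result as sort(x, decreasing = TRUE).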
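The sketch below contrasts rev() with a real sort. The vector x is illustrative.

```r
x <- c(3, 1, 2)
rev(x)                       # 2 1 3: flipped, not sorted
rev(sort(x))                 # 3 2 1: largest first
sort(x, decreasing = TRUE)   # 3 2 1: same result in a single call
```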


10.2 order(): the Permutation, Not the Result

NoteThe Function That Tells You “Who Goes Where”

sort() returns the sorted values. order() returns the indices you would need to pull elements from the original vector to get a sorted result. That sounds abstract until you realise it is exactly what you need when two or more vectors must stay aligned.
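To see why the indices matter, take two made-up vectors, student and score, that describe the same three people position by position:

```r
student <- c("Cara", "Ali", "Ben")
score   <- c(72, 95, 88)

idx <- order(score)   # 1 3 2: where the smallest, middle and largest scores sit
score[idx]            # 72 88 95
student[idx]          # "Cara" "Ben" "Ali": names stay matched to their scores

# Sorting each vector on its own breaks the pairing
sort(student)         # "Ali" "Ben" "Cara", no longer aligned with sort(score)
```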

NoteDescending and Multi-Key Order

order() accepts decreasing = TRUE just like sort(). If you pass it several vectors, the first is the primary sort key, and each later vector breaks any ties left by the ones before it.
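Here is a sketch with illustrative team and score vectors. Negating a numeric key (-score) is a common way to make only that key descending. It works only for numeric vectors.

```r
team  <- c("B", "A", "B", "A")
score <- c(10, 25, 30, 20)

order(score, decreasing = TRUE)   # 3 2 4 1: highest score first
order(team, score)                # 4 2 1 3: by team, then score ascending within team
order(team, -score)               # 2 4 3 1: by team, then score descending within team
```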

TipExpert Insight: sort() for Display, order() for Analysis

Use sort() when all you want is the ordered values on their own. Use order() whenever the values are paired with other variables, the classic case being a data frame where you want to reorder rows. Learning to reach for order() instead of sort() in those situations removes a whole category of beginner bugs.
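The classic data-frame case might look like this minimal sketch, which uses an invented data frame called results:

```r
results <- data.frame(
  student = c("Cara", "Ali", "Ben"),
  score   = c(72, 95, 88),
  stringsAsFactors = FALSE
)

# order() gives row positions; indexing rows with them keeps each row intact
by_score <- results[order(results$score, decreasing = TRUE), ]
by_score   # rows now run Ali (95), Ben (88), Cara (72)
```

Because whole rows are moved together, each student always keeps their own score. Sorting the two columns separately with sort() would break that pairing.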


10.3 rank(): Where Does Each Element Stand?

NoteThe Inverse Question

Where order() asks “which element should go first?”, rank() asks “what position does each element already occupy?”. The result is a vector of ranks aligned with the original input.
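The sketch below shows the difference on an illustrative vector x. When there are no ties, rank() is the inverse permutation of order(), so order(order(x)) reproduces it.

```r
x <- c(30, 10, 20)
rank(x)           # 3 1 2: 30 is the 3rd smallest, 10 the 1st, 20 the 2nd
order(x)          # 2 3 1: take element 2, then 3, then 1 to sort x
order(order(x))   # 3 1 2: same as rank(x) when there are no ties
```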

NoteHandling Ties

rank() has a ties.method argument that decides how tied values share a rank. The default is "average"; common alternatives are "min", "max", and "first".
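The following illustration uses a vector with one tie. The values are made up.

```r
x <- c(10, 20, 20, 30)
rank(x)                          # 1.0 2.5 2.5 4.0: ties share the average of ranks 2 and 3
rank(x, ties.method = "min")     # 1 2 2 4
rank(x, ties.method = "max")     # 1 3 3 4
rank(x, ties.method = "first")   # 1 2 3 4: ties broken by original position
```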


10.4 The Recycling Rule

NoteCore Concept: Shorter Vectors Are Repeated

When an arithmetic or logical operation involves two vectors of different lengths, R silently repeats the shorter one until its length matches the longer one. This is the recycling rule, and it is the reason x + 1 works: the length-1 vector 1 is recycled to match the length of x.
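Here are a few examples of recycling at work, using an illustrative vector x:

```r
x <- c(10, 20, 30, 40)
x + 1            # 11 21 31 41: 1 is recycled to length 4
x + c(1, 100)    # 11 120 31 140: c(1, 100) is recycled to 1 100 1 100
x > 15           # FALSE TRUE TRUE TRUE: comparisons recycle too
```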

WarningWhen the Longer Length Is Not a Multiple

If the longer length is not a whole-number multiple of the shorter length, R still performs the operation but emits the warning “longer object length is not a multiple of shorter object length”. The result is almost certainly not what you intended.
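Here is a minimal sketch of the mismatched case. The vectors are made up.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(10, 20)
x + y
# 11 22 13 24 15, with a warning that the longer object length
# is not a multiple of shorter object length
```

Note that y was recycled as 10 20 10 20 10. The final 10 is the tell-tale partial repeat.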

TipBest Practice: Match Lengths on Purpose

Recycling is a superpower when used deliberately (e.g. subtracting a mean from every element) and a hazard when it happens by accident. Before combining two vectors with an operator, either check length(a) == length(b) or confirm that one of them is deliberately of length 1.
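The sketch below shows both habits side by side. The vector bonus is a hypothetical per-student adjustment, invented for illustration.

```r
score <- c(72, 95, 88, 61)

# Deliberate recycling: mean(score) has length 1
centred <- score - mean(score)    # -7 16 9 -18 (the mean is 79)

# Guard a pairwise operation before relying on it
bonus <- c(5, 0, 3, 2)
stopifnot(length(score) == length(bonus))
adjusted <- score + bonus         # 77 95 91 63
```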


10.5 Missing Values: NA in Vectors

NoteCore Concept: NA Is Contagious

R uses the special value NA to mean “missing data”. Almost every arithmetic or logical operation involving NA produces NA, because the true answer is unknown.
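The following illustration uses made-up values. The last two lines show the kind of exception that “almost every” allows for: the answer is known whatever the missing value turns out to be.

```r
x <- c(4, NA, 7)
x + 1        # 5 NA 8
sum(x)       # NA
mean(x)      # NA
NA > 3       # NA

# Exceptions: the result is the same for any value the NA could hide
TRUE | NA    # TRUE
FALSE & NA   # FALSE
```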

NoteDetecting and Counting Missing Values

Use is.na() to ask, element by element, “is this value missing?”. The result is a logical vector, which combines with the other logical tools you have met.
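Because is.na() returns a logical vector, the usual logical tools work on it directly. Here they are applied to a made-up vector:

```r
x <- c(4, NA, 7, NA, 1)
is.na(x)          # FALSE TRUE FALSE TRUE FALSE
sum(is.na(x))     # 2: each TRUE counts as 1
mean(is.na(x))    # 0.4: the proportion missing
any(is.na(x))     # TRUE
which(is.na(x))   # 2 4: positions of the gaps
```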

WarningCommon Mistake: x == NA Does Not Work

NA represents “unknown”. Asking whether something equals an unknown value is itself unknown, so NA == NA returns NA rather than TRUE, and x == NA returns NA for every element of x. Always use is.na() instead.
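The sketch below sets the broken test beside the correct one, using an illustrative vector:

```r
x <- c(1, NA, 3)
x == NA       # NA NA NA: no use for finding gaps
NA == NA      # NA
is.na(x)      # FALSE TRUE FALSE
```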


10.6 Excluding or Summarising Around Missing Values

NoteTwo Ways to Cope with NA
| Approach | How It Looks |
|---|---|
| Drop them before summarising. | x[!is.na(x)] |
| Tell the summariser to skip them with na.rm = TRUE. | sum(x, na.rm = TRUE) |
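Both approaches give the same answer on this made-up vector:

```r
x <- c(12, NA, 8, 5, NA)
sum(x)                  # NA
sum(x[!is.na(x)])       # 25: drop first, then summarise
sum(x, na.rm = TRUE)    # 25: let sum() skip the NAs
mean(x, na.rm = TRUE)   # 8.333333 (25 / 3)
```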
Notena.omit() for a Compact Dropper

na.omit(x) returns x with all NAs removed. It also attaches an na.action attribute recording which positions were dropped, which some modelling functions use.
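Here is a minimal sketch, with illustrative values, that shows the attribute na.omit() leaves behind:

```r
x <- c(3, NA, 5, NA, 9)
y <- na.omit(x)
as.vector(y)            # 3 5 9 (as.vector() strips the attribute)
attr(y, "na.action")    # 2 4, with class "omit"
```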

TipBest Practice: Decide About NA Consciously, Not Silently

Dropping a missing value is a decision, not a default. If 30% of a column is missing and you silently pass na.rm = TRUE, your “average customer age” is an average of the customers who bothered to fill that field in, which is rarely what you actually want to report. Always report the count of NAs alongside the summary and ask whether dropping them is appropriate for the question at hand.
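One way to make that habit concrete is to print the NA count next to every summary. The following is only a sketch, and the ages are invented.

```r
# Illustrative data: 4 of these 10 ages are missing
age <- c(34, NA, 51, 28, NA, NA, 45, NA, 39, 43)

n_missing <- sum(is.na(age))
avg <- mean(age, na.rm = TRUE)
cat("Mean age", avg, "from", length(age) - n_missing, "of", length(age),
    "values;", n_missing, "missing\n")
# Mean age 40 from 6 of 10 values; 4 missing
```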


10.7 NA and Comparisons in Filtering

NoteFiltering Out NA Correctly

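A logical condition that contains NA gives NA in the result when used for subsetting (whole rows of NA when subsetting a data frame), which almost always surprises beginners. The fix is to add !is.na(x) to the condition, or to wrap the condition in which(). which() returns only the positions that are TRUE, so it skips NAs.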
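The sketch below shows the problem and both fixes on a made-up vector:

```r
x <- c(5, NA, 12, 3, NA, 20)
x > 4                    # TRUE NA TRUE FALSE NA TRUE
x[x > 4]                 # 5 NA 12 NA 20: each NA in the condition yields an NA
x[!is.na(x) & x > 4]     # 5 12 20
x[which(x > 4)]          # 5 12 20: which() returns only the TRUE positions
```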


10.8 A Worked Example: Cleaning and Ranking a Set of Scores

NotePutting the Four Tools Together

Every building block from this chapter appears: is.na() to locate gaps, na.rm = TRUE for safe summaries, order() with a negative sign for descending order, and recycling to add a constant to a whole vector.
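Here is one way the pieces might fit together. The students, their scores and the flat 5-point curve are all invented for illustration.

```r
# Illustrative data: five students, two with no recorded score
student <- c("Ana", "Ben", "Chloe", "Dev", "Eli")
score   <- c(78, NA, 91, 64, NA)

# 1. Locate and count the gaps
gap <- is.na(score)
sum(gap)                            # 2

# 2. Summarise safely
mean(score, na.rm = TRUE)           # 77.66667

# 3. Add a flat 5-point curve; 5 is recycled to every element
curved <- score + 5                 # 83 NA 96 69 NA

# 4. Rank the students who have a score, highest first
has_score <- !gap
ord <- order(-curved[has_score])
ranked <- data.frame(
  student = student[has_score][ord],
  curved  = curved[has_score][ord],
  stringsAsFactors = FALSE
)
ranked   # Chloe 96, Ana 83, Dev 69
```

Filtering with has_score before calling order() keeps the missing students out of the ranking explicitly, instead of relying on where order() places NAs by default.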


10.9 Summary

NoteKey Concepts at a Glance
| Concept | Key Takeaway |
|---|---|
| sort() | Returns values in ascending order; decreasing = TRUE gives descending order. |
| rev() | Reverses whatever order a vector has. |
| order() | Returns the permutation that would sort the input; essential for reordering several aligned vectors together. |
| rank() | Returns each element’s rank in the input; ties.method decides how tied values are ranked. |
| Recycling rule | Shorter vectors are repeated to match the longer one; a warning appears when the longer length is not a multiple of the shorter. |
| NA is contagious | Almost every operation involving NA returns NA. |
| Detecting NA | Always use is.na(x); x == NA never works. |
| Excluding NA | Drop with x[!is.na(x)] or pass na.rm = TRUE to summary functions. |
| Filtering with NA | Combine !is.na(x) into the condition, or use which() to skip NAs. |
TipApplying This in Practice

Sorting, recycling, and NA-handling are the three places where silent bugs are most likely to enter a vector-based analysis. Make it a habit to check lengths before combining vectors with operators, to use order() when you need several columns to stay in sync, and to think explicitly about missing values before reaching for na.rm = TRUE. In the next chapter you will see how arithmetic, relational, and logical operators behave on vectors in more depth, including element-wise versus scalar behaviour and how comparison chains compose.