flowchart TD
V["A Vector"] --> SORT["Reorder <br> sort() / rev() / order() / rank()"]
V --> REC["Combine <br> arithmetic + recycling"]
V --> NA["Handle Gaps <br> NA / is.na() / na.rm"]
style V fill:#e3f2fd,stroke:#1976D2
style SORT fill:#fff3e0,stroke:#F57C00
style REC fill:#f3e5f5,stroke:#8E24AA
style NA fill:#fbe9e7,stroke:#D84315
10 Vector Operations: Sorting, Ordering, Recycling, and Missing Values
This chapter adds four essential tools to your vector vocabulary. You will learn how to sort a vector into order with sort(), and the subtler companion function order() that gives you the permutation you need to reorder several related vectors together. You will see how rev() reverses a vector and how rank() reports each element’s rank. You will meet R’s recycling rule: the silent repetition that happens when two vectors of different lengths meet an arithmetic operator. Finally you will learn how R represents and handles missing values, why NA + 1 is NA, how to detect and count NAs, how to drop them, and how to use the na.rm argument that appears throughout R. By the end of this chapter you will be able to clean, order, and combine vectors without surprises.
10.1 Sorting a Vector with sort()
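sort() returns a new vector with the elements in increasing order and leaves the original vector unchanged. Numbers sort numerically; strings sort alphabetically, with the exact collation depending on your locale. Pass decreasing = TRUE to get the reverse order.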
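A minimal sketch of sort() on a numeric and a character vector. The vectors temps and cities are invented for illustration.

```r
# Hypothetical data, invented for this illustration
temps  <- c(21.5, 18.0, 25.3, 19.8)
cities <- c("Oslo", "Cairo", "Lima", "Berlin")

sorted_temps  <- sort(temps)                     # increasing order by default
sorted_desc   <- sort(temps, decreasing = TRUE)  # largest first
sorted_cities <- sort(cities)                    # alphabetical order

print(sorted_temps)   # 18.0 19.8 21.5 25.3
print(sorted_desc)    # 25.3 21.5 19.8 18.0
print(sorted_cities)  # "Berlin" "Cairo" "Lima" "Oslo"
print(temps)          # 21.5 18.0 25.3 19.8 : the original is untouched
```

Because sort() returns a copy, you must assign the result if you want to keep it.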
rev() Reverses, Regardless of Order
rev() reverses whatever order a vector is currently in. It is not a sort on its own, but combining it with sort(), as in rev(sort(x)), is a common idiom for “largest first”.
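A short sketch of the difference, using a made-up vector x:

```r
x <- c(3, 9, 1, 7)   # hypothetical values

rev(x)         # 7 1 9 3 : the original order, read backwards
rev(sort(x))   # 9 7 3 1 : sort first, then reverse, for "largest first"

# The idiom gives the same values as sort() with decreasing = TRUE
same <- identical(rev(sort(x)), sort(x, decreasing = TRUE))
print(same)    # TRUE
```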
10.2 order(): the Permutation, Not the Result
sort() returns the sorted values. order() returns the indices you would need to pull elements from the original vector to get a sorted result. That sounds abstract until you realise it is exactly what you need when two or more vectors must stay aligned.
order() accepts decreasing = TRUE just like sort(). If you pass it several vectors, it sorts by the first and uses each later vector only to break ties in the ones before it.
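The idea is easier to see with two aligned vectors. The sketch below assumes invented student and score vectors in which element i of each describes the same person.

```r
# Hypothetical aligned vectors: element i of each describes the same student
student <- c("Ana", "Ben", "Cy", "Dee")
score   <- c(82, 95, 82, 70)

idx <- order(score)   # positions to pull from, in sorted order
print(idx)            # 4 1 3 2

score[idx]            # 70 82 82 95 : the same values as sort(score)
student[idx]          # "Dee" "Ana" "Cy" "Ben" : still aligned with the scores

# Highest score first
score[order(score, decreasing = TRUE)]   # 95 82 82 70

# Highest score first, with ties broken alphabetically by name
idx2 <- order(-score, student)
student[idx2]         # "Ben" "Ana" "Cy" "Dee"
```

Indexing both vectors with the same idx is what keeps each name attached to its score; calling sort() on each vector separately would break that pairing.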
sort() for Display, order() for Analysis
Use sort() when all you want is the ordered values on their own. Use order() whenever the values are paired with other variables, the classic case being a data frame where you want to reorder rows. Learning to reach for order() instead of sort() in those situations removes a whole category of beginner bugs.
10.3 rank(): Where Does Each Element Stand?
Where order() asks “which element should go first?”, rank() asks “what position would each element take if the vector were sorted?”. The result is a vector of ranks aligned with the original input, so the smallest element gets rank 1 wherever it happens to sit.
rank() has a ties.method argument that decides how tied values share a rank. The default is "average"; common alternatives are "min", "max", and "first".
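A small sketch of how the tie-handling options differ, using a made-up vector with one tie:

```r
x <- c(50, 20, 50, 10)   # hypothetical values; the two 50s are tied

rank(x)                         # 3.5 2.0 3.5 1.0 : ranks 3 and 4 are averaged
rank(x, ties.method = "min")    # 3 2 3 1
rank(x, ties.method = "max")    # 4 2 4 1
rank(x, ties.method = "first")  # 3 2 4 1 : the earlier 50 gets the lower rank

order(x)                        # 4 2 1 3 : positions to pull from, not ranks
```

Note that rank(x) and order(x) answer different questions and generally return different vectors.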
10.4 The Recycling Rule
When an arithmetic or logical operation involves two vectors of different lengths, R silently repeats the shorter one until its length matches the longer one. This is the recycling rule, and it is the reason x + 1 works: the length-1 vector 1 is recycled to match the length of x.
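A brief sketch of recycling in action, with an invented vector x. The rep() call is only there to spell out what R does implicitly.

```r
x <- c(10, 20, 30, 40)   # hypothetical values

x + 1          # 11 21 31 41 : the 1 is recycled four times
x * c(1, -1)   # 10 -20 30 -40 : c(1, -1) is recycled to c(1, -1, 1, -1)
x > 25         # FALSE FALSE TRUE TRUE : comparisons recycle too

# Writing the recycling out by hand gives the same result
explicit <- x * rep(c(1, -1), length.out = length(x))
identical(x * c(1, -1), explicit)   # TRUE
```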
If the longer length is not a whole-number multiple of the shorter length, R still performs the operation but emits a warning. The result is almost never what you intended, so treat that warning as a sign of a bug rather than as noise.
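The sketch below triggers the mismatch on purpose with two invented vectors. It captures the warning with tryCatch() so the message can be inspected, then uses suppressWarnings() only to show what R computed anyway.

```r
a <- c(1, 2, 3, 4, 5)   # length 5
b <- c(10, 20)          # length 2; 5 is not a multiple of 2

# Capture the warning text instead of letting it scroll past
msg <- tryCatch(a + b, warning = function(w) conditionMessage(w))
print(msg)

# The operation still runs: b is recycled as 10 20 10 20 10
result <- suppressWarnings(a + b)
print(result)   # 11 22 13 24 15
```

In real code you would not suppress this warning; it is silenced here purely to display the recycled result.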
Recycling is a superpower when used deliberately (e.g. subtracting a mean from every element) and a hazard when it happens by accident. Before combining two vectors with an operator, either check length(a) == length(b) or confirm that one of them is deliberately of length 1.
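One way to put that advice into practice is sketched below. The heights data are invented, and add_strict() is a hypothetical helper written for this example, not a base R function.

```r
heights <- c(160, 172, 181, 167)   # hypothetical measurements

# Deliberate recycling: mean(heights) has length 1
centred <- heights - mean(heights)
print(centred)   # -10 2 11 -3

# Hypothetical helper: allow equal lengths or a length-1 operand, nothing else
add_strict <- function(a, b) {
  stopifnot(length(a) == length(b) || length(a) == 1 || length(b) == 1)
  a + b
}

add_strict(heights, 5)   # 165 177 186 172 : length 1 on purpose

# Lengths 4 and 2 would recycle silently with plain +; the helper refuses
ok <- tryCatch(add_strict(heights, c(1, 2)),
               error = function(e) "lengths differ")
print(ok)                # "lengths differ"
```

The last call is the dangerous case: because 4 is a multiple of 2, plain heights + c(1, 2) would not even warn.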
10.5 Missing Values: NA in Vectors
NA Is Contagious
R uses the special value NA to mean “missing data”. Almost every arithmetic or logical operation involving NA produces NA, because the true answer is unknown.
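A quick sketch with an invented ages vector. The last two lines show two of the exceptions behind the word “almost”: cases where the answer is the same whatever the missing value might be.

```r
NA + 1    # NA
NA > 3    # NA

ages <- c(34, NA, 29)   # hypothetical data with one gap
ages * 2     # 68 NA 58 : only the missing position is NA
sum(ages)    # NA : one unknown makes the total unknown
mean(ages)   # NA

# Exceptions: the result does not depend on the unknown value
NA & FALSE   # FALSE
NA | TRUE    # TRUE
```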
Use is.na() to ask, element by element, “is this value missing?”. The result is a logical vector, which combines with the other logical tools you have met.
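A sketch of is.na() together with a few logical tools, on an invented ages vector:

```r
ages <- c(34, NA, 29, NA, 41)   # hypothetical data

missing <- is.na(ages)
print(missing)          # FALSE TRUE FALSE TRUE FALSE

sum(missing)            # 2 : number of NAs, since TRUE counts as 1
mean(missing)           # 0.4 : proportion missing
which(missing)          # 2 4 : positions of the NAs
anyNA(ages)             # TRUE : is there at least one NA?

!missing & ages > 30    # TRUE FALSE FALSE FALSE TRUE
```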
x == NA Does Not Work
NA represents “unknown”. Asking whether something equals an unknown value is itself unknown, so NA == NA returns NA, not TRUE. Always use is.na().
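The sketch below shows the trap with a made-up vector x. The tryCatch() wrapper is only there so the failing if() does not stop the script.

```r
x <- c(5, NA, 8)   # hypothetical values

x == NA     # NA NA NA : never TRUE, never FALSE
NA == NA    # NA
is.na(x)    # FALSE TRUE FALSE : the reliable test

# if() cannot act on NA, so a test written with == fails at run time
outcome <- tryCatch(
  if (x[2] == NA) "missing" else "present",
  error = function(e) "error: condition is NA"
)
print(outcome)   # "error: condition is NA"
```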
10.6 Excluding or Summarising Around Missing Values
| Approach | How It Looks |
|---|---|
| Drop them before summarising. | x[!is.na(x)] |
| Tell the summariser to skip them with na.rm = TRUE. | sum(x, na.rm = TRUE) |
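Both approaches in the table give the same summaries. A minimal sketch with an invented vector x:

```r
x <- c(4, NA, 6, NA, 10)   # hypothetical values

# Approach 1: drop the NAs first, then summarise
kept <- x[!is.na(x)]
sum(kept)               # 20
mean(kept)              # 6.666667

# Approach 2: let the summary function skip them
sum(x, na.rm = TRUE)    # 20
mean(x, na.rm = TRUE)   # 6.666667
max(x, na.rm = TRUE)    # 10

sum(is.na(x))           # 2 : how many values were left out
```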
na.omit() for a Compact Dropper
na.omit(x) returns x with all NAs removed. It also attaches an attribute recording which positions were dropped, which some modelling functions use.
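A short sketch on an invented vector x. The attribute is called "na.action", and as.vector() is one way to strip it if you only want the plain values.

```r
x <- c(4, NA, 6, NA, 10)   # hypothetical values

cleaned <- na.omit(x)
print(cleaned)               # the values 4 6 10, printed with the extra attribute

attr(cleaned, "na.action")   # records that positions 2 and 4 were dropped
as.vector(cleaned)           # 4 6 10 : a plain numeric vector again
```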
Dropping a missing value is a decision, not a default. If 30% of a column is missing and you silently pass na.rm = TRUE, your “average customer age” is an average of the customers who bothered to fill that field in, which is rarely what you actually want to report. Always report the count of NAs alongside the summary and ask whether dropping them is appropriate for the question at hand.
10.7 NA and Comparisons in Filtering
Filter Around NA Correctly
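A logical vector that contains NAs produces NA elements (or all-NA rows, in a data frame) when used for subsetting, which almost always surprises beginners. The fix is either to add !is.na(x) to the condition or to wrap the condition in which(), which returns only the positions where the condition is TRUE and so skips the NAs.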
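A minimal sketch of the problem and both fixes, using an invented score vector:

```r
score <- c(72, NA, 91, 55, NA)   # hypothetical values

score > 60                          # TRUE NA TRUE FALSE NA

score[score > 60]                   # 72 NA 91 NA : the NAs leak through
score[!is.na(score) & score > 60]   # 72 91
score[which(score > 60)]            # 72 91 : which() keeps only TRUE positions
```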
10.8 A Worked Example: Cleaning and Ranking a Set of Scores
Every building block from this chapter appears: is.na() to locate gaps, na.rm = TRUE for safe summaries, order() with a negative sign for descending order, and recycling to add a constant to a whole vector.
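One way these pieces can fit together is sketched below. The student names, the scores, and the 5-point bonus are all invented for this illustration.

```r
# Hypothetical data: names and scores invented for this sketch
student <- c("Ana", "Ben", "Cy", "Dee", "Eli")
raw     <- c(78, NA, 91, 64, NA)

# 1. Locate the gaps
gaps <- is.na(raw)
sum(gaps)                  # 2 missing scores
student[gaps]              # "Ben" "Eli"

# 2. Summarise safely
mean(raw, na.rm = TRUE)    # 77.66667

# 3. Recycling: add a 5-point bonus to every score (NAs stay NA)
adjusted <- raw + 5
print(adjusted)            # 83 NA 96 69 NA

# 4. Keep the complete cases, using one logical index for both vectors
keep        <- !is.na(adjusted)
final_name  <- student[keep]
final_score <- adjusted[keep]

# 5. Order from highest to lowest, again with one shared index
idx <- order(-final_score)
final_name[idx]            # "Cy" "Ana" "Dee"
final_score[idx]           # 96 83 69

# 6. Rank each remaining student, 1 = best
rank(-final_score)         # 2 1 3
```

Steps 4 and 5 each apply a single index to both vectors, which is what keeps every name attached to its own score.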
10.9 Summary
| Concept | Key Takeaway |
|---|---|
| sort() | Returns values in order. decreasing = TRUE reverses it. |
| rev() | Reverses whatever order a vector has. |
| order() | Returns the permutation that would sort the input; essential for reordering several aligned vectors together. |
| rank() | Returns each element’s rank in the input; ties.method decides tie-breaking. |
| Recycling rule | Shorter vectors are repeated to match the longer one; a mismatched multiple warns. |
| NA is contagious | Almost every operation with NA returns NA. |
| Detecting NA | Always use is.na(x); x == NA never works. |
| Excluding NA | Drop with x[!is.na(x)] or pass na.rm = TRUE to summary functions. |
| Filtering with NA | Combine !is.na(x) into the condition, or use which() to skip NAs. |
Sorting, recycling, and NA-handling are the three places where silent bugs are most likely to enter a vector-based analysis. Make it a habit to check lengths before combining vectors with operators, to use order() when you need several columns to stay in sync, and to think explicitly about missing values before reaching for na.rm = TRUE. In the next chapter you will see how arithmetic, relational, and logical operators behave on vectors in more depth, including element-wise versus scalar behaviour and how comparison chains compose.