24 Iterating Loops Over Data Structures
The previous chapter introduced the loop forms. This one shows how to walk over R’s main containers (vectors, lists, matrices, data frames) using both hand-written for loops and the apply family (apply, sapply, lapply, vapply, mapply). You’ll see why the apply functions are usually preferred, when a for loop still wins, and how to choose the right helper for the shape of the data you’re processing.
24.1 A common shape: iterate, transform, collect
Almost every loop in data work has the same shape:
- step through a container,
- compute something for each element,
- collect the results into a new structure.
R offers two ways to express this shape:
- Explicit: a for loop into a pre-allocated output.
- Implicit: an apply function that bundles iteration + collection into one call.
We’ll see both, but in idiomatic R the apply family is the default.
24.2 Iterating over vectors
A for loop works element-by-element:
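As a minimal sketch, assume a small illustrative vector `x` and the task of squaring each element (both the data and the task are hypothetical, chosen only to show the loop shape):

```r
# Hypothetical input: square each element of x
x <- c(2, 4, 6, 8)

# Pre-allocate the output so the vector is not regrown on every pass
squares <- numeric(length(x))

# seq_along(x) yields 1, 2, ..., length(x) and is safe for empty vectors
for (i in seq_along(x)) {
  squares[i] <- x[i]^2
}

squares
```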
The vectorised form does the same thing in one line:
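A sketch of the vectorised form, assuming the hypothetical task is to square each element of a small vector `x`:

```r
# Hypothetical input vector
x <- c(2, 4, 6, 8)

# The ^ operator already works element by element, so no loop is needed
squares <- x^2

squares
```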
When the operation is per-element and already vectorised, don’t loop. Use a loop only when each step depends on the previous one or has a side effect.
24.3 Iterating over lists
Lists hold heterogeneous elements, so a per-element transform is a real iteration job. Here’s the loop form:
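A possible loop, assuming a hypothetical named list `scores` whose elements have different lengths and a goal of computing each element's mean:

```r
# Hypothetical data: three named groups of different lengths
scores <- list(
  maths   = c(70, 80, 90),
  physics = c(60, 65),
  art     = c(88, 92, 75, 85)
)

# Pre-allocate a named numeric vector for the results
means <- numeric(length(scores))
names(means) <- names(scores)

# Walk the list by name; [[ ]] extracts the element itself
for (nm in names(scores)) {
  means[nm] <- mean(scores[[nm]])
}

means
```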
The same with sapply() is a single line:
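A sketch of the one-line form, assuming a hypothetical named list `scores` of numeric vectors:

```r
# Hypothetical data: three named groups of different lengths
scores <- list(
  maths   = c(70, 80, 90),
  physics = c(60, 65),
  art     = c(88, 92, 75, 85)
)

# Iterate, apply mean(), and collect the results in one call
means <- sapply(scores, mean)

means
```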
sapply() walks the list, applies mean() to each element, and simplifies the result to a named numeric vector. Names are preserved automatically.
24.4 The apply family at a glance
| Function | Input | Output | Use when |
|---|---|---|---|
| lapply(x, f) | vector / list | list (always) | safe default, even when outputs vary in shape |
| sapply(x, f) | vector / list | vector / matrix if all results have the same shape; otherwise list | quick interactive use |
| vapply(x, f, FUN.VALUE) | vector / list | vector / matrix matching FUN.VALUE | production code, type-safe |
| apply(m, MARGIN, f) | matrix / data frame | vector / matrix | collapse rows or columns |
| mapply(f, …) | several vectors | vector / matrix | iterate over parallel arguments |
| Map(f, …) | several lists | list | like mapply but always a list |
Three of these (lapply, sapply, vapply) do the same job and differ only in what they return. Pick by how predictable the output is.
24.5 lapply: always returns a list
lapply() is the safe default. It returns a list of the same length as the input, regardless of what the function returns.
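A small sketch, assuming a hypothetical list `scores` and using `range()` as a function that returns more than one value per element:

```r
# Hypothetical data
scores <- list(
  maths   = c(70, 80, 90),
  physics = c(60, 65)
)

# range() returns two numbers per element; lapply() keeps each result
# as its own list element instead of merging them
ranges <- lapply(scores, range)

str(ranges)
```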
A list back means no surprises: lapply never tries to be clever about merging the outputs.
24.6 sapply: simplify when possible
sapply() calls lapply() and then tries to simplify:
- if every result is a single value → returns a vector
- if every result is the same length > 1 → returns a matrix
- otherwise → falls back to a list (just like lapply)
Convenient interactively, risky in scripts: if one element happens to return a different shape, your code’s output type silently changes.
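The three simplification outcomes can be sketched on one hypothetical list `x`; only the function passed in changes:

```r
# Hypothetical input
x <- list(a = 1:3, b = 4:6)

# Every result has length 1: simplified to a named vector
one_value <- sapply(x, sum)

# Every result has length 2: simplified to a matrix, one column per element
same_len <- sapply(x, range)

# Results have different lengths (1 and 3): no simplification, a list comes back
mixed <- sapply(x, function(v) v[v > 2])

class(one_value)
dim(same_len)
class(mixed)
```

The call is the same in all three cases, which is why the return type is easy to misjudge in a script.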
24.7 vapply: type-safe simplify
vapply() is sapply() plus a contract: you state up front what one result should look like, and R errors if any iteration disagrees. Use it when the type matters.
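The FUN.VALUE = numeric(1) template says “every result must be a length-1 numeric.” If any iteration returned, say, a character string or a vector of length 2, vapply() would stop with a clear error instead of silently shifting type. (Logical and integer results are accepted and promoted to double, since promotion loses no information.)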
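A sketch of the contract in action, assuming a hypothetical list `scores`; the `tryCatch()` wrapper is only there so the failing call can be shown without stopping the script:

```r
# Hypothetical data
scores <- list(
  maths   = c(70, 80, 90),
  physics = c(60, 65)
)

# mean() returns one number per element, which matches numeric(1)
means <- vapply(scores, mean, FUN.VALUE = numeric(1))

# range() returns two numbers per element, which violates the template,
# so vapply() raises an error rather than returning a matrix
bad <- tryCatch(
  vapply(scores, range, FUN.VALUE = numeric(1)),
  error = function(e) "error"
)

means
bad
```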
- Quick exploration → sapply().
- Code that anyone else (or future-you) will run → vapply().
- Outputs vary in shape → lapply().
24.8 apply: for matrices and data frames
apply() walks along one dimension of a matrix and collapses the other. The MARGIN argument is 1 for rows, 2 for columns.
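A minimal sketch on a hypothetical 2 x 3 matrix `m`, using functions that have no dedicated row or column helper:

```r
# Hypothetical matrix, filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)

# MARGIN = 1: apply the function to each row, giving one value per row
row_max <- apply(m, 1, max)

# MARGIN = 2: apply the function to each column, giving one value per column
col_spread <- apply(m, 2, function(col) max(col) - min(col))

row_max
col_spread
```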
For the common cases, sums and means of rows or columns, the dedicated helpers rowSums(), colSums(), rowMeans(), colMeans() are faster and clearer. Reach for apply() when the function isn’t one of those.
apply() also works on data frames, but it coerces them to matrices first, so all columns must be of compatible type; otherwise everything becomes character. For data frames, prefer column-wise iteration with lapply()/sapply().
24.9 Iterating over a data frame’s columns
Because a data frame is a list of columns, lapply() and sapply() walk over its columns by default:
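For illustration, assume a small hypothetical data frame `df` with two numeric columns and one text column:

```r
# Hypothetical data frame
df <- data.frame(
  height = c(150, 160, 170),
  weight = c(50, 60, 70),
  name   = c("Ann", "Bob", "Cy"),
  stringsAsFactors = FALSE
)

# The function is called once per column, not once per row or cell
col_classes <- sapply(df, class)

col_classes
```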
To filter to numeric columns first:
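One way to do this, sketched on a hypothetical data frame `df`, is to build a logical mask with `vapply()` and use it to select columns before summarising:

```r
# Hypothetical data frame with mixed column types
df <- data.frame(
  height = c(150, 160, 170),
  weight = c(50, 60, 70),
  name   = c("Ann", "Bob", "Cy"),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per column
is_num <- vapply(df, is.numeric, FUN.VALUE = logical(1))

# Single-bracket indexing keeps a data frame, so sapply() still walks columns
col_means <- sapply(df[is_num], mean)

col_means
```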
24.10 Iterating over rows of a data frame
Row-wise iteration in base R is unusual; most analyses are column-wise. When you need it, there are two patterns:
By index with a for loop:
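A sketch of the index pattern, assuming a hypothetical data frame `df` and a task of building one text label per row:

```r
# Hypothetical data frame
df <- data.frame(
  name  = c("Ann", "Bob"),
  score = c(72, 48),
  stringsAsFactors = FALSE
)

# Pre-allocate one output slot per row
labels <- character(nrow(df))

# seq_len(nrow(df)) is safe even when the data frame has zero rows
for (i in seq_len(nrow(df))) {
  labels[i] <- paste0(df$name[i], ": ", df$score[i])
}

labels
```

Indexing column by column (`df$name[i]`) keeps each value in its original type, which is the main advantage over the matrix-based approach.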
With apply(df, 1, …), but remember the matrix-coercion gotcha: every column will be turned into character if any column holds text (character or factor).
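The difference can be sketched with two hypothetical data frames, one all-numeric and one mixed:

```r
# All-numeric data frame: rows reach the function as numeric vectors
num_df <- data.frame(a = c(1, 2), b = c(10, 20))
row_max <- apply(num_df, 1, max)

# Mixed data frame: the whole frame is coerced to a character matrix first
mixed_df <- data.frame(
  name  = c("Ann", "Bob"),
  score = c(72, 48),
  stringsAsFactors = FALSE
)
coerced_type <- typeof(as.matrix(mixed_df))

# Each row therefore reaches the function as a character vector
row_types <- apply(mixed_df, 1, function(r) class(r))

row_max
coerced_type
row_types
```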
For serious row-wise work in modern R, use dplyr::rowwise() or split the frame with split() then lapply().
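A base-R sketch of the split-then-lapply route, assuming a hypothetical data frame `df`; splitting on the row index gives a list of one-row data frames, so column types survive:

```r
# Hypothetical data frame
df <- data.frame(
  name  = c("Ann", "Bob"),
  score = c(72, 48),
  stringsAsFactors = FALSE
)

# One list element per row, each still a data frame with typed columns
rows <- split(df, seq_len(nrow(df)))

# Inside the function, r$score is still numeric and r$name still character
labels <- lapply(rows, function(r) paste0(r$name, ": ", r$score))

labels
```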
24.11 mapply and Map: parallel iteration
mapply() is sapply() with multiple inputs walked in parallel: index 1 of every argument, then index 2, and so on.
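A minimal sketch, assuming two hypothetical parallel vectors `prices` and `qty` of equal length:

```r
# Hypothetical parallel inputs: element i of each vector belongs together
prices <- c(2.5, 4.0, 1.2)
qty    <- c(4, 1, 10)

# The function comes first; it receives one element from each vector per call
totals <- mapply(function(p, q) p * q, prices, qty)

totals
```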
For the same operation, the vectorised prices * qty is shorter; mapply() shines when the per-element function does something genuinely non-vectorisable.
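As one hypothetical example of such a function, `seq()` builds a single sequence per call and cannot take a vector of start points with a vector of end points, so summing several ranges needs per-pair iteration:

```r
# Hypothetical start and end points of two integer ranges
from <- c(1, 5)
to   <- c(3, 6)

# Each call builds one sequence (1:3, then 5:6) and sums it
range_sums <- mapply(function(a, b) sum(seq(a, b)), from, to)

range_sums
```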
Map() is mapply() without simplification: it always returns a list, the way lapply() does for one input.
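A short sketch with two hypothetical parallel vectors, showing that the result stays a list even when every call returns a single number:

```r
# Hypothetical parallel inputs
prices <- c(2.5, 4.0, 1.2)
qty    <- c(4, 1, 10)

# Same call shape as mapply(), but no attempt to simplify the result
totals_list <- Map(function(p, q) p * q, prices, qty)

str(totals_list)
```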
24.12 Anonymous functions
You don’t need to name a function to pass it to an apply call. Two equivalent shorthands:
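Both spellings, sketched on a hypothetical vector `nums`; note that the backslash form needs R 4.1.0 or later:

```r
# Hypothetical input
nums <- c(1, 2, 3)

# Classic anonymous function
a <- sapply(nums, function(x) x^2 + 1)

# Backslash shorthand (R >= 4.1.0)
b <- sapply(nums, \(x) x^2 + 1)

identical(a, b)
```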
The \(x) … form is just sugar for function(x) … and reads cleanly inside an apply call.
24.13 Worked example: exam summary by subject
Five students, three subjects each. Compute per-subject mean and standard deviation, classify each subject as “tight” or “wide” based on the standard deviation, and label every individual score as Pass/Fail.
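One way this could be written is sketched below. The scores are invented for illustration, and the two cut-offs (a standard deviation below 10 counts as "tight", a score of 50 or more counts as a pass) are assumptions, not values from the text:

```r
# Hypothetical scores: five students (rows), three subjects (columns)
exam <- data.frame(
  maths   = c(78, 82, 80, 79, 81),
  physics = c(45, 90, 60, 30, 75),
  art     = c(55, 65, 70, 48, 62)
)

# 1. Collapse each column to a single number
subject_mean <- sapply(exam, mean)
subject_sd   <- sapply(exam, sd)

# 2. Derive a label from the resulting vector
#    (assumed threshold: sd below 10 is "tight")
spread <- ifelse(subject_sd < 10, "tight", "wide")

# 3. Per-column transform that returns a vector of the same length
#    (assumed pass mark: 50)
pass_fail <- lapply(exam, function(col) ifelse(col >= 50, "Pass", "Fail"))

subject_mean
spread
pass_fail$physics
```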
Three different iteration patterns in one example:
- sapply() to collapse each column to a single number,
- ifelse() over the resulting vector to derive a label,
- lapply() to apply a per-column transform that returns a vector of the same length.
That toolbox handles the vast majority of descriptive-analytics work.
24.14 Summary
| Task | Best tool |
|---|---|
| Per-element transform on a vector | vectorised arithmetic, x^2, log(x) |
| Per-element transform on a list | lapply(), sapply(), or vapply() |
| Collapse each column to one value | sapply(df, fun) |
| Collapse rows / columns of a matrix | rowSums, colMeans, apply(m, MARGIN, fun) |
| Walk two vectors in parallel | mapply() for vector output, Map() for list output |
| True step-depends-on-previous logic | hand-written for loop |
Two principles tie it all together. First, in R the iteration is usually a one-liner: reach for sapply or vapply before writing for. Second, choose the apply variant by the shape of the output you want: list (lapply), best-guess simplification (sapply), strict type contract (vapply).
The next two chapters formalise the function, the building block that the apply family applies: first the basics, then closures and recursion.