24  Iterating Loops Over Data Structures

Note: What this chapter covers

The previous chapter introduced the loop forms. This one shows how to walk over R’s main containers (vectors, lists, matrices and data frames) using both hand-written for loops and the apply family (apply, sapply, lapply, vapply, mapply). You’ll see why the apply functions are usually preferred, when a for loop still wins, and how to choose the right helper for the shape of the data you’re processing.

24.1 A common shape: iterate, transform, collect

Almost every loop in data work has the same shape:

  1. step through a container,
  2. compute something for each element,
  3. collect the results into a new structure.

R offers two ways to express this shape:

  • Explicit: a for loop into a pre-allocated output.
  • Implicit: an apply function that bundles iteration + collection into one call.

We’ll see both, but in idiomatic R the apply family is the default.

24.2 Iterating over vectors

A for loop works element-by-element:
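A small sketch with an illustrative numeric vector x (the values are chosen only for this example), squaring each element into an output that is pre-allocated before the loop starts:

```r
# Illustrative input; the values are arbitrary
x <- c(2, 4, 6, 8)

# Pre-allocate the output so the loop fills it in place
squares <- numeric(length(x))

# seq_along() is safe even when x is empty (1:length(x) would give c(1, 0))
for (i in seq_along(x)) {
  squares[i] <- x[i]^2
}

squares
#> [1]  4 16 36 64
```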

The vectorised form does the same thing in a single expression:
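The same illustrative vector, squared with vectorised arithmetic. There is no loop and no pre-allocation, because ^ already works element by element:

```r
x <- c(2, 4, 6, 8)

# Arithmetic operators act on the whole vector at once
squares <- x^2
squares
#> [1]  4 16 36 64
```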

When the operation is per-element and already vectorised, don’t loop. Use a loop only when each step depends on the previous one or has a side effect.
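For contrast, here is a sketch of a case where a loop earns its place, using made-up numbers: a running balance that is never allowed to drop below zero. Each value depends on the previous one after the floor is applied, so cumsum() on its own would give a different (wrong) answer:

```r
# Hypothetical daily changes to a balance that is floored at zero
changes <- c(5, -8, 3, -1, 4)

balance <- numeric(length(changes))
prev <- 0
for (i in seq_along(changes)) {
  # The new value needs the previous *floored* value, not the raw running sum
  prev <- max(0, prev + changes[i])
  balance[i] <- prev
}

balance
#> [1] 5 0 3 2 6
```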

24.3 Iterating over lists

Lists hold heterogeneous elements, so a per-element transform is a real iteration job. Here’s the loop form:
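A sketch with a hypothetical named list of score vectors of different lengths. The loop pre-allocates a named numeric vector and fills one slot per list element:

```r
# Hypothetical scores; each element has a different length
scores <- list(
  maths   = c(70, 85, 90),
  physics = c(60, 75),
  history = c(88, 92, 79, 81)
)

# Pre-allocate a named result, one slot per list element
means <- numeric(length(scores))
names(means) <- names(scores)

# Iterate over names so each result lands in the matching slot
for (nm in names(scores)) {
  means[nm] <- mean(scores[[nm]])
}

means
```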

The same computation with sapply() takes a single line:
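Using the same hypothetical list, the whole loop above collapses into one sapply() call:

```r
scores <- list(
  maths   = c(70, 85, 90),
  physics = c(60, 75),
  history = c(88, 92, 79, 81)
)

# Iterate, apply mean(), and collect into a named numeric vector in one call
means <- sapply(scores, mean)
means
```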

sapply() walks the list, applies mean() to each element, and simplifies the result to a named numeric vector. Names are preserved automatically.

24.4 The apply family at a glance

| Function | Input | Output | Use when |
|---|---|---|---|
| lapply(x, f) | vector / list | list (always) | safe default, even when outputs vary in shape |
| sapply(x, f) | vector / list | vector / matrix if all results have the same shape; otherwise list | quick interactive use |
| vapply(x, f, FUN.VALUE) | vector / list | vector / matrix matching FUN.VALUE | production code, type-safe |
| apply(m, MARGIN, f) | matrix / data frame | vector / matrix | collapse rows or columns |
| mapply(f, …) | several vectors | vector / matrix (list if the results can’t be simplified) | iterate over parallel arguments |
| Map(f, …) | several lists | list | like mapply but always a list |

Three of these (lapply, sapply and vapply) do the same job and differ only in what they return. Pick one by how predictable the output needs to be.
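A quick side-by-side sketch on a small illustrative list: all three walk the same input with the same function, and only the container they hand back differs:

```r
x <- list(a = 1:3, b = 4:6)

out_l <- lapply(x, sum)                          # list of length 2
out_s <- sapply(x, sum)                          # named integer vector
out_v <- vapply(x, sum, FUN.VALUE = integer(1))  # same, but type-checked

class(out_l)  # "list"
class(out_s)  # "integer"
class(out_v)  # "integer"
```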

24.5 lapply: always returns a list

lapply() is the safe default. It returns a list of the same length as the input, regardless of what the function returns.
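A minimal sketch, assuming a function whose output length varies with its input: seq_len() returns 1, 2 or 3 values here, and lapply() simply keeps each result as its own list element, carrying over the input’s names:

```r
# Illustrative named input; each call returns a different-length result
lens <- c(a = 1, b = 2, c = 3)

out <- lapply(lens, seq_len)
str(out)
```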

Getting a list back means no surprises: lapply() never tries to merge the outputs into a simpler structure.

24.6 sapply: simplify when possible

sapply() calls lapply() and then tries to simplify:

  • if every result is a single value → returns a vector
  • if every result is the same length > 1 → returns a matrix
  • otherwise → falls back to a list (just like lapply)
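The three simplification outcomes listed above can be triggered with small illustrative functions:

```r
# Every result has length 1 -> plain vector
s1 <- sapply(1:3, function(i) i * 10)

# Every result has length 2 -> matrix, one column per input element
s2 <- sapply(1:3, function(i) c(i, i^2))

# Results differ in length -> list, just like lapply()
s3 <- sapply(1:3, seq_len)

is.vector(s1)  # TRUE
dim(s2)        # 2 3
is.list(s3)    # TRUE
```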

Convenient interactively, risky in scripts: if one element happens to return a different shape, your code’s output type silently changes.

24.7 vapply: type-safe simplification

vapply() is sapply() plus a contract: you state up front what one result should look like, and R errors if any iteration disagrees. Use it when the type matters.
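A sketch with hypothetical scores. The first call meets the contract; the second deliberately breaks it, because range() returns two values, and tryCatch() captures the resulting error so the example keeps running:

```r
scores <- list(maths = c(70, 85, 90), physics = c(60, 75))

# Contract: each result must be a single number
means <- vapply(scores, mean, FUN.VALUE = numeric(1))

# Breaking the contract fails loudly instead of changing the output type
res <- tryCatch(
  vapply(scores, range, FUN.VALUE = numeric(1)),  # range() returns length 2
  error = function(e) "error"
)
res  # "error"
```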

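The FUN.VALUE = numeric(1) template says “every result must be a length-1 numeric.” If any iteration returned, say, a character string or a length-2 vector, vapply() would stop with a clear error instead of silently shifting type. (Logical and integer results are promoted to double to fit the template rather than rejected.)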

Tip: Production rule of thumb
  • Quick exploration → sapply().
  • Code that anyone else (or future-you) will run → vapply().
  • Outputs vary in shape → lapply().

24.8 apply: for matrices and data frames

apply() walks along one dimension of a matrix and collapses the other. The MARGIN argument is 1 for rows, 2 for columns.
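A sketch on a small illustrative integer matrix, once along rows and once along columns (the second call also passes an inline function):

```r
# Illustrative 2 x 3 matrix, filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)

row_max   <- apply(m, 1, max)                               # one value per row
col_range <- apply(m, 2, function(col) max(col) - min(col)) # one value per column

row_max    # 5 6
col_range  # 1 1 1
```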

For the common cases (sums and means of rows or columns), the dedicated helpers rowSums(), colSums(), rowMeans() and colMeans() are faster and clearer. Reach for apply() when the function isn’t one of those.
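On the same illustrative matrix, the dedicated helpers give the familiar summaries directly, and agree with the equivalent apply() call:

```r
m <- matrix(1:6, nrow = 2)

rowSums(m)   # 9 12
colMeans(m)  # 1.5 3.5 5.5

# Same numbers via apply(), just wordier (and slower on large matrices)
all(rowSums(m) == apply(m, 1, sum))  # TRUE
```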

apply() also works on data frames, but it first converts them with as.matrix(), and a matrix holds a single type. If any column is character (or a factor), every value becomes character. For data frames, prefer column-wise iteration with lapply()/sapply().
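A sketch of the coercion gotcha with a hypothetical data frame that has one character column. Inside apply() every row arrives as a character vector, so arithmetic on the "numeric" fields fails; sapply() over the columns keeps each column’s own type:

```r
df <- data.frame(name = c("a", "b"), x = c(1, 2), y = c(10, 20),
                 stringsAsFactors = FALSE)

# Each row is a character vector after as.matrix()
apply(df, 1, function(row) is.character(row))  # TRUE TRUE

# So arithmetic on x and y fails; capture the error message
row_sums <- tryCatch(
  apply(df, 1, function(row) row["x"] + row["y"]),
  error = function(e) conditionMessage(e)
)

# Column-wise iteration preserves types
sapply(df, class)  # name "character", x "numeric", y "numeric"
```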

24.9 Iterating over a data frame’s columns

Because a data frame is a list of columns, lapply() and sapply() walk over its columns:
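A sketch with a hypothetical all-numeric data frame: sapply() returns one value per column, while lapply() keeps a list when each column yields more than one value:

```r
df <- data.frame(height = c(150, 160, 170), weight = c(50, 60, 70))

sapply(df, mean)   # named vector: height 160, weight 60
lapply(df, range)  # list: one min/max pair per column
```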

To filter to numeric columns first:
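One way to do it, sketched with a hypothetical mixed-type data frame: build a logical vector with vapply(), use it to select columns, then summarise only those:

```r
df <- data.frame(
  name   = c("Ana", "Ben", "Cy"),
  height = c(150, 160, 170),
  weight = c(50, 60, 70),
  stringsAsFactors = FALSE
)

# TRUE for numeric columns, FALSE otherwise
is_num <- vapply(df, is.numeric, FUN.VALUE = logical(1))

# Keep only those columns, then summarise each one
sapply(df[is_num], mean)
```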

24.10 Iterating over rows of a data frame

Row-wise iteration is unusual in base R, since most analyses are column-wise. When you need it, there are two patterns:

By index with a for loop:
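A sketch with a hypothetical order table. Indexing each column with [i] keeps the column’s own type, so there is no matrix coercion to worry about:

```r
df <- data.frame(item = c("pen", "book"), price = c(1.5, 12), qty = c(4, 1),
                 stringsAsFactors = FALSE)

totals <- numeric(nrow(df))
for (i in seq_len(nrow(df))) {
  # df$price[i] is still numeric, whatever the other columns hold
  totals[i] <- df$price[i] * df$qty[i]
}

totals  # 6 12
```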

With apply(df, 1, …). Remember the matrix-coercion gotcha: if any column is non-numeric, every column is turned into character before your function sees a row.
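A common way around the gotcha, sketched with the same hypothetical table: select only the numeric columns before calling apply(), so as.matrix() produces a numeric matrix:

```r
df <- data.frame(item = c("pen", "book"), price = c(1.5, 12), qty = c(4, 1),
                 stringsAsFactors = FALSE)

# Numeric-only subset -> numeric matrix -> prod() works on each row
totals <- apply(df[, c("price", "qty")], 1, prod)
```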

For serious row-wise work in modern R, use dplyr::rowwise(), or split the frame into one-row pieces with split() and iterate over them with lapply().
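A base-R sketch of the split-then-iterate pattern, using a hypothetical table. Splitting by seq_len(nrow(df)) gives a list of one-row data frames, each keeping its column types; vapply() is used here instead of lapply() because every result is a single string:

```r
df <- data.frame(item = c("pen", "book"), price = c(1.5, 12), qty = c(4, 1),
                 stringsAsFactors = FALSE)

# One group per row -> list of one-row data frames
rows <- split(df, seq_len(nrow(df)))

# Character and numeric columns can be mixed freely inside each piece
labels <- vapply(rows, function(r) paste0(r$item, ": ", r$price * r$qty),
                 FUN.VALUE = character(1))
labels
```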

24.11 mapply and Map: parallel iteration

mapply() is sapply() with multiple inputs walked in parallel: element 1 of every argument, then element 2, and so on.
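A sketch with hypothetical prices and quantities matched by position. The function receives one element from each vector on every call:

```r
# Hypothetical inputs; element i of prices pairs with element i of qty
prices <- c(2.5, 4.0, 10.0)
qty    <- c(4, 2, 1)

line_totals <- mapply(function(p, q) p * q, prices, qty)
line_totals
#> [1] 10  8 10
```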

For this particular operation, the vectorised prices * qty is shorter; mapply() shines when the per-element function does something genuinely non-vectorisable.

Map() is mapply() without simplification: it always returns a list, the way lapply() does for a single input.
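A sketch with illustrative inputs where the results differ in length, so a list is the natural output. Because the first argument is a character vector, its values also become the list’s names:

```r
# Illustrative parallel inputs: a value and how many times to repeat it
values <- c("a", "b", "c")
times  <- c(3, 1, 2)

# One list element per position: rep("a", 3), rep("b", 1), rep("c", 2)
reps <- Map(rep, values, times)
str(reps)
```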

24.12 Anonymous functions

You don’t need to name a function to pass it to an apply call. There are two equivalent ways to write one inline:
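Both forms side by side on an illustrative list, computing the spread (max minus min) of each element. The backslash form needs R 4.1.0 or later:

```r
x <- list(a = c(1, 2, 3), b = c(10, 20))

# Classic anonymous function
r1 <- sapply(x, function(v) max(v) - min(v))

# Backslash shorthand; same result
r2 <- sapply(x, \(v) max(v) - min(v))

identical(r1, r2)  # TRUE
```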

The \(x) … form, available since R 4.1.0, is just sugar for function(x) … and reads cleanly inside an apply call.

24.13 Worked example: exam summary by subject

Five students, three subjects each. Compute per-subject mean and standard deviation, classify each subject as “tight” or “wide” based on the standard deviation, and label every individual score as Pass/Fail.
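A sketch of this example under stated assumptions: the scores are invented, a subject counts as “tight” when its standard deviation is below 10, and a score of 50 or more is a Pass. Both cut-offs are illustrative choices, not fixed rules:

```r
# Hypothetical scores: five students (rows) in three subjects (columns)
exam <- data.frame(
  maths   = c(72, 45, 88, 91, 60),
  physics = c(65, 70, 68, 72, 66),
  history = c(30, 85, 55, 95, 40)
)

# 1. sapply(): collapse each column to a single number
subject_mean <- sapply(exam, mean)
subject_sd   <- sapply(exam, sd)

# 2. ifelse() over the resulting vector; the cut-off of 10 is an assumption
spread <- ifelse(subject_sd < 10, "tight", "wide")

# 3. lapply(): per-column transform returning a same-length vector;
#    the pass mark of 50 is an assumption
pass_fail <- as.data.frame(
  lapply(exam, function(score) ifelse(score >= 50, "Pass", "Fail"))
)

data.frame(mean = subject_mean, sd = round(subject_sd, 1), spread = spread)
pass_fail
```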

Three different iteration patterns in one example:

  • sapply() to collapse each column to a single number,
  • ifelse() over the resulting vector to derive a label,
  • lapply() to apply a per-column transform that returns a vector of the same length.

These three patterns cover most everyday descriptive-analytics work.

24.14 Summary

| Task | Best tool |
|---|---|
| Per-element transform on a vector | vectorised operations such as x^2 or log(x) |
| Per-element transform on a list | lapply(), sapply() or vapply() |
| Collapse each column to one value | sapply(df, fun) |
| Collapse rows / columns of a matrix | rowSums(), colMeans(), apply(m, MARGIN, fun) |
| Walk two vectors in parallel | mapply() for vector output, Map() for list output |
| True step-depends-on-previous logic | hand-written for loop |

Two principles tie it all together. First, in R iteration is usually a one-liner: reach for a vectorised operation, sapply() or vapply() before writing a for loop. Second, choose the apply variant by the shape of the output you want: a list (lapply), best-guess simplification (sapply) or a strict type contract (vapply).

The next two chapters formalise the function, the building block that the apply family applies: first the basics, then closures and recursion.