flowchart LR
B["Build <br> array(data, dim, dimnames)"] --> A["An Array"]
A --> I["Index <br> x[i, j, k] / x[i, , ] / negative / logical"]
A --> AX["Collapse <br> apply(x, MARGIN, FUN)"]
A --> R["Reshape <br> dim()<- / aperm()"]
style B fill:#e3f2fd,stroke:#1976D2
style A fill:#fff3e0,stroke:#F57C00
style I fill:#e8f5e9,stroke:#388E3C
style AX fill:#f3e5f5,stroke:#8E24AA
style R fill:#f3e5f5,stroke:#8E24AA
15 Arrays in R
This chapter introduces arrays, R’s general n-dimensional container for data that shares a single atomic type. A matrix is a 2-D array; an array can have any number of dimensions you like. You will learn how to build one with array(), how to name each axis with dimnames, how to slice it with x[i, j, k, ...], and how to collapse any axis down to a summary with apply(). You will see a three-way contingency-table use case that is impossible to express cleanly as a matrix, and you will learn the two safe ways to iterate over slices. By the end of this chapter you will know when to reach for an array over a matrix or a data frame and how to write code that generalises across any number of dimensions.
15.1 From Matrix to Array
Every cell of an array shares the same atomic type (double, integer, logical, character). The array’s shape is described by a dim attribute, which for a matrix has length 2 and for an array can have length 3, 4, or more.
| Dimensions | Shape Name | Built With |
|---|---|---|
| 1 | Vector | c() or numeric(n) |
| 2 | Matrix | matrix() or a 2-length dim |
| 3 | Cube / stack | array() with a 3-length dim |
| 4+ | Higher-dimensional array | array() with a longer dim |
A 2-D matrix has two axes: rows and columns. A 3-D array adds a third, typically “layer” or “time”. Higher-dimensional arrays add more axes still. Any operation you write should be expressed in terms of “which axes am I collapsing” and “which axes am I keeping”, rather than pictorial descriptions that only make sense in 2-D.
15.2 Building an Array
array() Takes Data, a Dimension Vector, and Optional Names
dimnames is a list with one element per axis, each either NULL or a character vector of the axis length. Named axes make the code self-documenting.
This is a classic three-way data cube: quarter × region × year.
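A minimal sketch of such a cube, assuming made-up values (`1:24`) and hypothetical axis labels; only `array()`, `dim`, and `dimnames` come from the text above:

```r
# Hypothetical data: 4 quarters x 3 regions x 2 years = 24 cells.
# array() fills in column-major order: quarter varies fastest, year slowest.
sales <- array(
  data = 1:24,
  dim = c(4, 3, 2),
  dimnames = list(
    quarter = c("Q1", "Q2", "Q3", "Q4"),
    region  = c("North", "South", "West"),
    year    = c("2023", "2024")
  )
)

dim(sales)                     # 4 3 2
names(dimnames(sales))         # "quarter" "region" "year"
sales["Q2", "South", "2024"]   # 18
```

Naming the list elements of `dimnames` (here `quarter`, `region`, `year`) labels the axes themselves, not just the positions along them, which makes printed output easier to read.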
If prod(dim) does not match the length of data, array() silently recycles the data to fill the array, or silently truncates it when there is too much. Always check that prod(dim) equals length(data) before you build the array.
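The silent mismatch can be seen in a small sketch. The helper `checked_array()` below is hypothetical (it is not part of base R) and simply shows one way to turn the mismatch into an error:

```r
values <- 1:5
shape  <- c(2, 3)                 # 6 cells, but only 5 values

m <- array(values, dim = shape)   # no warning is raised
m[2, 3]                           # 1: the sixth cell was recycled from values[1]

# Hypothetical guard: refuse to build an array when the lengths disagree.
checked_array <- function(data, dim, dimnames = NULL) {
  if (length(data) != prod(dim)) {
    stop("length(data) is ", length(data), " but prod(dim) is ", prod(dim))
  }
  array(data, dim = dim, dimnames = dimnames)
}

ok <- checked_array(1:6, c(2, 3))   # lengths agree, so this succeeds
```

Calling `checked_array(1:5, c(2, 3))` stops with an error instead of recycling.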
15.3 Indexing an Array
Indexing an array uses the same [ ] notation as a matrix, but with one index per dimension, separated by commas. Leaving an axis blank means “all of it”.
Negative indices exclude positions from the corresponding axis; logical vectors filter along an axis.
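The index forms above can be sketched on a small cube. The data and labels here are hypothetical:

```r
# Hypothetical 4 x 3 x 2 cube: quarter x region x year.
sales <- array(
  1:24,
  dim = c(4, 3, 2),
  dimnames = list(
    quarter = c("Q1", "Q2", "Q3", "Q4"),
    region  = c("North", "South", "West"),
    year    = c("2023", "2024")
  )
)

sales[2, 3, 1]                       # one cell: 10
q1     <- sales["Q1", , ]            # blank axes mean "all": 3 x 2 (region x year)
y2024  <- sales[, , "2024"]          # one year: 4 x 3 (quarter x region)
no_q1  <- sales[-1, , ]              # negative index drops Q1: 3 x 3 x 2
nw     <- sales[, c(TRUE, FALSE, TRUE), ]  # logical filter keeps North and West: 4 x 2 x 2
big    <- sales[sales > 20]          # logical mask over all cells: 21 22 23 24
```

Positions, names, negative indices, and logical vectors can be mixed freely, one form per axis.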
drop = FALSE to Keep the Shape
Like matrices, arrays drop singleton axes by default, so selecting a single layer of a 3-D array returns a plain matrix. Use drop = FALSE to keep every axis, and with it the 3-D shape, when you select a single layer.
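A short sketch of the difference, using a hypothetical unnamed 4 x 3 x 2 array:

```r
x <- array(1:24, dim = c(4, 3, 2))

layer <- x[, , 2]                  # singleton third axis is dropped
dim(layer)                         # 4 3

kept <- x[, , 2, drop = FALSE]     # third axis survives with extent 1
dim(kept)                          # 4 3 1

fibre <- x[1, 1, ]                 # two singleton axes dropped: a plain vector
is.null(dim(fibre))                # TRUE
```

Keeping the axis matters when later code indexes with three subscripts or passes `MARGIN = 3` to `apply()`.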
15.4 Modifying an Array
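Modification uses the same index forms as extraction, placed on the left of the assignment. A minimal sketch on a hypothetical array:

```r
x <- array(1:24, dim = c(4, 3, 2))

x[1, 1, 1] <- 100L      # overwrite a single cell
x[, , 2]   <- 0L        # overwrite a whole layer (the value is recycled)
x[x > 10]  <- NA        # overwrite every cell that matches a logical mask

dim(x)                  # still 4 3 2: assignment never changes the shape
sum(is.na(x))           # 3: the cells that held 100, 11 and 12
```

Because every cell shares one atomic type, assigning a value of a different type (for example a character string) would coerce the entire array.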
15.5 Collapsing Dimensions with apply()
apply(x, MARGIN, FUN) applies FUN across the chosen margin(s) and collapses every other axis. MARGIN = 1 keeps axis 1 (rows in a matrix), MARGIN = 2 keeps axis 2, and so on. You can pass a vector of margins to keep more than one.
apply() Is How Arrays Earn Their Keep
A 3-way array is just numbers until you start collapsing it. Almost every useful calculation on an array is an apply() call with the right MARGIN argument: per-quarter totals, per-region means, per-year variability. Writing those as apply(a, 1, ...), apply(a, c(1, 2), ...), etc. is the array idiom.
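The summaries named above can be sketched on a hypothetical quarter x region x year cube filled with `1:24`:

```r
sales <- array(
  1:24,
  dim = c(4, 3, 2),
  dimnames = list(
    quarter = c("Q1", "Q2", "Q3", "Q4"),
    region  = c("North", "South", "West"),
    year    = c("2023", "2024")
  )
)

per_quarter <- apply(sales, 1, sum)        # keep quarter: 66 72 78 84
per_region  <- apply(sales, 2, mean)       # keep region: 8.5 12.5 16.5
per_year_sd <- apply(sales, 3, sd)         # keep year: sqrt(13) for both years
both_years  <- apply(sales, c(1, 2), sum)  # keep quarter and region: 4 x 3 matrix
```

In each call the axes listed in `MARGIN` survive in the result and every other axis is fed to `FUN`.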
15.6 Reshaping and Permuting Axes
dim()<- Reshapes, aperm() Rearranges
dim(x) <- value changes the dimension vector without moving any data; the total number of cells must stay the same. aperm() permutes the order of the axes: for example, swapping the rows and columns of a matrix, or moving the year axis to the front of a 3-way array.
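A sketch contrasting the two operations on a hypothetical 4 x 3 x 2 array:

```r
x <- array(1:24, dim = c(4, 3, 2))

# Reshape: same 24 cells in the same column-major order, new shape.
y <- x
dim(y) <- c(6, 4)
y[5, 1]                      # 5: the data itself has not moved
# dim(y) <- c(5, 5) would fail, because 25 != 24.

# Permute: axes are reordered and the cells move with them.
p <- aperm(x, c(3, 1, 2))    # new axis order: old axis 3, then 1, then 2
dim(p)                       # 2 4 3
p[2, 1, 3] == x[1, 3, 2]     # TRUE: same cell, subscripts reordered

# For a matrix, swapping the two axes is the transpose.
m <- matrix(1:6, nrow = 2)
all(aperm(m, c(2, 1)) == t(m))   # TRUE
```

Note that assigning to `dim()` discards any existing `dimnames`, whereas `aperm()` carries them along with the axes.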
15.7 A 3-Way Contingency Table
Contingency tables count how often each combination of categorical variables occurs. A 2-way table (like “treatment × outcome”) is a matrix; a 3-way table (like “treatment × outcome × site”) is a 3-D array. R’s table() function happily returns one.
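A minimal sketch with a made-up survey of eight respondents; the column names `gender`, `age_band`, and `outcome` and all values are hypothetical:

```r
# Hypothetical raw data: one row per respondent.
survey <- data.frame(
  gender   = c("F", "M", "F", "M", "F", "M", "F", "F"),
  age_band = c("18-34", "18-34", "35+", "35+", "18-34", "35+", "35+", "18-34"),
  outcome  = c("yes", "no", "yes", "yes", "no", "no", "yes", "yes")
)

tab <- table(survey)          # cross-tabulates all three columns
dim(tab)                      # 2 2 2
tab["F", "18-34", "yes"]      # 2

# Collapse the age_band axis to get a gender x outcome table.
gender_by_outcome <- apply(tab, c(1, 3), sum)
gender_by_outcome["F", "yes"] # 4
```

The result of `table()` is an array with class `"table"`, so indexing and `apply()` work on it exactly as they do on an array built with `array()`.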
Most analysts today store the same information in a long data frame (one row per combination, with columns gender, age_band, outcome, and a count) because the tidyverse toolchain is built around that shape. Use an array when you genuinely need fast n-D numeric work or when a function explicitly takes or returns one (for example, mantelhaen.test() expects a 3-D table, and image data is commonly stored as a height × width × channel array).
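Moving between the two shapes takes one call in each direction. A sketch, assuming a small hypothetical table and the column name `count` chosen for illustration:

```r
# Hypothetical 3-way table built from six made-up observations.
tab <- table(
  gender   = c("F", "M", "F", "M", "F", "M"),
  age_band = c("18-34", "18-34", "35+", "35+", "18-34", "35+"),
  outcome  = c("yes", "no", "yes", "yes", "no", "no")
)

# Array -> long data frame: one row per cell, counts in the last column.
long <- as.data.frame(tab, responseName = "count")
nrow(long)        # 8 rows, one per gender x age_band x outcome combination
names(long)       # "gender" "age_band" "outcome" "count"

# Long data frame -> array again.
back <- xtabs(count ~ gender + age_band + outcome, data = long)
dim(back)         # 2 2 2
```

The long form keeps zero-count combinations as explicit rows, so the round trip loses nothing.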
15.8 Iterating Over Array Slices
When you need to step through the layers of an array (for example, to run the same analysis on every year of sales data), use apply() for the “collapse” case and a plain for loop or lapply() for the “keep each slice” case.
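Both cases can be sketched on a hypothetical quarter x region x year cube. Using `seq_len()` for the loop index is an assumption about what makes the iteration safe: it yields an empty sequence if the axis has length zero.

```r
sales <- array(
  1:24,
  dim = c(4, 3, 2),
  dimnames = list(
    quarter = c("Q1", "Q2", "Q3", "Q4"),
    region  = c("North", "South", "West"),
    year    = c("2023", "2024")
  )
)
year_idx <- seq_len(dim(sales)[3])

# "Keep each slice" with a for loop: each slab is a 4 x 3 matrix.
totals <- numeric(length(year_idx))
for (k in year_idx) {
  slab <- sales[, , k]
  totals[k] <- sum(slab)
}
totals                                   # 78 222

# "Keep each slice" with lapply(): one result per year, returned as a list.
per_year <- lapply(year_idx, function(k) colSums(sales[, , k]))
names(per_year) <- dimnames(sales)$year
per_year[["2024"]]                       # North 58, South 74, West 90

# "Collapse" with apply(): the same yearly totals in one call.
collapsed <- apply(sales, 3, sum)        # 78 222
```

The loop and `lapply()` forms are useful when each slab needs a full analysis (a model fit, a plot) rather than a single summary number.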
15.9 A Worked Example: Year-over-Year Growth
Every technique from the chapter shows up: construction with named dimnames, slab indexing with character names, element-wise arithmetic between two slabs, colSums() on a slab, and apply() with MARGIN = c(1, 2) to collapse the year axis.
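A compact sketch that exercises each of those steps. All figures, region names, and years are made up for illustration:

```r
# Construction with named dimnames: quarter x region x year.
sales <- array(
  c(100, 120,  90, 150,    # 2023 North, Q1..Q4
     80,  95,  70, 110,    # 2023 South
    110, 126,  99, 180,    # 2024 North
     88,  95,  77, 121),   # 2024 South
  dim = c(4, 2, 2),
  dimnames = list(
    quarter = c("Q1", "Q2", "Q3", "Q4"),
    region  = c("North", "South"),
    year    = c("2023", "2024")
  )
)

# Slab indexing by name, then element-wise arithmetic between two slabs.
prev <- sales[, , "2023"]
curr <- sales[, , "2024"]
growth <- (curr - prev) / prev       # 4 x 2 matrix of growth rates
round(100 * growth, 1)               # as percentages, e.g. Q4 North is 20

# colSums() on a slab: annual total per region for 2024.
annual_2024 <- colSums(curr)         # North 515, South 381

# apply() with MARGIN = c(1, 2): collapse the year axis.
two_year_total <- apply(sales, c(1, 2), sum)
two_year_total["Q1", "North"]        # 210
```

Because both slabs share the same shape and dimnames, the subtraction and division line up cell by cell with no explicit loop.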
15.10 Summary
| Concept | Key Takeaway |
|---|---|
| Array vs matrix | A matrix is a 2-D array; arrays generalise to 3 or more dimensions. |
| Homogeneous | Every cell shares one atomic type. |
| array() | Takes data, dim, and optionally dimnames. |
| Indexing | x[i, j, k, ...]; leaving an axis blank means “all”. |
| drop = FALSE | Preserves the array class when selecting a single layer. |
| apply(x, MARGIN, FUN) | Collapse every axis except the ones listed in MARGIN. |
| Reshape vs permute | dim(x) <- ... reshapes; aperm(x, perm) reorders the axes. |
| Contingency tables | table() returns an n-D array; collapse with apply(). |
| When to use | Reach for an array for genuinely n-D numeric work; otherwise prefer a data frame. |
Arrays are a specialist’s tool. Most day-to-day analysis lives in data frames, and most numeric grids fit in a matrix. When you do need a third axis (time, site, trial, colour channel), arrays let you express the calculation in the language of axes rather than in nested loops. This chapter closes Module 2 on data structures. Module 3 turns to descriptive analytics, starting with how R handles text through character vectors and string operations.