15  Arrays in R

[Note] What This Chapter Covers

This chapter introduces arrays, R’s general n-dimensional container for data that shares a single atomic type. A matrix is a 2-D array; an array can have any number of dimensions. You will learn how to build one with array(), how to name each axis with dimnames, how to slice it with x[i, j, k, ...], how to reshape it with dim()<- and aperm(), and how to collapse any axis down to a summary with apply(). You will see a three-way contingency table, which cannot be expressed cleanly as a single matrix, and you will learn two safe ways to iterate over slices. By the end of this chapter you will know when to reach for an array rather than a matrix or a data frame, and how to write code that generalises across any number of dimensions.

flowchart LR
    B["Build <br> array(data, dim, dimnames)"] --> A["An Array"]
    A --> I["Index <br> x[i, j, k] / x[i, , ] / negative / logical"]
    A --> AX["Collapse <br> apply(x, MARGIN, FUN)"]
    A --> R["Reshape <br> dim()<- / aperm()"]
    style B fill:#e3f2fd,stroke:#1976D2
    style A fill:#fff3e0,stroke:#F57C00
    style I fill:#e8f5e9,stroke:#388E3C
    style AX fill:#f3e5f5,stroke:#8E24AA
    style R fill:#f3e5f5,stroke:#8E24AA


15.1 From Matrix to Array

[Note] Core Concept: Any Number of Dimensions, One Type

Every cell of an array shares the same atomic type (logical, integer, double, complex, character, or raw). The array’s shape is stored in its dim attribute. This is an integer vector with one entry per axis: length 2 for a matrix, and length 3, 4, or more for a higher-dimensional array.
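Both ideas are easy to check on a small example. The sketch below uses only base R, and the values are arbitrary. It inspects the dim attribute of a 2 × 2 array, then shows that storing one string coerces every cell.

```r
# A 2 x 2 matrix is simply a vector carrying a length-2 dim attribute.
a <- array(c(1, 2, 3, 4), dim = c(2, 2))
typeof(a)        # "double"
attributes(a)    # $dim: 2 2
is.matrix(a)     # TRUE
is.array(a)      # TRUE: every matrix is also an array

# All cells share one type, so storing a string coerces the whole array.
a[1, 1] <- "x"
typeof(a)        # "character"
a[2, 2]          # "4"
```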

| Dimensions | Shape name | Built with |
|---|---|---|
| 1 | Vector | c() or numeric(n) |
| 2 | Matrix | matrix(), or array() with a length-2 dim |
| 3 | Cube / stack | array() with a length-3 dim |
| 4+ | Higher-dimensional array | array() with a longer dim |

[Tip] Expert Insight: Think in Axes

A 2-D matrix has two axes: rows and columns. A 3-D array adds a third, typically a “layer” or “time” axis, and higher-dimensional arrays add more still. Describe every operation in terms of which axes you collapse and which you keep. Avoid pictorial words such as “across” and “down”, which only make sense in 2-D.


15.2 Building an Array

[Note] array() Takes Data, a Dimension Vector, and Optional Names
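array(data, dim, dimnames) pours data into the cells in column-major order. The first index varies fastest, then the second, and so on. The sketch below uses 1:24 as arbitrary data so that the fill order is visible in the values themselves.

```r
# array(data, dim, dimnames) fills cells in column-major order:
# the first index varies fastest, then the second, then the third.
x <- array(1:24, dim = c(2, 3, 4))

dim(x)        # 2 3 4
length(x)     # 24, which is prod(dim(x))
x[, , 1]      # the first 2 x 3 layer holds 1:6
x[2, 1, 1]    # 2: one step along axis 1 advances by one cell
x[1, 2, 1]    # 3: one step along axis 2 advances by 2 cells (the row count)
x[1, 1, 2]    # 7: layer 2 starts after the 6 cells of layer 1
```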
[Note] Naming the Axes

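dimnames is a list with one element per axis. Each element is either NULL or a character vector whose length equals the extent of that axis. If the list itself is named, those names label the axes, so dimnames(x)$year retrieves the labels of an axis called year. Named axes make the code self-documenting.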

A classic example is a three-way data cube indexed by quarter × region × year.
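A minimal sketch of such a cube follows. The object name sales and every number in it are hypothetical, chosen only for illustration. Because of the column-major fill, the data are listed one region-year column at a time.

```r
# Hypothetical sales cube: 4 quarters x 3 regions x 2 years = 24 cells.
# The numbers are made up purely for illustration.
sales <- array(
  data = c(120, 135, 150, 160,   # 2023, North, Q1..Q4
            90,  95, 110, 105,   # 2023, South
            70,  80,  85,  95,   # 2023, West
           130, 140, 165, 175,   # 2024, North
           100, 105, 115, 120,   # 2024, South
            75,  90,  95, 100),  # 2024, West
  dim = c(4, 3, 2),
  dimnames = list(
    quarter = c("Q1", "Q2", "Q3", "Q4"),
    region  = c("North", "South", "West"),
    year    = c("2023", "2024")
  )
)

dim(sales)                     # 4 3 2
dimnames(sales)$region         # "North" "South" "West"
sales["Q3", "South", "2024"]   # 115
```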

[Warning] Common Mistake: Mismatched Data Length

If length(data) does not equal prod(dim), array() silently recycles the data to fill the array. If there are too many values, it silently drops the surplus instead. In neither case does it give a warning. Always check that the product of the dimensions equals the length of the data you pass in.
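The sketch below first shows the silent recycling, then wraps array() in a guard. The helper checked_array() is a hypothetical name of our own, not part of base R. It simply stops when the lengths disagree.

```r
# array() does not warn about a length mismatch; it recycles (or truncates).
a <- array(1:5, dim = c(2, 3))   # 6 cells but only 5 values
a[2, 3]                          # 1: the data wrapped around to the start

# Hypothetical helper, not part of base R: refuse mismatched lengths.
checked_array <- function(data, dim, dimnames = NULL) {
  n_cells <- prod(dim)
  if (length(data) != n_cells) {
    stop(sprintf("data has %d values but dim implies %d cells",
                 length(data), as.integer(n_cells)))
  }
  array(data, dim = dim, dimnames = dimnames)
}

ok  <- checked_array(1:6, c(2, 3))
msg <- tryCatch(checked_array(1:5, c(2, 3)), error = conditionMessage)
msg   # "data has 5 values but dim implies 6 cells"
```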


15.3 Indexing an Array

[Note] The Bracket Pattern: One Index Per Axis

Indexing an array uses the same [ ] notation as a matrix, but with one index per dimension, separated by commas. Leaving an axis blank means “all of it”.
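The sketch below applies the bracket pattern to a hypothetical quarter × region × year sales cube. The numbers are made up. Each index can be a position or a dimnames label, and a blank keeps the whole axis.

```r
# Hypothetical quarter x region x year sales cube (made-up numbers).
sales <- array(
  c(120, 135, 150, 160,  90,  95, 110, 105,  70, 80, 85,  95,   # 2023
    130, 140, 165, 175, 100, 105, 115, 120,  75, 90, 95, 100),  # 2024
  dim = c(4, 3, 2),
  dimnames = list(quarter = c("Q1", "Q2", "Q3", "Q4"),
                  region  = c("North", "South", "West"),
                  year    = c("2023", "2024")))

sales["Q1", "North", "2023"]   # one cell by name: 120
sales[1, 1, 1]                 # the same cell by position
sales[, "South", "2024"]       # every quarter for one region and year: a named vector
sales[, , "2024"]              # the whole 2024 layer: a 4 x 3 matrix
sales[c("Q1", "Q2"), , ]       # first half of both years: a 2 x 3 x 2 array
```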

[Note] Negative and Logical Indexing

Negative indices exclude positions from the corresponding axis; logical vectors filter along an axis.
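The sketch below, again on a hypothetical sales cube with made-up numbers, shows both kinds of index. Negative indices work only with numeric positions. To exclude a level by name, one option is to index with setdiff() on that axis’s dimnames, which is the approach shown here.

```r
# Hypothetical quarter x region x year sales cube (made-up numbers).
sales <- array(
  c(120, 135, 150, 160,  90,  95, 110, 105,  70, 80, 85,  95,   # 2023
    130, 140, 165, 175, 100, 105, 115, 120,  75, 90, 95, 100),  # 2024
  dim = c(4, 3, 2),
  dimnames = list(quarter = c("Q1", "Q2", "Q3", "Q4"),
                  region  = c("North", "South", "West"),
                  year    = c("2023", "2024")))

dim(sales[-1, , ])            # 3 3 2: Q1 excluded
dim(sales[, -3, "2023"])      # 4 2: 2023 without the West region

# Negative indices must be positions; to exclude by name, use setdiff().
no_q1 <- sales[setdiff(dimnames(sales)$quarter, "Q1"), , ]

# Logical filter on the region axis.
keep <- dimnames(sales)$region != "South"
dim(sales[, keep, ])          # 4 2 2

# Data-driven filter: quarters whose 2023 North sales exceed 140.
hot <- sales[, "North", "2023"] > 140
dimnames(sales[hot, , ])$quarter   # "Q3" "Q4"
```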

[Warning] Common Mistake: drop = FALSE to Keep the Shape

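Like matrices, arrays drop any axis of length 1 from the result by default, so selecting a single layer of a 3-D array returns a matrix. Use drop = FALSE to keep every dimension, including the length-1 axis, when you select a single layer.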
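The difference is easiest to see in the dim of each result. This sketch uses an arbitrary 2 × 3 × 4 array.

```r
x <- array(1:24, dim = c(2, 3, 4))

dim(x[, , 1])                  # 2 3: the length-1 layer axis is dropped
dim(x[, , 1, drop = FALSE])    # 2 3 1: still three-dimensional
dim(x[1, 2, , drop = FALSE])   # 1 1 4
x[1, 2, ]                      # without drop = FALSE: a plain vector, 3 9 15 21
```

Code that later indexes the slice with three subscripts, such as slice[i, j, 1], works only on the drop = FALSE version. The dropped version is a matrix and accepts just two.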


15.4 Modifying an Array

[Note] Assignment Works the Same Way
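Any index expression that reads from an array can also appear on the left of <- to write into it. The sketch below uses a hypothetical sales cube with made-up numbers. The correction and the floor-at-80 rule are illustrative only.

```r
# Hypothetical quarter x region x year sales cube (made-up numbers).
sales <- array(
  c(120, 135, 150, 160,  90,  95, 110, 105,  70, 80, 85,  95,   # 2023
    130, 140, 165, 175, 100, 105, 115, 120,  75, 90, 95, 100),  # 2024
  dim = c(4, 3, 2),
  dimnames = list(quarter = c("Q1", "Q2", "Q3", "Q4"),
                  region  = c("North", "South", "West"),
                  year    = c("2023", "2024")))

# Overwrite a single cell (a hypothetical data correction).
sales["Q2", "West", "2023"] <- 82

# Overwrite a whole slab: scale every 2024 figure up by 10%.
sales[, , "2024"] <- sales[, , "2024"] * 1.1

# Logical assignment: raise any value below 80 to 80 (an illustrative rule).
sales[sales < 80] <- 80

dim(sales)   # still 4 3 2: assigning into [ ] never changes the shape
```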

15.5 Collapsing Dimensions with apply()

[Note] Core Concept: Pick the Axes You Keep

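apply(x, MARGIN, FUN) calls FUN once for each combination of the axes listed in MARGIN, passing it the slice that spans every other axis, so those other axes are collapsed. MARGIN = 1 keeps axis 1 (rows in a matrix), MARGIN = 2 keeps axis 2, and so on. You can pass a vector of margins to keep more than one. When the dimnames are named, you can also refer to axes by name, as in MARGIN = "year".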
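The shape of apply()’s result follows directly from MARGIN: it has one entry per position on each kept axis. The sketch below uses an arbitrary 2 × 3 × 4 array filled with 1:24.

```r
x <- array(1:24, dim = c(2, 3, 4))

apply(x, 1, sum)         # length 2, one total per position on axis 1: 144 156
apply(x, 3, sum)         # length 4, one total per layer: 21 57 93 129
apply(x, c(1, 2), sum)   # a 2 x 3 matrix: only axis 3 is collapsed
```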

[Tip] Expert Insight: apply() Is How Arrays Earn Their Keep

A 3-way array is just numbers until you start collapsing it. Almost every useful calculation on an array is an apply() call with the right MARGIN argument: per-quarter totals, per-region means, per-year variability. Writing those as apply(a, 1, ...), apply(a, c(1, 2), ...), etc. is the array idiom.
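The three calculations named above map onto three apply() calls. This sketch assumes a hypothetical sales cube with made-up numbers. It also shows axes selected by name, which works because the dimnames list is named.

```r
# Hypothetical quarter x region x year sales cube (made-up numbers).
sales <- array(
  c(120, 135, 150, 160,  90,  95, 110, 105,  70, 80, 85,  95,   # 2023
    130, 140, 165, 175, 100, 105, 115, 120,  75, 90, 95, 100),  # 2024
  dim = c(4, 3, 2),
  dimnames = list(quarter = c("Q1", "Q2", "Q3", "Q4"),
                  region  = c("North", "South", "West"),
                  year    = c("2023", "2024")))

apply(sales, "quarter", sum)   # per-quarter totals: Q1 585, Q2 645, Q3 720, Q4 755
apply(sales, 2, mean)          # per-region means, each over 8 cells
apply(sales, "year", sd)       # per-year variability across 12 cells
apply(sales, c(1, 2), sum)     # quarter x region totals with the year axis collapsed
```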


15.6 Reshaping and Permuting Axes

[Note] dim()<- Reshapes, aperm() Rearranges

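dim()<- replaces the dimension vector of an existing object. The total number of cells must stay the same, the cells keep their column-major order, and any existing dimnames are removed. aperm() permutes the order of the axes. Examples include swapping rows and columns of a matrix (equivalent to t()) or moving the year axis to the front of a 3-way array. With aperm(), the dimnames travel with their axes.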
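The sketch below reshapes a plain vector twice, then permutes the axes of a hypothetical sales cube (made-up numbers). Note the contrast: dim()<- discards dimnames, while aperm() keeps them attached to the right axes.

```r
# dim()<- reshapes: same cells, same column-major order, new shape.
x <- 1:12
dim(x) <- c(3, 4)      # now a 3 x 4 matrix
dim(x) <- c(2, 3, 2)   # now a 2 x 3 x 2 array
x[2, 3, 2]             # 12: the last cell is still the last cell

# Hypothetical quarter x region x year sales cube (made-up numbers).
sales <- array(
  c(120, 135, 150, 160,  90,  95, 110, 105,  70, 80, 85,  95,   # 2023
    130, 140, 165, 175, 100, 105, 115, 120,  75, 90, 95, 100),  # 2024
  dim = c(4, 3, 2),
  dimnames = list(quarter = c("Q1", "Q2", "Q3", "Q4"),
                  region  = c("North", "South", "West"),
                  year    = c("2023", "2024")))

s2 <- sales
dim(s2) <- c(12, 2)    # 12 quarter-region rows x 2 years
dimnames(s2)           # NULL: dim()<- removed the labels

# aperm() reorders the axes: move year to the front.
by_year <- aperm(sales, c(3, 1, 2))
dim(by_year)                     # 2 4 3
names(dimnames(by_year))         # "year" "quarter" "region"
by_year["2024", "Q3", "South"]   # 115, the same cell as sales["Q3", "South", "2024"]
```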


15.7 A 3-Way Contingency Table

[Note] When a Matrix Is Not Enough

Contingency tables count how often each combination of categorical variables occurs. A 2-way table (such as treatment × outcome) is a matrix; a 3-way table (such as treatment × outcome × site) is a 3-D array. R’s table() function returns exactly that when you pass it three factors or vectors it can treat as factors.
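A minimal sketch follows, using eight hypothetical survey responses that are made up for illustration. table() cross-tabulates three vectors into a 2 × 2 × 2 array, and apply() then collapses one axis.

```r
# Hypothetical survey responses, one element per respondent (made-up data).
gender   <- c("F", "M", "F", "F", "M", "M", "F", "M")
age_band <- c("<40", "<40", "40+", "<40", "40+", "40+", "40+", "<40")
outcome  <- c("yes", "no", "yes", "no", "yes", "no", "yes", "yes")

tab <- table(gender, age_band, outcome)   # a 2 x 2 x 2 contingency table
dim(tab)                                  # 2 2 2
is.array(tab)                             # TRUE
tab["F", "40+", "yes"]                    # 2 women aged 40+ answered yes

# Collapse the age_band axis (axis 2) to get gender x outcome counts.
apply(tab, c(1, 3), sum)
```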

[Tip] Best Practice: Reach for a Data Frame First

Most analysts today store the same information in a long data frame, with one row per combination of levels and columns such as gender, age_band, outcome, and a count. The tidyverse toolchain is built around that shape. Use an array when you genuinely need fast n-D numeric work, or when a function explicitly expects or returns one. For example, mantelhaen.test() accepts a 3-D contingency table, and image data are commonly stored as height × width × channel arrays.
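Base R converts between the two shapes directly. In this sketch, as.data.frame() turns a table into the long form with a Freq count column, and xtabs() rebuilds the array from it. The survey vectors are hypothetical, made-up data.

```r
# Hypothetical survey responses, one element per respondent (made-up data).
gender   <- c("F", "M", "F", "F", "M", "M", "F", "M")
age_band <- c("<40", "<40", "40+", "<40", "40+", "40+", "40+", "<40")
outcome  <- c("yes", "no", "yes", "no", "yes", "no", "yes", "yes")
tab <- table(gender, age_band, outcome)

long <- as.data.frame(tab)   # one row per combination; counts in a Freq column
nrow(long)                   # 8 = 2 x 2 x 2 combinations, zero counts included
names(long)                  # "gender" "age_band" "outcome" "Freq"

# xtabs() rebuilds the 3-D array from the long form.
back <- xtabs(Freq ~ gender + age_band + outcome, data = long)
all(back == tab)             # TRUE: every count survives the round trip
```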


15.8 Iterating Over Array Slices

[Note] Looping Safely Across an Axis

Sometimes you need to step through the layers of an array, for example to run the same analysis on every year of sales data. Use apply() when each slice should collapse to a summary. Use a plain for loop, or lapply() over the axis’s indices or names, when you need to keep each slice whole.
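The sketch below walks the year axis of a hypothetical sales cube (made-up numbers) in both styles. lapply() returns each year’s layer as a list element. The for loop records a per-year result; here, the region with the highest annual total is a hypothetical analysis chosen for illustration.

```r
# Hypothetical quarter x region x year sales cube (made-up numbers).
sales <- array(
  c(120, 135, 150, 160,  90,  95, 110, 105,  70, 80, 85,  95,   # 2023
    130, 140, 165, 175, 100, 105, 115, 120,  75, 90, 95, 100),  # 2024
  dim = c(4, 3, 2),
  dimnames = list(quarter = c("Q1", "Q2", "Q3", "Q4"),
                  region  = c("North", "South", "West"),
                  year    = c("2023", "2024")))
years <- dimnames(sales)$year

# Keep each slice whole: a named list holding one 4 x 3 matrix per year.
by_year <- lapply(setNames(years, years), function(yr) sales[, , yr])

# The same traversal with a for loop, recording each year's top region.
best_region <- setNames(character(length(years)), years)
for (yr in years) {
  layer <- sales[, , yr]                                # 4 x 3 matrix for one year
  best_region[yr] <- names(which.max(colSums(layer)))  # region with the largest total
}
best_region   # "North" for both 2023 and 2024
```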


15.9 A Worked Example: Year-over-Year Growth

[Note] Putting the Array Tools Together

Most of the chapter’s techniques show up here: construction with named dimnames, slab indexing with character names, element-wise arithmetic between two slabs, colSums() on a slab, and apply() with MARGIN = c(1, 2) to collapse the year axis.
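A minimal sketch of the calculation follows, assuming a hypothetical quarter × region × year sales cube with made-up numbers. Growth is expressed in percent. Each step uses one of the tools listed above.

```r
# Construction with named dimnames (hypothetical, made-up numbers).
sales <- array(
  c(120, 135, 150, 160,  90,  95, 110, 105,  70, 80, 85,  95,   # 2023
    130, 140, 165, 175, 100, 105, 115, 120,  75, 90, 95, 100),  # 2024
  dim = c(4, 3, 2),
  dimnames = list(quarter = c("Q1", "Q2", "Q3", "Q4"),
                  region  = c("North", "South", "West"),
                  year    = c("2023", "2024")))

# Slab indexing with character names: two 4 x 3 quarter x region matrices.
y23 <- sales[, , "2023"]
y24 <- sales[, , "2024"]

# Element-wise arithmetic between slabs: growth for every cell, in percent.
growth <- (y24 - y23) / y23 * 100
round(growth, 1)

# colSums() on each slab gives region totals, then region-level growth.
region_growth <- (colSums(y24) - colSums(y23)) / colSums(y23) * 100
round(region_growth, 1)   # North 8.0, South 10.0, West 9.1

# apply() with MARGIN = c(1, 2): two-year totals, year axis collapsed.
two_year <- apply(sales, c(1, 2), sum)
```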


15.10 Summary

[Note] Key Concepts at a Glance

| Concept | Key takeaway |
|---|---|
| Array vs matrix | A matrix is a 2-D array; arrays generalise to 3 or more dimensions. |
| Homogeneous | Every cell shares one atomic type. |
| array() | Takes data, dim, and optionally dimnames. |
| Indexing | x[i, j, k, ...]; leaving an axis blank means “all”. |
| drop = FALSE | Keeps every dimension, including length-1 axes, when selecting a single layer. |
| apply(x, MARGIN, FUN) | Collapses every axis except the ones listed in MARGIN. |
| Reshape vs permute | dim(x) <- ... reshapes; aperm(x, perm) reorders the axes. |
| Contingency tables | table() returns an n-D array; collapse it with apply(). |
| When to use | Reach for an array for genuinely n-D numeric work; otherwise prefer a data frame. |
[Tip] Applying This in Practice

Arrays are a specialist’s tool. Most day-to-day analysis lives in data frames, and most numeric grids fit in a matrix. When you do need a third axis (time, site, trial, colour channel), arrays let you express the calculation in the language of axes rather than in nested loops. This chapter closes Module 2 on data structures. Module 3 turns to descriptive analytics, starting with how R handles text through character vectors and string operations.