8 Role and Purpose of Data Structures in R

What This Chapter Covers

This chapter is a map of R’s core data structures and the reasons each one exists. You will see that R is unusual among programming languages in having data-analysis concepts (vectors, factors, data frames) built into the language itself. You will learn the two dimensions along which every structure is classified: its shape (1-D, 2-D, or n-D) and its content, that is, whether it is homogeneous (all elements of one type) or heterogeneous (mixed types). You will meet each of the six everyday structures (vector, list, matrix, array, data frame, and factor) with a one-line definition, a typical use case, and a small example. You will also see a decision diagram that tells you which structure to reach for when a new kind of data shows up in your project. The individual structures are covered in depth in later chapters; this chapter exists so you know the neighbourhood before you move into any of the houses.

flowchart TD
    DS["R Data Structures"] --> D1["1-Dimensional"]
    DS --> D2["2-Dimensional"]
    DS --> DN["n-Dimensional"]

    D1 --> VEC["Vector <br> (homogeneous)"]
    D1 --> LST["List <br> (heterogeneous)"]
    D1 --> FAC["Factor <br> (categorical)"]

    D2 --> MAT["Matrix <br> (homogeneous)"]
    D2 --> DF["Data Frame <br> (heterogeneous)"]

    DN --> ARR["Array <br> (homogeneous, any #dims)"]
    classDef default fill:#003366,color:#ffffff,stroke:#ffcc00,stroke-width:3px,rx:10px,ry:10px;

8.1 Why Data Structures Matter

Core Concept: Structure Shapes Behaviour

A data structure is the shape you give a collection of values so that the language can act on them efficiently. The same thousand numbers can be stored as a vector, a matrix, or a data frame column; each choice opens and closes particular operations. Arithmetic across columns is easy with a matrix but awkward with a list; mixing numbers and strings is natural in a data frame, whereas a matrix coerces every cell to a single type. Picking the right structure up front is half of writing clean R code.
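
The trade-offs described above can be sketched in a few lines. The values and the names `scores`, `m`, `l`, `mixed_m`, and `mixed_df` are illustrative assumptions, not part of the chapter's own examples; `stringsAsFactors = FALSE` is passed explicitly so the result is the same on older and newer R versions.

```r
# Illustrative values only (hypothetical data).
scores <- c(10, 20, 30, 40, 50, 60)

# As a matrix: two columns of three values; column arithmetic is one call.
m <- matrix(scores, ncol = 2)
colMeans(m)                 # 20 50

# As a list of two pieces: the same arithmetic needs an explicit apply step.
l <- list(a = scores[1:3], b = scores[4:6])
sapply(l, mean)             # a = 20, b = 50

# Binding a character column into a matrix coerces every cell to character.
mixed_m <- cbind(c(1, 2), c("x", "y"))
typeof(mixed_m)             # "character"

# A data frame keeps each column's own type.
mixed_df <- data.frame(n = c(1, 2), s = c("x", "y"), stringsAsFactors = FALSE)
sapply(mixed_df, class)     # n = "numeric", s = "character"
```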

Expert Insight: R Was Designed Around Data, Not Objects

Most languages start from generic objects (records, classes) and bolt data-analysis features on top. R starts from the data-analysis features. Vectors, factors, and data frames are first-class citizens of the language, and the consequence is that idiomatic R reads like applied statistics rather than like software engineering. Understanding the structures is therefore understanding R.

8.2 The Two Dimensions of Classification

Homogeneous vs Heterogeneous, 1-D vs 2-D vs n-D

Every core R data structure can be placed in a two-by-three grid. One axis is shape: how many dimensions the structure has. The other axis is content: whether every element must share a type (homogeneous) or whether types can mix (heterogeneous).

Shape	Homogeneous	Heterogeneous
1-Dimensional	Vector	List
2-Dimensional	Matrix	Data Frame
n-Dimensional	Array	(rare, not built in)

The factor is a special case that sits outside the grid: it is one-dimensional and holds categorical labels drawn from a fixed set, not arbitrary text. It is stored internally as integer codes with a lookup table of labels, which keeps it compact when the same categories repeat many times.

Common Confusion: “Everything Is a Vector, Right?”

R is often described as “a language of vectors”. That is true in the sense that a single number is really a length-one vector, and most of the other structures are built on vectors underneath. It is not true that every container behaves like a plain vector. Lists, matrices, arrays, factors, and data frames each have their own class, their own print method, and their own rules for subsetting.
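
Both halves of this point can be checked at the console. The snippet below is a minimal sketch; the variable `x` and the sample values are placeholders chosen for illustration.

```r
# A single number is a vector of length one.
x <- 42
length(x)       # 1
is.vector(x)    # TRUE
x[1]            # 42

# Yet each container reports its own class.
class(c(1, 2, 3))                    # "numeric"
class(list(1, "a"))                  # "list"
class(factor(c("a", "b")))           # "factor"
class(data.frame(n = 1:2))           # "data.frame"
class(array(1:8, dim = c(2, 2, 2)))  # "array"
class(matrix(1:4, nrow = 2))         # "matrix" "array" in R >= 4.0.0, "matrix" before
```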

8.3 Vectors

Homogeneous, One-Dimensional, the Default Container

A vector is the simplest and most common structure in R. Every element has the same atomic type, and the whole collection is one-dimensional. You create one with c() (combine) or with a sequence generator like 1:10 or seq().
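
A minimal sketch of creating and inspecting vectors follows. The names `ages`, `one_to_ten`, and `evens`, and the values in them, are made up for illustration.

```r
# Build vectors with c() and with sequence generators.
ages       <- c(23, 35, 41, 29)      # numeric (double) vector
one_to_ten <- 1:10                   # integer sequence
evens      <- seq(2, 10, by = 2)     # 2 4 6 8 10

typeof(ages)         # "double"
typeof(one_to_ten)   # "integer"
length(evens)        # 5

# Arithmetic applies to every element at once.
ages + 1             # 24 36 42 30

# Because a vector holds one type, mixed input is coerced to the most general one.
typeof(c(1, "a", TRUE))   # "character"
```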

When to Reach for a Vector

Use a vector when you have a single column of data, a set of coordinates, a collection of names, or any sequence of values of one type. Most base R functions accept vector input and return vector output, which is why vectors are the default choice.

8.4 Lists

Heterogeneous, One-Dimensional, the Universal Container

A list is one-dimensional like a vector, but each element can be of any type, including another list. Lists are how R stores anything whose pieces do not all share a type: the result of a statistical test, the settings for a plot, or an arbitrary nested configuration.
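
As a rough illustration, the sketch below builds a nested list and pulls pieces out of it. The `person` record and the numbers passed to `t.test()` are hypothetical; `t.test()` itself comes from the `stats` package that ships with R.

```r
# A list can mix types and nest other lists.
person <- list(
  name    = "Ana",
  age     = 35,
  scores  = c(90, 85, 77),
  address = list(city = "Lisbon", postcode = "1000")
)

length(person)           # 4
person$name              # "Ana"
person[["scores"]][2]    # 85
person$address$city      # "Lisbon"
sapply(person, class)    # "character" "numeric" "numeric" "list"

# The result of a statistical test is itself a list with a class attached.
res <- t.test(c(5.1, 4.9, 5.6, 5.8, 6.0))
is.list(res)             # TRUE
class(res)               # "htest"
```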

When to Reach for a List

Use a list when the pieces of your data do not fit neatly into a single type, when you want to return multiple objects from a function, or when you need a hierarchical (nested) structure. Lists are also the underlying representation of many R objects, including data frames themselves.

8.5 Matrices

Homogeneous, Two-Dimensional, Organised into Rows and Columns

A matrix is a two-dimensional vector: every cell has the same atomic type, and the whole thing is laid out in a rectangle. Matrices support linear algebra, element-wise arithmetic, and fast row- or column-wise operations.
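
The operations named above can be sketched on a small matrix. The matrix `m` is an arbitrary example, not data from the chapter.

```r
# matrix() fills column by column by default.
m <- matrix(1:6, nrow = 2, ncol = 3)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

dim(m)         # 2 3
m[2, 3]        # 6   (row 2, column 3)

m * 2          # element-wise arithmetic
t(m)           # transpose, a 3 x 2 matrix
m %*% t(m)     # matrix product, a 2 x 2 matrix: 35 44 / 44 56

rowSums(m)     # 9 12
colMeans(m)    # 1.5 3.5 5.5
```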

When to Reach for a Matrix

Use a matrix when the data is naturally rectangular and every value has the same type: distances between cities, correlation coefficients, covariance structures, grayscale images. Use a data frame instead when the columns have different types.

8.6 Data Frames

Heterogeneous, Two-Dimensional, the Workhorse of Analysis

A data frame looks like a table in a spreadsheet: rows are observations, columns are variables, and different columns can hold different types. Internally it is a list of equal-length vectors, one per column.
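
A small sketch makes both views visible: the table on the surface and the list of columns underneath. The column names and values in `df` are invented for illustration, and `stringsAsFactors = FALSE` is set explicitly so the example behaves the same before and after R 4.0.0.

```r
# Three columns, three different types.
df <- data.frame(
  id     = 1:3,
  name   = c("Ana", "Ben", "Caro"),
  passed = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

dim(df)              # 3 3  (rows, columns)
sapply(df, class)    # id "integer", name "character", passed "logical"
df$name[2]           # "Ben"
df[df$passed, ]      # keeps rows 1 and 3

# Underneath, a data frame is a list with one element per column.
is.list(df)          # TRUE
length(df)           # 3
```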

When to Reach for a Data Frame

Use a data frame for essentially every real-world tabular data set: survey responses, sales records, experimental results, anything that came out of a CSV or a database query. It is the input that most modelling and plotting functions expect.

8.7 Factors

Categorical Labels with a Fixed Set of Levels

A factor is R’s built-in type for categorical data. It looks like a character vector, but under the hood it stores small integer codes and a separate table of allowed values (the “levels”). Statistical models, plots, and tables all treat factors differently from plain character vectors; for example, they know the full set of possible values even if some are absent from the current data.
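
The integer codes and the "absent level" behaviour are easy to see in a short sketch. The variable `group` and its three levels are hypothetical.

```r
# Declare three allowed levels, although only two occur in the data.
group <- factor(
  c("control", "treated", "control"),
  levels = c("control", "treated", "placebo")
)

levels(group)        # "control" "treated" "placebo"
as.integer(group)    # 1 2 1  (the stored codes)

# table() reports the unused level with a count of zero.
table(group)         # control 2, treated 1, placebo 0
```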

When to Reach for a Factor

Use a factor when a variable has a small, known set of allowed values (gender, treatment group, grade band, region code). Keep it as a plain character vector when the set of values is open-ended or very large (free-text comments, user IDs).

8.8 Arrays

Homogeneous, n-Dimensional

An array is the general case of a matrix. A matrix is a 2-D array; an array can have any number of dimensions. The cells all share one atomic type.
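
A three-dimensional example, sketched with arbitrary numbers, shows how an array extends the matrix idea. The name `a` and the chosen dimensions are illustrative only.

```r
# 24 values arranged as 2 rows x 3 columns x 4 layers.
a <- array(1:24, dim = c(2, 3, 4))

dim(a)              # 2 3 4
a[1, 2, 3]          # 15  (row 1, column 2, layer 3)
dim(a[, , 1])       # 2 3  (one layer is an ordinary matrix)

# Summarise over the third dimension: one total per layer.
apply(a, 3, sum)    # 21 57 93 129

# A matrix is simply the two-dimensional case.
is.array(matrix(1:4, nrow = 2))   # TRUE
```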

When to Reach for an Array

Arrays are less common in everyday data analysis than the other structures. They matter when you need genuinely higher-dimensional numeric data: a time-by-region-by-measure cube in economics, a three-way contingency table in statistics, image stacks in bioinformatics.

8.9 A Decision Diagram

Which Structure Should I Use?

If your data is …	Use
A single sequence of values, all the same type.	Vector
A collection of values with mixed types, possibly nested.	List
A rectangle of values, all the same type (numeric grid, image).	Matrix
A rectangle with columns of different types (typical table data).	Data Frame
A small, fixed set of category labels.	Factor
Higher-dimensional numeric data (three or more axes).	Array

Expert Insight: Data Frames Solve 80 Percent of Real Problems

Most of what you will ever do in R happens inside a data frame. It is the common format for CSVs, for database results, for the tidyverse, for ggplot2, and for almost every modelling function. The other structures exist to support data frames internally or to handle the narrower cases where a data frame is the wrong fit.

8.10 A Quick Tour in Code

Seeing All Six Structures Side by Side
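
The sketch below is one possible way to line the six structures up, not necessarily the example the chapter originally showed here. It builds one small object of each kind, then asks each for its class, its number of dimensions, and whether it is atomic (homogeneous). All object names and values are placeholders.

```r
# One small example of each structure (hypothetical values).
v  <- c(1.5, 2.5, 3.5)
l  <- list(1.5, "two", TRUE)
m  <- matrix(1:6, nrow = 2)
a  <- array(1:24, dim = c(2, 3, 4))
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)
f  <- factor(c("low", "high", "low"))

structures <- list(vector = v, list = l, matrix = m,
                   array = a, data_frame = df, factor = f)

# First class label of each object.
sapply(structures, function(s) class(s)[1])
# "numeric" "list" "matrix" "array" "data.frame" "factor"

# Number of dimensions (0 means no dim attribute, i.e. one-dimensional).
sapply(structures, function(s) length(dim(s)))
# 0 0 2 3 2 0

# Atomic structures are the homogeneous ones.
sapply(structures, is.atomic)
# TRUE FALSE TRUE TRUE FALSE TRUE
```

Read against the grid in Section 8.2, the last two results reproduce the shape axis and the content axis respectively.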

Summary

Concept	Description
Concept
Why Data Structures Matter	Structure shapes how data is stored, accessed, and computed
Two Dimensions of Classification	Homogeneous vs heterogeneous, and 1-D vs 2-D vs n-D
Core Structures
Vectors	Homogeneous, one-dimensional — the default container in R
Lists	Heterogeneous, one-dimensional — the universal container
Matrices	Two-dimensional, single-type rectangular structures
Arrays	Multi-dimensional generalisation of matrices
Data Frames	Two-dimensional, mixed-type tabular data — the workhorse of analysis
Factors	Categorical data with explicit levels