20 Factors: Creation, Levels, and Reordering

What this chapter covers

A factor is R’s data type for categorical variables: values drawn from a fixed, known set of categories such as "Low" / "Medium" / "High" or "Pass" / "Fail". Internally a factor stores integer codes plus a table of labels called levels. That design keeps the data compact and records both which categories are allowed and the order they come in. This chapter covers creating factors, inspecting and renaming levels, ordered factors, and changing the level order so that summaries and plots come out the way you want.

20.1 Why a separate type for categories?

A column of grades stored as text is just strings with no built-in order. "A", "B", "C" happen to sort correctly only because the alphabet agrees with the grading scale; "Low", "Medium", "High" would sort as "High", "Low", "Medium". A factor lets you state explicitly: “these are the only valid categories, and this is their order.” That information then drives:

- which categories summary() and table() count (including those with zero observations),
- the order categories appear in plots and group-by output,
- modelling code that needs categorical predictors with a known reference level.
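
A minimal sketch of the idea, using a hypothetical vector of exam results (the name answers and the extra "Absent" category are illustrative assumptions, not data from this chapter):

```r
# Hypothetical exam results stored as plain text
answers <- c("Pass", "Fail", "Pass", "Pass")

# Declare the valid categories; "Absent" is allowed even though no one has it yet
result <- factor(answers, levels = c("Pass", "Fail", "Absent"))

result          # values print without quotes, followed by a Levels: line
table(result)   # every level is counted, including "Absent" with 0
```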

Notice the printout: the values appear without quotes and a Levels: line shows the category list.

20.2 Creating factors

By default, factor() takes a vector and sets its levels to the unique values in sorted order, which for text means alphabetical order (as defined by the current locale).
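
A short sketch with a hypothetical vector of size labels (sizes is an illustrative name) shows the alphabetical default:

```r
sizes <- c("Medium", "Low", "High", "Medium", "Low")

f <- factor(sizes)
levels(f)   # "High" "Low" "Medium": alphabetical, not Low < Medium < High
```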

That alphabetical default is rarely what you want. Specify levels = to take control.
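
The same hypothetical sizes vector, this time with the level order stated explicitly:

```r
sizes <- c("Medium", "Low", "High", "Medium", "Low")

# levels = fixes both the allowed categories and their order
f <- factor(sizes, levels = c("Low", "Medium", "High"))
levels(f)   # "Low" "Medium" "High"
table(f)    # counts now appear in Low, Medium, High order
```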

Pass labels = to rename categories at the same time:
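
A sketch assuming the raw data uses hypothetical one-letter codes: levels = lists the codes as they appear in the data, and labels = gives the names to display, matched by position.

```r
codes <- c("L", "H", "M", "L")

f <- factor(codes,
            levels = c("L", "M", "H"),            # values as they appear in the data
            labels = c("Low", "Medium", "High"))  # display names, matched by position
f           # prints Low High Medium Low
levels(f)   # "Low" "Medium" "High"
```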

Values not in levels become NA

If a value in your data isn’t listed in levels, R silently turns it into NA. Use unique() first to be sure your level list covers everything.
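
A sketch of the trap with hypothetical shirt sizes, where the level list forgets "XL". The setdiff() check at the end is one way (an illustrative choice, not the only one) to list the values your levels missed:

```r
shirt_sizes <- c("S", "M", "L", "XL", "M")

unique(shirt_sizes)                        # check which values actually occur
f <- factor(shirt_sizes, levels = c("S", "M", "L"))   # "XL" left out by mistake
f                                          # the fourth value prints as <NA>
sum(is.na(f))                              # 1 value silently lost
setdiff(unique(shirt_sizes), levels(f))    # "XL": values the level list missed
```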

The XL becomes <NA>, a useful safety net when you want to flag unknown categories, but a trap if you didn’t intend it.

20.3 Inspecting factors
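
A quick tour of the inspection functions on a small hypothetical factor:

```r
f <- factor(c("Low", "High", "Medium", "Low"),
            levels = c("Low", "Medium", "High"))

levels(f)       # "Low" "Medium" "High": the category list, in order
nlevels(f)      # 3
table(f)        # counts per level: Low 2, Medium 1, High 1
as.integer(f)   # 1 3 2 1: each value's position in levels(f)
```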

as.integer() exposes how a factor is stored: each value is really an integer code pointing into the levels vector. Counting and grouping work on those small integers rather than on repeated strings, which is why factors are compact and fast to group by.

20.4 Adding, renaming, and dropping levels

Renaming all levels at once:
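
A sketch with hypothetical abbreviated labels: assigning a vector of the same length to levels() renames every level by position, and the data values follow automatically.

```r
f <- factor(c("lo", "hi", "mid", "lo"), levels = c("lo", "mid", "hi"))

# The new names replace the old ones position by position
levels(f) <- c("Low", "Medium", "High")
f   # prints Low High Medium Low
```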

Adding a level that may not appear in the data yet, useful for plots that should always reserve space for an empty category:
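
One way to do this, sketched with hypothetical sizes: append the new name to the existing levels. No existing value changes; the new level simply starts with a count of zero.

```r
f <- factor(c("S", "M", "M", "L"), levels = c("S", "M", "L"))

# Append "XL" as a level; existing values keep their codes
levels(f) <- c(levels(f), "XL")
levels(f)   # "S" "M" "L" "XL"
table(f)    # XL appears with a count of 0
```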

Dropping unused levels after a filter:
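
A sketch of the problem and the fix, on a hypothetical factor: subsetting removes the values but keeps the level, so "High" lingers with a zero count until droplevels() removes it.

```r
f <- factor(c("Low", "High", "Medium", "Low"),
            levels = c("Low", "Medium", "High"))

kept <- f[f != "High"]   # filtering removes the values...
levels(kept)             # ...but "High" is still a level
table(kept)              # High shows a count of 0

kept <- droplevels(kept)
levels(kept)             # "Low" "Medium"
```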

droplevels() is the cleanup function to know.

20.5 Ordered factors

Some categorical variables have a natural order: Low < Medium < High, or grades D < C < B < A. Setting ordered = TRUE records that order, so comparisons such as < and > and functions such as max() and sort() work.
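
A sketch with hypothetical letter grades: the printed Levels: line uses < signs to mark an ordered factor, and comparing against a plain string such as "C" is interpreted through the level order.

```r
grades <- factor(c("B", "D", "A", "C"),
                 levels = c("D", "C", "B", "A"),  # lowest to highest
                 ordered = TRUE)

grades          # Levels: D < C < B < A
grades > "C"    # TRUE FALSE TRUE FALSE
max(grades)     # A
sort(grades)    # D C B A
```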

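Use ordered factors for measurement scales and severity grades, but stick with regular factors for categories that have no inherent order (region, department, colour). Ordered factors change how some modelling functions treat the variable (lm(), for example, encodes an ordered predictor with polynomial contrasts by default rather than comparing each level with a reference level), so don’t reach for them by default.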

20.6 Reordering levels

The level order, not the alphabetical order of the labels, drives every summary and plot. There are three common ways to change it.

By hand: call factor(x, levels = ...) again and restate the complete level vector in the order you want:
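
A sketch on a hypothetical factor that started out alphabetical. Every existing level must appear in the new vector; a misspelled or missing name would turn those values into NA.

```r
f <- factor(c("Medium", "Low", "High"))   # alphabetical: High, Low, Medium

f <- factor(f, levels = c("Low", "Medium", "High"))
levels(f)   # "Low" "Medium" "High"
f           # the values themselves are unchanged
```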

Move one level to the front with relevel(), which is handy for setting the reference category in a regression (it works on unordered factors only):
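
A sketch with a hypothetical region variable: ref = names the level to move to the front, and the remaining levels keep their relative order.

```r
region <- factor(c("North", "South", "East", "West", "South"))
levels(region)                          # "East" "North" "South" "West"

region <- relevel(region, ref = "South")
levels(region)                          # "South" "East" "North" "West"
```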

By another variable: forcats::fct_reorder(f, x) orders the levels of f by a summary of x within each level (the median, by default). It is the easiest way to make a bar chart sort by bar height. forcats is part of the tidyverse and runs in webR.
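
A sketch assuming a hypothetical table of sales totals with one row per fruit (so the median of each group is just its single total). Passing the reordered factor to a plotting function would then draw the bars in this order.

```r
library(forcats)

# Hypothetical sales totals, one row per fruit
sales <- data.frame(
  fruit = factor(c("apple", "banana", "cherry")),
  total = c(30, 55, 12)
)

levels(fct_reorder(sales$fruit, sales$total))                # "cherry" "apple" "banana"
levels(fct_reorder(sales$fruit, sales$total, .desc = TRUE))  # "banana" "apple" "cherry"
```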

Two more forcats helpers worth memorising:

- fct_infreq(x): order levels by how often each appears (most common first).
- fct_relevel(x, "Foo", "Bar"): push the named levels to the front, in the order given.
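
Both helpers applied to a hypothetical colour vector whose default levels are alphabetical (blue, green, red):

```r
library(forcats)

colour <- factor(c("red", "blue", "red", "green", "red", "blue"))

levels(fct_infreq(colour))            # "red" "blue" "green": most frequent first
levels(fct_relevel(colour, "green"))  # "green" "blue" "red": green moved to the front
```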

20.7 Worked example: student performance

A small dataset of student results: numeric scores that we bin into letter grades. We want a frequency table with the grades in rank order (F < D < C < B < A) rather than alphabetical order, plus the name of the top performer.
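
A sketch of the whole pipeline. The student names, the scores, and the grade cut-offs (below 60 is F, then bands of 10 up to A at 90 and above) are all hypothetical, chosen only to illustrate the technique:

```r
# Hypothetical student scores out of 100
students <- data.frame(
  name  = c("Ana", "Ben", "Chloe", "Dev", "Eli", "Fay"),
  score = c(91, 58, 74, 83, 67, 88)
)

# cut() bins the scores; right = FALSE makes each bin [lower, upper)
students$grade <- cut(
  students$score,
  breaks = c(0, 60, 70, 80, 90, 101),
  labels = c("F", "D", "C", "B", "A"),
  right  = FALSE,
  ordered_result = TRUE   # F < D < C < B < A
)

table(students$grade)     # counts in F, D, C, B, A order; empty grades would show 0

top <- max(students$grade)             # highest grade present
students$name[students$grade == top]   # who earned it
```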

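Two things to notice. First, cut() is the workhorse for turning a numeric variable into a factor with custom bins. Second, because grade is an ordered factor, max() returns the highest grade present (on an unordered factor it raises an error), and == against that value picks out the top performer. Answering "which is highest?" from the category order alone is exactly what ordered factors are designed for.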

Summary

| Concept | Description |
|---|---|
| **Concept and Creation** | |
| Why Factors | Categorical data with a fixed set of values, modelled explicitly |
| Creating Factors | factor(x, levels = ...) creates a factor; levels sets the order, labels renames |
| Levels and Out-of-Levels | Values not in levels become NA; check unique(x) to guard against silent loss |
| **Inspecting and Reordering** | |
| Inspecting Factors | levels(), nlevels(), table(), and as.integer() reveal structure, counts, and codes |
| Adding and Renaming Levels | levels(x) <- ... renames; levels(x) <- c(levels(x), "new") adds a level |
| Dropping Unused Levels | droplevels() removes unused levels after subsetting |
| Ordered Factors | ordered = TRUE adds an order; comparisons such as < and > and max() work |
| Reordering Levels | factor(x, levels = c(...)), relevel(), or forcats::fct_relevel(), fct_reorder(), fct_infreq() |