18 Data Frames: Creation, Access, and Manipulation

What this chapter covers

A data frame is the workhorse structure for tabular data in R: rows are observations, columns are variables, and each column may have its own type. This chapter walks through creating data frames from scratch and from files, inspecting their shape and contents, accessing columns and rows in several equivalent ways, modifying values, and filtering subsets. By the end you will be able to take a tabular dataset, ask basic descriptive questions of it, and reshape it for further analysis.

18.1 What is a data frame?

A data frame is a list of equal-length vectors displayed as a table. Internally each column is a vector (so a column has one type), but different columns can hold different types: numeric marks alongside character names alongside logical pass/fail flags.

That mental model (a list of columns, not a matrix of cells) explains almost every quirk of data frame syntax.
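
As a minimal sketch of that model, the snippet below builds a tiny table and checks that R really treats it as a list of columns. The `students` table, its column names, and its values are invented for illustration, and the character column assumes R 4.0.0 or later.

```r
# Hypothetical example data; names and values are made up for illustration.
students <- data.frame(
  name   = c("Asha", "Ben", "Chloe"),
  marks  = c(82, 67, 91),
  passed = c(TRUE, FALSE, TRUE)
)

students                  # prints as a table
sapply(students, class)   # one type per column
is.list(students)         # a data frame is a list of columns
length(students)          # counts columns, not cells
```

`length()` reporting the number of columns rather than the number of cells is the list-of-columns model showing through.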

Notice that columns can be a mix of numeric, character, and logical; that flexibility is the whole point.

stringsAsFactors

Since R 4.0.0, character columns stay as character by default. In older code you may see data.frame(..., stringsAsFactors = FALSE) to override the historic default of converting strings to factors. Today passing stringsAsFactors = FALSE is harmless but no longer needed.

18.2 Creating data frames

The two everyday entry points are data.frame() for hand-built tables and read.csv() (or its tidyverse cousin readr::read_csv()) for reading files.
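
A small sketch of the hand-built route with data.frame(). The vectors and the `cohort` column are illustrative assumptions, not part of any real dataset.

```r
# Illustrative vectors; every column must have the same length,
# except length-one values, which are recycled down the column.
name  <- c("Asha", "Ben", "Chloe", "Dev")
marks <- c(82, 67, 91, 74)

# Column names are taken from the variable names when none are given.
students <- data.frame(name, marks, cohort = "A")

students
dim(students)   # rows, then columns
```

Passing vectors of unequal length (other than length one) stops with an error, which is usually what you want for tabular data.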

Reading a small CSV from a text string is useful for examples in a browser, where no files are available:
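
One way to do this is the `text` argument of read.csv(), sketched here with made-up CSV content; the variable name `csv_text` is an assumption for the example.

```r
# A CSV held in a string instead of a file (illustrative content).
csv_text <- "name,marks,passed
Asha,82,TRUE
Ben,67,FALSE
Chloe,91,TRUE"

# text = ... makes read.csv() parse the string as if it were a file.
students <- read.csv(text = csv_text)

str(students)   # column types are guessed from the values
```

To read a real file, replace `text = csv_text` with the file path as the first argument.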

18.3 Inspecting a data frame

Before touching the values, ask the frame what it is. Five questions cover most of what you need, each answered by one or two functions:

Question	Function
How many rows / columns?	`nrow()`, `ncol()`, `dim()`
What are the column names?	`names()` or `colnames()`
What is the structure?	`str()`
What does the top look like?	`head()`, `tail()`
What is the descriptive summary?	`summary()`
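
The functions in the table can be tried in one pass. This is a sketch on an invented `students` table, not output from a real dataset.

```r
# Hypothetical data for illustration.
students <- data.frame(
  name   = c("Asha", "Ben", "Chloe", "Dev"),
  marks  = c(82, 67, 91, 74),
  passed = c(TRUE, FALSE, TRUE, TRUE)
)

nrow(students)       # number of rows
ncol(students)       # number of columns
dim(students)        # both at once: rows, then columns
names(students)      # column names
str(students)        # compact structure overview
head(students, 2)    # first two rows
tail(students, 2)    # last two rows
summary(students)    # per-column descriptive summary
```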

str() is the single most useful of these: it prints the dimensions, then one line per column showing the column's name, type, and a preview of its first values.

18.4 Accessing columns

A data frame is a list of columns, so column access uses list syntax, with matrix-style indexing as a third option. Three equivalent styles:
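
A minimal sketch of the three styles on an invented `students` table; the result variable names are chosen only for this example.

```r
# Hypothetical data for illustration.
students <- data.frame(
  name  = c("Asha", "Ben", "Chloe"),
  marks = c(82, 67, 91)
)

by_dollar <- students$marks         # list-style, by name
by_double <- students[["marks"]]    # list-style, name as a string
by_matrix <- students[, "marks"]    # matrix-style: all rows, one column

identical(by_dollar, by_double)
identical(by_double, by_matrix)
```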

All three return the same numeric vector. $ is the most readable for interactive use; [[ is needed when the column name lives in a variable; the matrix form is useful when extracting several columns at once.
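
The difference shows up as soon as the column name is stored in a variable or more than one column is wanted. A sketch, again with a made-up `students` table and a hypothetical variable `col`:

```r
# Hypothetical data for illustration.
students <- data.frame(
  name   = c("Asha", "Ben", "Chloe"),
  marks  = c(82, 67, 91),
  passed = c(TRUE, FALSE, TRUE)
)

col <- "marks"
students[[col]]    # uses the value stored in col
students$col       # NULL: $ looks for a column literally named "col"

# Matrix form with a vector of names keeps a data frame.
two_cols <- students[, c("name", "marks")]
two_cols
```

Selecting several columns returns a data frame rather than a vector, since there is no single vector that could hold both.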

18.5 Accessing rows and cells

Use the [row, column] form: rows before the comma, columns after.
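
A few illustrative selections in that form, using an invented `students` table:

```r
# Hypothetical data for illustration.
students <- data.frame(
  name   = c("Asha", "Ben", "Chloe"),
  marks  = c(82, 67, 91),
  passed = c(TRUE, FALSE, TRUE)
)

students[2, ]                          # second row, all columns
students[2, "marks"]                   # a single cell
students[1:2, c("name", "marks")]      # first two rows, two columns
students[-1, ]                         # everything except the first row
```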

To filter rows by a condition, build a logical vector and pass it as the row selector:
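
A sketch of the pattern, with the logical vector built as a separate step so it can be inspected. The `students` table and the 75-mark threshold are illustrative.

```r
# Hypothetical data for illustration.
students <- data.frame(
  name   = c("Asha", "Ben", "Chloe", "Dev"),
  marks  = c(82, 67, 91, 74),
  passed = c(TRUE, FALSE, TRUE, TRUE)
)

high <- students$marks > 75    # one TRUE/FALSE per row
high

top_students <- students[high, ]   # rows where high is TRUE, all columns
top_students

# Conditions combine with & (and) and | (or).
students[students$marks > 70 & students$passed, ]
```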

Read it as: “give me the rows where marks exceeds 75, all columns.” This is the bread-and-butter pattern of base R analysis.

The trailing comma

df[1, ] (with the trailing comma) returns the first row as a one-row data frame. df[1] (no comma) returns the first column as a one-column data frame, because R falls back to list semantics. Both are valid; you just have to know which one you asked for.
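
Checking the dimensions makes the difference visible. A small sketch on an invented table (here named `df` to match the text):

```r
# Hypothetical data for illustration.
df <- data.frame(
  name   = c("Asha", "Ben", "Chloe"),
  marks  = c(82, 67, 91),
  passed = c(TRUE, FALSE, TRUE)
)

first_row <- df[1, ]   # with the comma: one row, every column
first_col <- df[1]     # without the comma: every row, one column

dim(first_row)
dim(first_col)
```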

18.6 Adding and modifying columns

Assigning to a new name creates a column; assigning to an existing name overwrites it.
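
Both cases in a short sketch. The `grade` column, the 75-mark cutoff, and the 5-mark bonus are invented for the example.

```r
# Hypothetical data for illustration.
students <- data.frame(
  name  = c("Asha", "Ben", "Chloe"),
  marks = c(82, 67, 91)
)

# New name: a column is created.
students$grade <- ifelse(students$marks >= 75, "A", "B")

# Existing name: the column is overwritten in place.
students$marks <- students$marks + 5

students
```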

Add several columns at once with cbind():
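
A minimal sketch, assuming the new columns are already collected in a second data frame with the same number of rows; `extra` and its columns are hypothetical.

```r
# Hypothetical data for illustration.
students <- data.frame(
  name  = c("Asha", "Ben", "Chloe"),
  marks = c(82, 67, 91)
)
extra <- data.frame(
  age  = c(20, 21, 19),
  city = c("Pune", "Leeds", "Lyon")
)

# Columns are glued side by side; rows are matched by position, not by key.
students <- cbind(students, extra)
students
```

Because rows are matched purely by position, both tables need to be in the same row order before binding.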

To drop a column, set it to NULL:
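
For instance, on an invented `students` table:

```r
# Hypothetical data for illustration.
students <- data.frame(
  name   = c("Asha", "Ben", "Chloe"),
  marks  = c(82, 67, 91),
  passed = c(TRUE, FALSE, TRUE)
)

students$passed <- NULL   # removes the column entirely
names(students)
```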

18.7 Adding rows

rbind() stacks a new row on the bottom. The names and types must line up.
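
A sketch of appending one row, where the new row is itself a one-row data frame with the same column names; the data is made up.

```r
# Hypothetical data for illustration.
students <- data.frame(
  name  = c("Asha", "Ben", "Chloe"),
  marks = c(82, 67, 91)
)

# The new row uses the same column names and compatible types.
new_row <- data.frame(name = "Dev", marks = 74)

students <- rbind(students, new_row)
students
```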

For more than a few rows, build the additions as a single data frame and bind once; calling rbind() in a loop is slow and error-prone.
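
The bind-once approach might look like this; `additions` and its contents are hypothetical.

```r
# Hypothetical data for illustration.
students <- data.frame(
  name  = c("Asha", "Ben", "Chloe"),
  marks = c(82, 67, 91)
)

# Collect all new rows in one data frame first...
additions <- data.frame(
  name  = c("Dev", "Eli", "Fay"),
  marks = c(74, 88, 59)
)

# ...then bind a single time.
students <- rbind(students, additions)
nrow(students)
```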

18.8 Sorting and ordering

Use order() to get a sorted index, then apply it as the row selector.
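
A short sketch with the index kept in its own variable so the two steps are visible; the table is invented.

```r
# Hypothetical data for illustration.
students <- data.frame(
  name  = c("Asha", "Ben", "Chloe"),
  marks = c(82, 67, 91)
)

idx <- order(students$marks)   # row positions, lowest marks first
idx

ascending  <- students[idx, ]
descending <- students[order(students$marks, decreasing = TRUE), ]

ascending
descending
```

Note that order() returns positions, not sorted values, which is why it works as a row selector.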

order() accepts multiple columns, much like a SQL ORDER BY clause: later columns break ties in earlier ones.
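
A sketch of a two-key sort: by section, then by marks from highest to lowest within each section. The `section` column is an invented example, and negating a numeric column is one common way to reverse just that key.

```r
# Hypothetical data for illustration.
students <- data.frame(
  name    = c("Asha", "Ben", "Chloe", "Dev"),
  section = c("B", "A", "B", "A"),
  marks   = c(82, 67, 91, 74)
)

# First key: section (ascending). Second key: marks (descending via the minus sign).
by_section <- students[order(students$section, -students$marks), ]
by_section
```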

18.9 Worked example: quarterly sales

Start from a small regional sales table, total each region’s half-year, flag the top performer, and order the results.
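
One possible version of that report is sketched below. The regions, the figures, and the `share` column are all invented for illustration; only the steps (total, flag, order) come from the description above.

```r
# Hypothetical quarterly sales figures.
sales <- data.frame(
  region = c("North", "South", "East", "West"),
  q1     = c(120, 95, 140, 110),
  q2     = c(135, 100, 125, 150)
)

# Column assignments: half-year total, share of all sales, top-performer flag.
sales$total <- sales$q1 + sales$q2
sales$share <- round(100 * sales$total / sum(sales$total), 1)
sales$top   <- sales$total == max(sales$total)

# Row reordering: best region first.
report <- sales[order(sales$total, decreasing = TRUE), ]
report
```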

Three lines of column assignment and one row reordering give a complete mini-report on a tabular dataset.

Summary

Concept	Description
Creation and Inspection
What is a Data Frame	A two-dimensional, heterogeneous table where columns can have different types
stringsAsFactors	Pre-R 4.0 default coerced strings to factors; modern default is FALSE
Creating Data Frames	data.frame() builds from named vectors of equal length
Inspecting (head, str, summary)	Quick orientation tools for any new data frame
Access and Manipulation
Column Access with $	df$col returns the column as a vector
Row Access with [	df[rows, cols] subsets rows and columns; trailing comma matters
Adding Columns	Assign to a new name with $, or use cbind() to add several columns at once
Sorting with order()	df[order(df$col), ] sorts rows by a column