18 Data Frames: Creation, Access, and Manipulation
A data frame is the workhorse structure for tabular data in R: rows are observations, columns are variables, and each column may have its own type. This chapter walks through creating data frames from scratch and from files, inspecting their shape and contents, accessing columns and rows in several equivalent ways, modifying values, and filtering subsets. By the end you will be able to take a tabular dataset, ask basic descriptive questions of it, and reshape it for further analysis.
18.1 What is a data frame?
A data frame is a list of equal-length vectors displayed as a table. Internally each column is a vector (so a column has one type), but different columns can hold different types: numeric marks can sit alongside character names and logical pass/fail flags.
That mental model (a list of columns, not a matrix of cells) explains almost every quirk of data frame syntax.
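A minimal sketch of the list-of-columns idea. The frame `students` and its values are made up for illustration, and the character column assumes R 4.0.0 or later.

```r
# Hypothetical example data: three students, three column types
students <- data.frame(
  name   = c("Asha", "Ben", "Chen"),   # character
  marks  = c(82, 67, 91),              # numeric
  passed = c(TRUE, FALSE, TRUE)        # logical
)

is.list(students)        # TRUE: a data frame is a list under the hood
length(students)         # 3: length counts columns, not cells
sapply(students, class)  # one type per column
```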
Notice that columns can be a mix of numeric, character, and logical types; that flexibility is the whole point.
stringsAsFactors
Since R 4.0.0, character columns stay as character by default. In older code you may see data.frame(..., stringsAsFactors = FALSE) to override the historic default of converting strings to factors. Today the argument is harmless but no longer needed.
18.2 Creating data frames
The two everyday entry points are data.frame() for hand-built tables and read.csv() (or its tidyverse cousin readr::read_csv()) for reading files.
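As a rough sketch of the hand-built route, the example below creates a small frame with `data.frame()`. The column names and values are illustrative only. It also shows two behaviours worth knowing: a length-1 value is recycled to every row, and columns of incompatible lengths are rejected.

```r
# Hand-built table; names and values are illustrative
df <- data.frame(
  name  = c("Asha", "Ben", "Chen"),
  marks = c(82, 67, 91),
  group = "A"          # a length-1 value is recycled to every row
)
df

# Columns of length 3 and 2 cannot be combined; capture the error message
bad <- tryCatch(
  data.frame(x = 1:3, y = 1:2),
  error = function(e) conditionMessage(e)
)
bad
```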
Reading a small CSV from text (useful for examples in a browser without files):
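One way to do this is the `text` argument of `read.csv()`, which parses a string instead of a file. The CSV content below is invented for the example.

```r
# Inline CSV; the rows are made-up sample data
csv_text <- "name,marks,passed
Asha,82,TRUE
Ben,67,FALSE
Chen,91,TRUE"

# text = reads from the string rather than from a file path
df <- read.csv(text = csv_text)
str(df)
```

With a real file you would pass its path as the first argument instead of `text`.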
18.3 Inspecting a data frame
Before touching the values, ask the frame what it is. Five questions cover most of what you need, each answered by a small set of functions:
| Question | Function |
|---|---|
| How many rows / columns? | nrow(), ncol(), dim() |
| What are the column names? | names() or colnames() |
| What is the structure? | str() |
| What does the top look like? | head(), tail() |
| What is the descriptive summary? | summary() |
str() is the single most useful of these: it reports the number of observations and variables, then prints one line per column showing its type and a preview of its values.
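The inspection functions in the table can be tried on any frame. Here is a short sketch on a hypothetical three-row `df`:

```r
# Illustrative data
df <- data.frame(
  name   = c("Asha", "Ben", "Chen"),
  marks  = c(82, 67, 91),
  passed = c(TRUE, FALSE, TRUE)
)

dim(df)       # rows, then columns
nrow(df)
ncol(df)
names(df)     # column names
str(df)       # compact structure: type and preview per column
head(df, 2)   # first two rows
summary(df)   # per-column descriptive summary
```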
18.4 Accessing columns
A data frame is a list of columns, so column access uses list syntax. Three equivalent styles:
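A sketch of the three styles, assuming a hypothetical frame `df` with a numeric `marks` column:

```r
# Illustrative data
df <- data.frame(
  name  = c("Asha", "Ben", "Chen"),
  marks = c(82, 67, 91)
)

by_dollar  <- df$marks        # $ with a literal name
by_bracket <- df[["marks"]]   # [[ with a string
by_matrix  <- df[, "marks"]   # matrix-style, column after the comma

identical(by_dollar, by_bracket) && identical(by_bracket, by_matrix)

# [[ also works when the name is stored in a variable
col <- "marks"
df[[col]]

# The matrix form can pull several columns at once (returns a data frame)
df[, c("name", "marks")]
```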
All three return the same numeric vector. $ is the most readable for interactive use; [[ is needed when the column name lives in a variable; the matrix form is useful when extracting several columns at once.
18.5 Accessing rows and cells
Use the [row, column] form: rows before the comma, columns after.
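A few illustrative selections on a hypothetical `df`:

```r
# Illustrative data
df <- data.frame(
  name   = c("Asha", "Ben", "Chen"),
  marks  = c(82, 67, 91),
  passed = c(TRUE, FALSE, TRUE)
)

df[2, ]                        # second row, all columns
df[2, "marks"]                 # a single cell
df[1:2, c("name", "marks")]    # first two rows, two named columns
```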
To filter rows by a condition, build a logical vector and pass it as the row selector:
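For instance, with a hypothetical `marks` column the filter might look like this:

```r
# Illustrative data
df <- data.frame(
  name  = c("Asha", "Ben", "Chen"),
  marks = c(82, 67, 91)
)

keep <- df$marks > 75   # logical vector, one element per row
keep

high <- df[keep, ]      # rows where keep is TRUE, all columns
high                    # keeps Asha (82) and Chen (91)
```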
Read it as: “give me the rows where marks exceeds 75, all columns.” This is the bread-and-butter pattern of base R analysis.
df[1, ] (with the trailing comma) returns the first row as a one-row data frame. df[1] (no comma) returns the first column as a one-column data frame, because R falls back to list semantics. Both are valid; you just have to know which one you asked for.
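The difference is easy to check by comparing dimensions. This is a small sketch on made-up data:

```r
# Illustrative data
df <- data.frame(
  name   = c("Asha", "Ben", "Chen"),
  marks  = c(82, 67, 91),
  passed = c(TRUE, FALSE, TRUE)
)

first_row <- df[1, ]   # with the comma: row selection
first_col <- df[1]     # no comma: list-style column selection

dim(first_row)         # 1 row, 3 columns
dim(first_col)         # 3 rows, 1 column
```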
18.6 Adding and modifying columns
Assigning to a new name creates a column; assigning to an existing name overwrites it.
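A brief sketch of both cases. The 70-mark pass threshold and the 5-mark adjustment are arbitrary values chosen for the example.

```r
# Illustrative data
df <- data.frame(
  name  = c("Asha", "Ben", "Chen"),
  marks = c(82, 67, 91)
)

# New name: creates a column (threshold of 70 is an assumption)
df$passed <- df$marks >= 70

# Existing name: overwrites the column in place
df$marks <- df$marks + 5

df
```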
Several columns at once with cbind():
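One possible shape for this, assuming the extra columns are held in a second, hypothetical frame `extra` with the same number of rows in the same order:

```r
# Illustrative data
df <- data.frame(
  name  = c("Asha", "Ben", "Chen"),
  marks = c(82, 67, 91)
)

# New columns to attach; rows are matched by position, not by name
extra <- data.frame(
  age  = c(20, 21, 19),
  city = c("Pune", "Leeds", "Taipei")
)

df <- cbind(df, extra)
names(df)
```

Because `cbind()` matches rows purely by position, both frames need to be in the same row order before binding.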
To drop a column, set it to NULL:
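For example, on a hypothetical frame with a `passed` column that is no longer wanted:

```r
# Illustrative data
df <- data.frame(
  name   = c("Asha", "Ben", "Chen"),
  marks  = c(82, 67, 91),
  passed = c(TRUE, FALSE, TRUE)
)

df$passed <- NULL   # removes the column entirely
names(df)
```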
18.7 Adding rows
rbind() stacks a new row on the bottom. The names and types must line up.
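A minimal sketch, with the new row built as a one-row data frame so that its names and types match the original (the data are invented):

```r
# Illustrative data
df <- data.frame(
  name  = c("Asha", "Ben", "Chen"),
  marks = c(82, 67, 91)
)

# Same column names and compatible types as df
new_row <- data.frame(name = "Dev", marks = 74)

df <- rbind(df, new_row)
df
```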
For more than a few rows, build the additions as a single data frame and bind once; calling rbind() in a loop is slow and error-prone.
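When rows arrive one at a time, one common idiom (shown here as an assumption about your workflow, not the only option) is to collect them in a list and combine them with a single `do.call(rbind, ...)`:

```r
# Illustrative data
df <- data.frame(
  name  = c("Asha", "Ben", "Chen"),
  marks = c(82, 67, 91)
)

# Hypothetical new rows collected in a list
pieces <- list(
  data.frame(name = "Dev", marks = 74),
  data.frame(name = "Eli", marks = 88),
  data.frame(name = "Fay", marks = 59)
)

additions <- do.call(rbind, pieces)   # combine the pieces in one call
df <- rbind(df, additions)            # bind to the original once
nrow(df)
```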
18.8 Sorting and ordering
Use order() to get a sorted index, then apply it as the row selector.
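As a sketch on made-up data, ascending and descending sorts by a `marks` column could look like this:

```r
# Illustrative data
df <- data.frame(
  name  = c("Asha", "Ben", "Chen"),
  marks = c(82, 67, 91)
)

idx <- order(df$marks)    # row positions in ascending order of marks
idx

ascending  <- df[idx, ]
descending <- df[order(df$marks, decreasing = TRUE), ]

ascending
descending
```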
order() accepts multiple columns, much like a SQL ORDER BY clause: later columns break ties in earlier ones.
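A small sketch of a two-key sort, assuming a hypothetical `group` column. Negating a numeric key is one way to sort that key in descending order.

```r
# Illustrative data
df <- data.frame(
  name  = c("Asha", "Ben", "Chen", "Dev"),
  group = c("B", "A", "B", "A"),
  marks = c(82, 67, 91, 74)
)

# group ascending, then marks descending within each group
sorted <- df[order(df$group, -df$marks), ]
sorted
```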
18.9 Worked example: quarterly sales
Take a small regional sales table: total each region’s half-year, flag the top performer, and order the results.
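One way the exercise might be written is sketched below. The regions, the figures, and the extra `share` column are all invented for illustration.

```r
# Hypothetical sales figures for two quarters
sales <- data.frame(
  region = c("North", "South", "East", "West"),
  q1     = c(120, 95, 140, 80),
  q2     = c(135, 110, 125, 90)
)

sales$total <- sales$q1 + sales$q2                             # half-year total
sales$share <- round(100 * sales$total / sum(sales$total), 1)  # percent of all sales
sales$top   <- sales$total == max(sales$total)                 # flag the top performer

sales <- sales[order(-sales$total), ]                          # largest total first
sales
```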
Three lines of column assignment and one row reordering make a complete mini-report on a tabular dataset.
18.10 Summary
| Task | Function / syntax |
|---|---|
| Create a frame | data.frame(), read.csv() |
| Shape & types | dim(), nrow(), ncol(), str() |
| Preview | head(), tail(), summary() |
| Access a column | df$col, df[["col"]], df[, "col"] |
| Filter rows | df[df$col > x, ] |
| Add / drop column | df$new <- …, df$col <- NULL |
| Add row | rbind() |
| Sort | df[order(df$col), ] |
Data frames are the bridge between raw R vectors and the analytical world. Once you can confidently create, slice, and modify them in base R, the next chapter, on dplyr, gives you a more concise grammar for the same ideas, plus a lot more.