flowchart LR
V["An R Value"] --> T["Atomic Type"]
T --> L["logical <br> TRUE / FALSE"]
T --> I["integer <br> 1L, 2L, 3L"]
T --> D["double <br> 1.5, 3.14, 1e6"]
T --> CX["complex <br> 1+2i"]
T --> CH["character <br> 'hello'"]
T --> R["raw <br> byte-level data"]
style V fill:#e3f2fd,stroke:#1976D2
style T fill:#fff3e0,stroke:#F57C00
style L fill:#e8f5e9,stroke:#388E3C
style I fill:#e8f5e9,stroke:#388E3C
style D fill:#e8f5e9,stroke:#388E3C
style CX fill:#f3e5f5,stroke:#8E24AA
style CH fill:#f3e5f5,stroke:#8E24AA
style R fill:#f3e5f5,stroke:#8E24AA
6 Basic Data Types and Type Casting
This chapter introduces the atomic data types that every value in R is built from, and shows how to move values between those types on purpose with explicit casting. You will meet R’s six atomic types (logical, integer, double, complex, character, and raw), the functions that tell you what type a value has (typeof(), class(), is.logical(), is.numeric(), etc.), and the matching family of casting functions (as.logical(), as.integer(), as.numeric(), as.character()). You will also learn how R performs automatic type coercion when values of different types meet, and how to tell the three special values NA, NaN, and NULL apart. By the end of this chapter you will be able to look at any R value and say precisely what it is and what it can safely become.
6.1 The Six Atomic Types
Every value in R is ultimately built from one of six atomic types. Higher-level structures such as vectors, matrices, and data frames are simply collections of atomic values, and every element in a single atomic vector has exactly one type.
| Type | Example | Stores |
|---|---|---|
| logical | TRUE, FALSE, NA | Boolean truth values. |
| integer | 1L, 42L, -7L | Whole numbers, stored exactly. |
| double | 3.14, 1.5, 2e6 | Real numbers (floating-point). The default numeric type. |
| complex | 1+2i | Complex numbers (real + imaginary). |
| character | "hello", 'R' | Text strings. |
| raw | as.raw(255) | Bytes. Rarely used outside low-level file or network work. |
The word numeric in R is a bit confusing. It is an umbrella for both integer and double. is.numeric() returns TRUE for both. When R prints numbers without a decimal point it usually still stores them as double, not integer. To force an integer, add the suffix L, as in 42L.
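A quick way to see the umbrella in action is to compare a plain number with its L-suffixed twin. The variable names below are illustrative only.

```r
# A plain number is stored as double; the L suffix asks for an integer.
x_double  <- 42
x_integer <- 42L

typeof(x_double)    # "double"
typeof(x_integer)   # "integer"

# is.numeric() is TRUE for both storage types.
is.numeric(x_double)    # TRUE
is.numeric(x_integer)   # TRUE

# Printing hides the difference: both display as 42.
print(x_double)
print(x_integer)
```

Because printing does not reveal the storage type, typeof() is the reliable check.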
6.2 Logical Values
Logical values are either TRUE or FALSE, plus a special missing value NA. They are the result of every comparison (x > 3, name == "Rani") and the glue that conditional code is written in.
R also accepts T and F as shortcuts. As noted in Chapter 4, those shortcuts are ordinary variables that can be reassigned, and silently breaking them is a classic production incident. Always write TRUE and FALSE in full.
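The following sketch illustrates why the shortcuts are risky. It deliberately reassigns T, then removes that variable again so the rest of the session is unaffected.

```r
# T is an ordinary variable that happens to hold TRUE by default.
T <- 0                    # legal, and silent
shadowed <- isTRUE(T)     # FALSE: T no longer means TRUE

# TRUE itself is a reserved word, so the same assignment is an error.
attempt <- try(eval(parse(text = "TRUE <- 0")), silent = TRUE)
inherits(attempt, "try-error")   # TRUE

# Remove our variable so that T refers to the built-in value again.
rm(T)
restored <- isTRUE(T)     # TRUE
```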
6.3 Integer vs Double
Any plain number you type into R, such as 5 or 3.14, is stored as a double. To get an integer you must either append the suffix L or use as.integer().
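Doubles carry roughly 15 to 16 significant decimal digits and can represent very large and very small numbers, but they cannot store every integer exactly beyond 2^53 (about 9 × 10^15). Integers use 32 bits, are always exact within their range (about ±2.1 × 10^9), and use half the memory of doubles. For most analysis work, doubles are fine; integer types matter when interfacing with C, when memory is tight, or when a value must be a whole number by construction, such as a count or an index. Integers do not extend exactness beyond what doubles offer: their range is far smaller than 2^53, and integer arithmetic that exceeds it returns NA with a warning.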
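The limits described above can be checked directly. This is a small sketch; the overflow warning is suppressed only to keep the output short.

```r
typeof(5)               # "double"
typeof(5L)              # "integer"
typeof(as.integer(5))   # "integer"

# The largest value a 32-bit integer can hold.
int_max <- .Machine$integer.max
int_max                 # 2147483647

# Going past the integer range yields NA (normally with a warning).
overflow <- suppressWarnings(int_max + 1L)
is.na(overflow)         # TRUE

# Doubles stop distinguishing neighbouring integers at 2^53.
same_value <- (2^53 + 1) == 2^53
same_value              # TRUE
```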
Doubles cannot represent every decimal number exactly, which sometimes produces surprising comparisons.
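The classic illustration is 0.1 + 0.2. The sketch below also shows a tolerance-based comparison with all.equal(), wrapped in isTRUE() because all.equal() returns a description of the difference rather than FALSE when the values differ.

```r
a <- 0.1 + 0.2

exact_match <- (a == 0.3)
exact_match                 # FALSE

# Print more digits to see what is actually stored.
print(a, digits = 17)

# Compare within a small tolerance instead of bit-for-bit.
close_enough <- isTRUE(all.equal(a, 0.3))
close_enough                # TRUE

# The same effect appears in other arithmetic.
sqrt(2)^2 == 2              # FALSE
```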
6.4 Character Strings
Character values are strings, delimited with single ' or double " quotes. There is no separate “char” vs “string” distinction in R; a single letter and a paragraph are both character vectors of length 1.
R accepts both quote styles, but the Tidyverse style guide and most R code in the wild use double quotes for strings. Reserve single quotes for strings that themselves contain double-quote characters.
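A short sketch of these points, using made-up strings:

```r
letter   <- "R"
greeting <- "hello"

typeof(letter)     # "character"
typeof(greeting)   # "character"

# Each is a character vector of length 1, however long the text is.
length(greeting)   # 1
nchar(greeting)    # 5

# Both quote styles create the same value.
identical('R', "R")   # TRUE

# Single quotes are handy when the text contains double quotes.
quoted <- 'She said "hi"'
cat(quoted, "\n")
```

Note the difference between length(), which counts elements in the vector, and nchar(), which counts characters in each string.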
6.5 Complex and Raw: The Two You Will Rarely Meet
complex is used in signal processing, physics, and some statistical work; it appears occasionally in fft() and related functions. raw holds bytes and is used when reading binary files, interacting with C code, or working with cryptographic hashes.
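For completeness, here is a minimal sketch of both types. The input vector passed to fft() is arbitrary.

```r
# Complex numbers have a real and an imaginary part.
z <- 1 + 2i
typeof(z)   # "complex"
Re(z)       # 1
Im(z)       # 2

# fft() returns a complex vector even for real input.
spectrum <- fft(c(1, 0, 0, 0))
typeof(spectrum)   # "complex"

# Raw values are single bytes, printed in hexadecimal.
b <- as.raw(255)
typeof(b)   # "raw"
b           # ff

# charToRaw() exposes the bytes behind a string.
r_bytes <- charToRaw("R")
r_bytes     # 52
```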
6.6 Checking a Value’s Type
typeof(), class(), and is.X() Family
R gives you several ways to ask what a value is. Each answers a slightly different question.
| Tool | Question It Answers |
|---|---|
| typeof(x) | What is the internal storage type? (logical, integer, double, character, …) |
| class(x) | What class (user-facing category) does x belong to? (numeric, Date, data.frame, …) |
| is.logical(x), is.integer(x), is.double(x), is.numeric(x), is.character(x) | Yes/no tests for a specific type. |
typeof() vs class()
typeof() is about how R stores the value in memory. class() is about how R treats the value for dispatching methods. For simple atomic values the two often agree, but for objects such as Date, factor, and data.frame they differ. When in doubt, use class() to reason about behaviour and typeof() to reason about storage.
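The difference is easiest to see side by side. The objects below are small illustrative examples.

```r
# A plain number: storage is "double", class is "numeric".
typeof(1.5)   # "double"
class(1.5)    # "numeric"

# A Date is a double with a class attribute on top.
d <- as.Date("2024-01-15")
typeof(d)     # "double"
class(d)      # "Date"

# A factor is stored as integer codes.
f <- factor(c("low", "high"))
typeof(f)     # "integer"
class(f)      # "factor"

# A data frame is stored as a list of columns.
df <- data.frame(x = 1:3)
typeof(df)    # "list"
class(df)     # "data.frame"
```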
6.7 Type Coercion (Implicit Conversion)
Atomic vectors in R can hold only one type. When you combine values of different types in a single vector, R silently converts (coerces) everything to the most “general” type in this hierarchy:
logical → integer → double → character
The type furthest to the right wins. Logical becomes integer when mixed with integers; numbers become character when mixed with strings.
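Each step of the hierarchy can be observed by combining two types with c() and asking for the result's type:

```r
typeof(c(TRUE, 2L))     # "integer"
typeof(c(2L, 3.5))      # "double"
typeof(c(3.5, "a"))     # "character"

# The values themselves are converted, not just relabelled.
c(TRUE, 2L)             # 1 2
mixed <- c(1, "a", TRUE)
mixed                   # "1" "a" "TRUE"

# Coercion is also why you can count TRUEs with sum().
n_true <- sum(c(TRUE, FALSE, TRUE))
n_true                  # 2
```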
A frequent source of bugs is reading a CSV where one column contains a stray string. R then reads the entire column as character: downstream arithmetic stops with an error, while comparisons and sorting quietly switch to string order (for example, "10" sorts before "9"). Always check str(df) on newly loaded data so you catch type surprises early.
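The sketch below reproduces the problem with a tiny inline CSV. The column names and values are invented for illustration, and stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version.

```r
# A hypothetical file in which one score was typed as text.
csv_text <- "id,score\n1,90\n2,85\n3,absent\n"
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# str() reveals that score is chr, not int or num.
str(df)
class(df$id)      # "integer"
class(df$score)   # "character"

# String comparison follows character order, not numeric order.
"10" < "9"        # TRUE

# An explicit cast recovers the numbers and marks the stray value as NA.
score_num <- suppressWarnings(as.numeric(df$score))
score_num         # 90 85 NA
```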
6.8 Type Casting (Explicit Conversion)
as.X() Family
To move a value deliberately from one type to another, use the matching as.X() function.
| Function | Converts To |
|---|---|
| as.logical(x) | logical (non-zero numbers → TRUE, zero → FALSE, "TRUE"/"FALSE" → the matching value) |
| as.integer(x) | integer (truncates toward zero; non-numeric strings → NA) |
| as.numeric(x) / as.double(x) | double |
| as.character(x) | character |
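If R cannot convert a value, it returns NA. Numeric casts such as as.integer("abc") also emit a warning (“NAs introduced by coercion”), but as.logical("abc") returns NA silently, so always check the result before using it.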
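A few representative casts, including ones that fail. The warning from the failing numeric cast is suppressed here only to keep the example quiet.

```r
# Truncation toward zero, not rounding.
as.integer(3.9)       # 3
as.integer(-3.9)      # -3

# Strings that look like numbers convert cleanly.
as.integer("42")      # 42
as.numeric("3.14")    # 3.14
as.character(10L)     # "10"

# Numbers and recognised strings to logical.
as.logical(0)         # FALSE
as.logical(2)         # TRUE
as.logical("TRUE")    # TRUE

# Failed casts produce NA; check for it before moving on.
bad_int <- suppressWarnings(as.integer("abc"))
bad_lgl <- as.logical("abc")
anyNA(c(bad_int, bad_lgl))   # TRUE
```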
Convert inputs to the right type as soon as they enter your script (at the “boundary” where file reads and user input happen), then rely on consistent types everywhere else. This keeps the core of your code simple and pushes type defensiveness to the edges, where it belongs.
6.9 The Three Missing or Absent Values
NA, NaN, and NULL Are Three Different Things
R makes a careful distinction between three values that all mean “nothing is here”, and treating them as synonyms is a common source of bugs.
| Value | Meaning | Tested With |
|---|---|---|
| NA | Missing data. One atomic value whose content is unknown. | is.na(x) |
| NaN | “Not a Number”. The result of an undefined numeric operation like 0/0. | is.nan(x) |
| NULL | The absence of a value. A zero-length object used to mean “no argument” or “empty slot”. | is.null(x) |
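The three values behave differently under the test functions and inside vectors, as this short sketch shows. Note that is.na() is also TRUE for NaN, while is.nan() is FALSE for NA.

```r
is.na(NA)        # TRUE
is.nan(0 / 0)    # TRUE
is.null(NULL)    # TRUE

# The tests are not interchangeable.
is.na(0 / 0)     # TRUE:  NaN counts as missing
is.nan(NA)       # FALSE: NA is not the result of arithmetic

# NA occupies a slot; NULL does not.
length(NA)       # 1
length(NULL)     # 0
with_na   <- c(1, NA, 3)
with_null <- c(1, NULL, 3)
length(with_na)     # 3
length(with_null)   # 2
```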
For rare cases where the type of the NA matters (e.g. initialising a vector you plan to fill later), R provides NA_integer_, NA_real_, NA_character_, and NA_complex_. Most of the time the plain NA is enough.
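As a sketch of the initialisation case, the vector name and values below are made up:

```r
# Plain NA is logical; the typed variants carry their own type.
typeof(NA)              # "logical"
typeof(NA_integer_)     # "integer"
typeof(NA_real_)        # "double"
typeof(NA_character_)   # "character"
typeof(NA_complex_)     # "complex"

# Pre-allocate a double vector to fill in later.
scores <- rep(NA_real_, 3)
typeof(scores)          # "double"

scores[1] <- 72.5
scores                  # 72.5   NA   NA
```

Starting from NA_real_ means the vector is a double from the outset, rather than a logical vector that R coerces on the first assignment.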
6.10 A Worked Example: Cleaning Mixed Input
The snippet below shows a realistic cleanup: a set of marks entered as strings, some missing or malformed, being cast to numbers and summarised.
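One way such a cleanup might look is sketched here. The marks_raw vector is hypothetical data invented for illustration, and suppressing the coercion warning is a choice made for brevity; in real code you may prefer to let it surface.

```r
# Hypothetical marks as they might arrive from a form or spreadsheet.
marks_raw <- c("78", "91.5", "", "absent", " 64 ", NA, "88")

# Cast at the boundary: trim whitespace, then convert to double.
# Anything that is not a number becomes NA.
marks <- suppressWarnings(as.numeric(trimws(marks_raw)))
marks                 # 78.0 91.5   NA   NA 64.0   NA 88.0

# Report how many entries could not be used.
n_missing <- sum(is.na(marks))
n_missing             # 3

# Summarise while skipping the missing values.
mean_mark <- mean(marks, na.rm = TRUE)
mean_mark             # 80.375
```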
This is the pattern you will see repeatedly in later chapters: cast at the boundary, mark failures with NA, and use na.rm = TRUE when summarising.
6.11 Summary
| Concept | Key Takeaway |
|---|---|
| Six atomic types | logical, integer, double, complex, character, raw. |
| Default numeric type | Plain numbers are double; use the L suffix for integers. |
| “Numeric” is an umbrella | is.numeric() is true for both integer and double. |
| Type inspection | typeof() for storage, class() for behaviour, is.X() for yes/no checks. |
| Implicit coercion | logical → integer → double → character; the rightmost type wins. |
| Explicit casting | Use the as.X() family; cast at the input boundary, not scattered through the code. |
| Three absent values | NA is missing data, NaN is undefined arithmetic, NULL is absence of a value. |
| Floating-point caveat | Use isTRUE(all.equal(x, y)) instead of == when comparing doubles. |
Type discipline is what separates a quick prototype from a script you can trust with someone else’s data. Make str() the first thing you run on every newly loaded data set, prefer explicit casts over implicit coercion, and treat NA, NaN, and NULL as three different ideas. In the next chapter you will meet R’s operators, including the ones that produce many of the logical values you have already seen.