6 Basic Data Types and Type Casting

Note: What This Chapter Covers

This chapter introduces the atomic data types that every value in R is built from, and shows how to move values between those types on purpose with explicit casting. You will meet R’s six atomic types (logical, integer, double, complex, character, and raw), the functions that tell you what type a value has (typeof(), class(), is.logical(), is.numeric(), etc.), and the matching family of casting functions (as.logical(), as.integer(), as.numeric(), as.character()). You will also learn how R performs automatic type coercion when values of different types meet, and how to tell the three special values NA, NaN, and NULL apart. By the end of this chapter you will be able to look at any R value and say precisely what it is and what it can safely become.

flowchart LR
    V["An R Value"] --> T["Atomic Type"]
    T --> L["logical <br> TRUE / FALSE"]
    T --> I["integer <br> 1L, 2L, 3L"]
    T --> D["double <br> 1.5, 3.14, 1e6"]
    T --> CX["complex <br> 1+2i"]
    T --> CH["character <br> 'hello'"]
    T --> R["raw <br> byte-level data"]
    style V fill:#e3f2fd,stroke:#1976D2
    style T fill:#fff3e0,stroke:#F57C00
    style L fill:#e8f5e9,stroke:#388E3C
    style I fill:#e8f5e9,stroke:#388E3C
    style D fill:#e8f5e9,stroke:#388E3C
    style CX fill:#f3e5f5,stroke:#8E24AA
    style CH fill:#f3e5f5,stroke:#8E24AA
    style R fill:#f3e5f5,stroke:#8E24AA


6.1 The Six Atomic Types

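Note (Core Concept): Every R Value Starts as an Atomic Type

Every piece of data in R is ultimately built from one of six atomic types. R has no separate scalar type: a single number such as 5 is simply an atomic vector of length 1. Larger structures are assembled from these vectors. A matrix is an atomic vector with dimensions, and a data frame is a list whose columns are atomic vectors. All elements of a single atomic vector share exactly one type.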

| Type | Example | Stores |
|---|---|---|
| logical | TRUE, FALSE, NA | Boolean truth values. |
| integer | 1L, 42L, -7L | Whole numbers, stored exactly. |
| double | 3.14, 1.5, 2e6 | Real numbers (floating-point). The default numeric type. |
| complex | 1+2i | Complex numbers (real + imaginary). |
| character | "hello", 'R' | Text strings. |
| raw | as.raw(255) | Bytes. Rarely used outside low-level file or network work. |
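You can confirm the table by asking typeof() about one example of each type. This is a minimal sketch. The variable name values is just for illustration, and typeof() itself is covered in Section 6.6.

```r
# One literal of each atomic type, checked with typeof()
values <- list(TRUE, 42L, 3.14, 1+2i, "hello", as.raw(255))
sapply(values, typeof)
# "logical" "integer" "double" "complex" "character" "raw"
```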
Tip (Expert Insight): “Numeric” Is an Umbrella Term

The word numeric in R is a common source of confusion. It is an umbrella term that covers both integer and double, so is.numeric() returns TRUE for both. A number that prints without a decimal point, such as 42, is usually still stored as a double rather than an integer. To force an integer, add the suffix L, as in 42L.
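The short comparison below shows the umbrella at work. Both values pass is.numeric(), but only the L-suffixed one is an integer. Note also that class() reports a double as "numeric".

```r
x <- 42     # prints like a whole number...
y <- 42L    # ...but only this one is stored as an integer
typeof(x)       # "double"
typeof(y)       # "integer"
is.numeric(x)   # TRUE
is.numeric(y)   # TRUE
class(x)        # "numeric"
class(y)        # "integer"
```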


6.2 Logical Values

Note: The Smallest Type: Truth Values

Logical values are either TRUE or FALSE, plus the special missing value NA. Every comparison (x > 3, name == "Rani") produces one, and conditional code such as if statements is built on them.
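A few comparisons illustrate this. The variables x and name are made-up examples. Comparing anything with NA gives NA rather than TRUE or FALSE.

```r
x <- 5
name <- "Rani"
x > 3                  # TRUE
name == "Rani"         # TRUE
typeof(x > 3)          # "logical"
c(1, NA, 3) > 2        # FALSE NA TRUE: a comparison with NA is NA
if (x > 3) "big" else "small"   # "big"
```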

Warning: Write TRUE and FALSE in Full

R also accepts T and F as shortcuts. As noted in Chapter 4, these shortcuts are ordinary variables that can be reassigned, and code that silently breaks because T or F was overwritten is a classic source of hard-to-find bugs. TRUE and FALSE, by contrast, are reserved words that cannot be reassigned. Always write TRUE and FALSE in full.
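The sketch below shows why T is fragile. Assigning to it is perfectly legal and quietly changes the meaning of any later code that uses it. The name answer is only for illustration.

```r
T <- 0                           # allowed: T is an ordinary variable
answer <- if (T) "yes" else "no"
answer                           # "no": T now means 0, which is FALSE
rm(T)                            # delete the shadowing variable
T                                # TRUE again (base R's binding)
```

Writing TRUE <- 0 instead fails immediately with an error, which is exactly the protection the full spelling gives you.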


6.3 Integer vs Double

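Note: The Default Is Double

Any plain number you type into R, such as 5 or 3.14, is stored as a double. To get an integer you must either append the suffix L or use as.integer(). Some functions also return integers directly. For example, the sequence 1:3 is an integer vector.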
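A quick check with typeof() shows the default at work:

```r
typeof(5)              # "double"
typeof(5L)             # "integer"
typeof(as.integer(5))  # "integer"
typeof(1:3)            # "integer": the : operator also returns integers
```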

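Note: Why Two Numeric Types Exist

Doubles carry roughly 15 to 16 significant decimal digits and can represent very large and very small numbers, but above 2^53 (about 9 × 10^15) they can no longer represent every whole number exactly. Integers use 32 bits, are always exact within their range (up to about ±2.1 billion, given by .Machine$integer.max), and take half the memory of doubles. For most analysis work, doubles are fine. Integer types matter when interfacing with C or C++ code, when memory is tight, or when a value must be a whole number, such as a count or an index. Integers do not buy extra precision: every whole number that fits in an integer is also stored exactly as a double.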
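The trade-offs above can be checked directly. This is a small sketch. The exact byte counts printed by object.size() include a small header, so only the roughly 2:1 ratio matters here.

```r
.Machine$integer.max          # 2147483647, the largest integer
2^53 == 2^53 + 1              # TRUE: above 2^53, doubles skip some whole numbers
object.size(integer(1e6))     # roughly 4 MB (4 bytes per element)
object.size(double(1e6))      # roughly 8 MB (8 bytes per element)
big <- .Machine$integer.max + 1L   # NA, with an integer-overflow warning
```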

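Warning (Common Mistake): Floating-Point Surprises

Doubles cannot represent most decimal fractions (0.1, for example) exactly, so arithmetic that looks exact on paper can produce surprising comparisons. Compare doubles with all.equal() rather than ==, and wrap it in isTRUE() when the result feeds a condition.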
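The classic demonstration uses 0.1 + 0.2. Printing with extra digits reveals the tiny representation error. all.equal() instead compares within a small tolerance (by default about 1.5e-8, relative).

```r
0.1 + 0.2 == 0.3                     # FALSE
print(0.1 + 0.2, digits = 17)        # 0.30000000000000004
isTRUE(all.equal(0.1 + 0.2, 0.3))    # TRUE: equal within tolerance
```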


6.4 Character Strings

Note: Text Lives in Character Vectors

Character values are strings, delimited with single ' or double " quotes. There is no separate “char” vs “string” distinction in R; a single letter and a paragraph are both character vectors of length 1.
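Because a whole string is one element, length() counts strings while nchar() counts the characters inside them:

```r
typeof("R")                  # "character"
length("R")                  # 1
length("A whole sentence.")  # 1: still a single element
nchar("A whole sentence.")   # 17 characters
```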

Tip (Best Practice): Prefer Double Quotes

R accepts both quote styles, but the Tidyverse style guide and most R code in the wild use double quotes for strings. Reserve single quotes for strings that themselves contain double-quote characters.
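The quote style only changes how you type a string, not the string itself. Single quotes simply save you from escaping embedded double quotes.

```r
a <- 'She said "hi"'     # single quotes: no escaping needed
b <- "She said \"hi\""   # double quotes: inner quotes escaped with \"
identical(a, b)          # TRUE: the same string either way
cat(a, "\n")             # She said "hi"
```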


6.5 Complex and Raw: The Two You Will Rarely Meet

Note: When They Matter

complex is used in signal processing, physics, and some statistical work. fft(), for example, always returns a complex vector. raw holds bytes and is used when reading binary files, interacting with C code, or working with cryptographic hashes.
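Here is a brief look at both types using only base R functions. The input to fft() is arbitrary, and charToRaw() exposes the bytes of a short string.

```r
z <- complex(real = 1, imaginary = 2)
z == 1+2i                       # TRUE
Mod(3+4i)                       # 5, the modulus
typeof(fft(c(1, 2, 3, 4)))      # "complex"

b <- charToRaw("Hi")
b                               # 48 69 (bytes shown in hex)
typeof(b)                       # "raw"
rawToChar(b)                    # "Hi"
```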


6.6 Checking a Value’s Type

Note: The typeof(), class(), and is.X() Family

R gives you several ways to ask what a value is. Each answers a slightly different question.

| Tool | Question It Answers |
|---|---|
| typeof(x) | What is the internal storage type? (logical, integer, double, character, …) |
| class(x) | What class (user-facing category) does x belong to? (numeric, Date, data.frame, …) |
| is.logical(x), is.integer(x), is.double(x), is.numeric(x), is.character(x) | Yes/no tests for a specific type. |
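The is.X() tests return a single TRUE or FALSE, which makes them handy in input checks. Two results often surprise newcomers: a plain 5 is not an integer, and a bare NA is logical.

```r
x <- 3.5
is.double(x)     # TRUE
is.numeric(x)    # TRUE
is.integer(x)    # FALSE
is.integer(5)    # FALSE: 5 is a double
is.character(x)  # FALSE
is.logical(NA)   # TRUE: a bare NA is logical
```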
Tip (Expert Insight): typeof() vs class()

typeof() describes how R stores the value in memory. class() describes how R treats the value when dispatching methods. For simple atomic values the two usually tell the same story (a double, for instance, has type "double" and class "numeric"). For objects such as Date, factor, and data.frame, they differ. When in doubt, use class() to reason about behaviour and typeof() to reason about storage.
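The three objects named above make the difference concrete. The sample date, factor levels, and the name scores_df are arbitrary.

```r
d <- as.Date("2024-01-15")
typeof(d)          # "double": stored as days since 1970-01-01
class(d)           # "Date"

f <- factor(c("low", "high", "low"))
typeof(f)          # "integer": codes pointing into the levels
class(f)           # "factor"

scores_df <- data.frame(x = 1:2)
typeof(scores_df)  # "list": one element per column
class(scores_df)   # "data.frame"
```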


6.7 Type Coercion (Implicit Conversion)

Note (Core Concept): Mixing Types Triggers Coercion

Atomic vectors in R can hold only one type. When you combine values of different types in a single vector, R silently converts (coerces) everything to the most “general” type in this hierarchy:

logical → integer → double → character

The type furthest to the right wins. Logicals become integers when mixed with integers (TRUE becomes 1L, FALSE becomes 0L), and numbers become character strings when mixed with strings. Complex values, when present, sit between double and character.
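Each step of the hierarchy can be seen with c(), which combines its arguments into one atomic vector:

```r
c(TRUE, 2L)                # 1 2
typeof(c(TRUE, 2L))        # "integer": TRUE became 1L
typeof(c(1L, 2.5))         # "double"
c(1, "a", TRUE)            # "1" "a" "TRUE"
sum(c(TRUE, FALSE, TRUE))  # 2: arithmetic coerces logicals too
```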

Warning (Common Mistake): “My Numbers Are Not Numbers”

A frequent source of bugs is reading a CSV in which one numeric column contains a stray string, such as "n/a". The whole column is then read as character, and downstream arithmetic breaks: operators such as + stop with an error, while mean() returns NA with a warning. Always check str(df) on newly loaded data so you catch type surprises early.
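The sketch below reproduces the problem with a tiny in-memory CSV (the data is made up) and shows one fix. The na.strings argument of read.csv() tells it which strings mean “missing”.

```r
# A tiny CSV with one stray string in the score column
csv_text <- "id,score
1,85
2,n/a
3,92"
scores <- read.csv(text = csv_text)
str(scores)              # score is listed as chr, not int
class(scores$score)      # "character"

# Fix at read time: declare which strings mean "missing"
scores2 <- read.csv(text = csv_text, na.strings = c("NA", "n/a"))
class(scores2$score)     # "integer"
```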


6.8 Type Casting (Explicit Conversion)

Note: The as.X() Family

To move a value deliberately from one type to another, use the matching as.X() function.

| Function | Converts To |
|---|---|
| as.logical(x) | logical (non-zero numbers → TRUE, zero → FALSE, "TRUE"/"FALSE" → the matching value) |
| as.integer(x) | integer (truncates toward zero; non-numeric strings → NA) |
| as.numeric(x) / as.double(x) | double |
| as.character(x) | character |
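A few conversions show the rules in the table. Note that as.logical() also accepts lower-case spellings such as "false", but anything it does not recognise becomes NA.

```r
as.logical(c(0, 1, -2.5))               # FALSE TRUE TRUE
as.logical(c("TRUE", "false", "yes"))   # TRUE FALSE NA
as.integer(c(3.9, -3.9))                # 3 -3: truncation toward zero
as.numeric("3.14")                      # 3.14
as.character(42L)                       # "42"
```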
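Note: When Casting Fails

If R cannot convert a value, it returns NA. Numeric conversions such as as.numeric() and as.integer() also emit the warning "NAs introduced by coercion", but as.logical() turns unrecognised strings into NA without any warning. Always check the result, for example with anyNA(), before using it.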
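The snippet below is a small sketch of checking a cast before trusting it. The message text is just an example.

```r
x <- as.numeric(c("12", "7.5", "abc"))   # warns: NAs introduced by coercion
x                                        # 12 7.5 NA
if (anyNA(x)) message("Some values could not be converted")
as.logical("yes")                        # NA, with no warning this time
```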

Tip (Best Practice): Cast at the Boundary, Not in the Middle

Convert inputs to the right type as soon as they enter your script (at the “boundary” where file reads and user input happen), then rely on consistent types everywhere else. This keeps the core of your code simple and pushes type defensiveness to the edges, where it belongs.


6.9 The Three Missing or Absent Values

Note: NA, NaN, and NULL Are Three Different Things

R makes a careful distinction between three values that all mean “nothing is here”, and treating them as synonyms is a common source of bugs.

| Value | Meaning | Tested With |
|---|---|---|
| NA | Missing data. One atomic value whose content is unknown. | is.na(x) |
| NaN | “Not a Number”. The result of an undefined numeric operation like 0/0. | is.nan(x) |
| NULL | The absence of a value. A zero-length object used to mean “no argument” or “empty slot”. | is.null(x) |
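The difference shows up clearly in length() and in what happens inside c(). NA occupies a slot, while NULL vanishes. One subtlety: is.na() also returns TRUE for NaN, but is.nan() is FALSE for NA.

```r
length(NA)        # 1: one missing value
length(NULL)      # 0: nothing at all
c(1, NA, 3)       # 1 NA 3: NA keeps its slot
c(1, NULL, 3)     # 1 3: NULL disappears
0 / 0             # NaN
is.na(NaN)        # TRUE: is.na() also catches NaN
is.nan(NA)        # FALSE
is.null(NULL)     # TRUE
```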
Tip (Expert Insight): Typed NA Values Exist Too

For rare cases where the type of the NA matters (e.g. initialising a vector you plan to fill later), R provides NA_integer_, NA_real_, NA_character_, and NA_complex_. Most of the time the plain NA is enough.
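The plain NA is logical, which is why the typed variants exist. The sketch below also shows the initialisation case. The name results is illustrative only.

```r
typeof(NA)              # "logical"
typeof(NA_integer_)     # "integer"
typeof(NA_real_)        # "double"
typeof(NA_character_)   # "character"
results <- rep(NA_real_, 3)   # a double vector to fill in later
results[2] <- 4.5
results                 # NA 4.5 NA
```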


6.10 A Worked Example: Cleaning Mixed Input

Note: Putting It Together

The snippet below shows a realistic cleanup: a set of marks entered as strings, some missing or malformed, being cast to numbers and summarised.
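Here is a minimal version of that cleanup, using made-up sample marks. The warning from as.numeric() is suppressed only because the very next step reports every value that failed to convert.

```r
# Marks as typed into a form: some valid, some blank or malformed
marks_raw <- c("78", "92", "", "n/a", "85.5", "sixty")

# Cast at the boundary; failures become NA and are checked below
marks <- suppressWarnings(as.numeric(marks_raw))

# Report what failed instead of hiding it
bad <- marks_raw[is.na(marks)]
bad                          # "" "n/a" "sixty"

# Summarise the valid marks only
mean(marks, na.rm = TRUE)    # 85.16667
max(marks, na.rm = TRUE)     # 92
```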

This is the pattern you will see repeatedly in later chapters: cast at the boundary, mark failures with NA, and use na.rm = TRUE when summarising.


6.11 Summary

Note: Key Concepts at a Glance

| Concept | Key Takeaway |
|---|---|
| Six atomic types | logical, integer, double, complex, character, raw. |
| Default numeric type | Plain numbers are double; use the L suffix for integers. |
| “Numeric” is an umbrella | is.numeric() is true for both integer and double. |
| Type inspection | typeof() for storage, class() for behaviour, is.X() for yes/no checks. |
| Implicit coercion | logical → integer → double → character; the rightmost type wins. |
| Explicit casting | Use the as.X() family; cast at the input boundary, not scattered through the code. |
| Three absent values | NA is missing data, NaN is undefined arithmetic, NULL is absence of a value. |
| Floating-point caveat | Use all.equal() instead of == for comparing doubles. |
Tip: Applying This in Practice

Type discipline is what separates a quick prototype from a script you can trust with someone else’s data. Make str() the first thing you run on every newly loaded data set, prefer explicit casts over implicit coercion, and treat NA, NaN, and NULL as three different ideas. In the next chapter you will meet R’s operators, including the ones that produce many of the logical values you have already seen.