flowchart LR
S["Source Script <br> (your .R or .qmd file)"] --> E["R Parser <br> reads one statement"]
E --> EV["R Evaluator <br> computes the value"]
EV --> B["Binding <br> name -> value in environment"]
B --> O["Output or next statement"]
style S fill:#e3f2fd,stroke:#1976D2
style E fill:#fff3e0,stroke:#F57C00
style EV fill:#fff3e0,stroke:#F57C00
style B fill:#f3e5f5,stroke:#8E24AA
style O fill:#e8f5e9,stroke:#388E3C
4 Basic Syntax, Variables, and Naming Conventions
This chapter introduces the fundamental rules of writing R code. You will learn how R reads and evaluates code line by line, how to write comments and multi-line expressions, how R handles spaces and case, how to create variables using the three assignment operators, and what names you can legally give those variables. You will also see the naming conventions professional R users follow, the reserved words the language will not let you use, and the built-in functions that let you inspect or clear variables in your workspace. By the end of this chapter you will be able to write short R programs that assign, inspect, and reassign values with confidence.
4.1 R Is an Expression-Driven Language
Unlike languages where statements and expressions are different things, almost every piece of R code is an expression that evaluates to a value. When you type 2 + 2 at the console and press Enter, R computes the result and prints it. When you type x <- 5, R also produces a value (the number 5), but it hides the output because assignment is treated as an invisible operation. This design keeps the language small and composable; you can put nearly any piece of code inside any other piece of code.
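A minimal sketch of what "everything is an expression" looks like at the console. The variable names (x, y, z, size) are chosen only for illustration.

```r
# An arithmetic expression evaluates to a value, which the console prints.
2 + 2

# Assignment is also an expression; its value is returned invisibly.
x <- 5
withVisible(x <- 5)$visible   # FALSE: the value exists but is not printed

# Wrapping the assignment in parentheses makes the value visible.
(x <- 5)

# Because assignment returns a value, it can be nested inside other code.
y <- (z <- 3) * 2

# Even `if` is an expression that yields a value you can assign.
size <- if (x > 3) "large" else "small"
```

Nesting assignments as in the y line is legal but rarely good style; it is shown here only to make the point that assignment produces a value.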
New users often treat R like a scripting language where they must put everything inside a file. The R console is also a calculator. When you want to check an expression, compute a quick number, or test a function call, type it at the console. That habit shortens the learning loop enormously.
4.2 Statements, Expressions, and Comments
An R statement is a single expression that R can evaluate. You end a statement by pressing Enter or by writing a semicolon ;. You do not need to end every line with a semicolon in R; line breaks are the natural separator. Comments begin with the hash symbol # and continue to the end of the line.
You can place more than one statement on a single line by separating them with semicolons. This is sometimes useful for tight scripts but is usually discouraged because it reduces readability.
Write one statement per line, comment the why (not the what), and let R’s own output do the talking. Future-you reading a script six months from now will thank present-you.
R does not have a native multi-line comment syntax the way C uses /* ... */. To comment out a block of code, prefix every line with #. Most IDEs do this for you with a keyboard shortcut: Ctrl+Shift+C in RStudio.
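The rules for statements and comments can be sketched in a few lines. The names radius, width, height, and area_total are hypothetical.

```r
# Comments start with # and run to the end of the line.
radius <- 2            # a trailing comment after code is fine too

# Two statements on one line, separated by a semicolon (legal, but harder to read).
width <- 3; height <- 4

# An expression continues onto the next line while it is incomplete,
# for example when a line ends with an operator or an open parenthesis.
area_total <- width * height +
  pi * radius^2

# R has no block-comment syntax, so each line is commented out individually.
# area_total <- 0
# width <- 0
```

The two commented-out lines at the end never run, so width and area_total keep the values assigned above.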
4.3 Case Sensitivity and Whitespace
Age, age, and AGE are three different names in R. The same applies to function names: mean() works, Mean() does not. Treat capitalisation as meaningful information.
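As a quick illustration, the three spellings below create three independent variables. The values are arbitrary.

```r
age <- 30
Age <- 31
AGE <- 32

# Three separate bindings, each with its own value.
c(age, Age, AGE)

# The function is spelled mean, in lower case.
mean(c(1, 2, 3))

# In a session with only the default packages attached, no object is
# expected to be called Mean, so this check would normally return FALSE.
exists("Mean")
```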
R ignores spaces around operators, inside parentheses, and between tokens. Use spaces to improve readability; do not use them in a way that hides meaning.
| Readable | Legal but Hard to Read |
|---|---|
| x <- 5 + 3 | x<-5+3 |
| mean(c(1, 2, 3)) | mean(c(1,2,3)) |
| y <- (a + b) / 2 | y<-(a+b)/2 |
x<-5 Is Parsed as Assignment, Not as x < -5
R always reads the characters <- as the assignment operator, so x<-5 means x <- 5 (assign 5 to x), even if you intended the comparison x < -5 (“is x less than negative five”). Nothing is ambiguous to the parser; the danger is that a mistyped comparison silently overwrites x. Write a space on both sides of <- when you assign, and put a space between < and - when you compare against a negative number.
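The trap is easiest to see by running both spellings. This is a small sketch with an arbitrary starting value for x.

```r
x <- 10

# Typed without spaces, these characters form the assignment operator <-.
x<-5
x            # x has been overwritten and is now 5

# With a space between < and -, the same characters form a comparison.
x < -5       # FALSE, because 5 is not less than -5
x            # still 5: a comparison does not change x
```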
4.4 Variables and the Three Assignment Operators
In R, a variable is not a box that holds a value. It is a name that is bound to a value stored somewhere in memory. When you write x <- 10, R creates the number 10 and binds the name x to it. When you reassign x to something else, the old binding is replaced; the old value may be garbage collected.
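One way to see the name-and-binding model in action is to bind two names and then rebind one of them. The names x and y are just placeholders.

```r
x <- 10     # the name x is bound to the value 10
y <- x      # the name y is bound to the same value
x <- 20     # x is rebound to a new value; the binding for y is untouched

x           # 20
y           # 10
```

Rebinding x does not reach into y, because y was never attached to x itself, only to the value x referred to at the time.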
R provides three operators for assignment. All three work, but they are not identical in where you can use them or how they read.
| Operator | Direction | Typical Use |
|---|---|---|
| <- | right-to-left | The idiomatic choice in scripts and the R community. |
| = | right-to-left | Common inside function calls (for named arguments), sometimes used for assignment. |
| -> | left-to-right | Legal but rare; occasionally useful at the end of a pipeline. |
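The three operators in the table can be compared side by side. The variable names below are hypothetical and the numbers are made up.

```r
total_sales <- 1200                       # idiomatic right-to-left assignment
unit_price = 15                           # also assigns at the top level
total_sales / unit_price -> units_sold    # left-to-right assignment

units_sold                                # 80

# Inside a function call, = names an argument rather than creating a variable.
rounded_units <- round(units_sold, digits = 1)

# By contrast, <- inside a call still assigns, as a side effect of the call.
median(sample_values <- c(1, 2, 3))
sample_values                             # the vector now exists in the workspace
```

The last two lines show why the distinction matters: using <- where = was meant inside a call quietly creates a variable.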
<- for Assignment
The R community, including the Tidyverse style guide and most textbooks, prefers <- for assignment and reserves = for named arguments inside function calls. This convention makes scripts easier to scan, because the eye can distinguish “this creates a variable” (<-) from “this passes an argument” (=). In RStudio, the keyboard shortcut Alt + - (hyphen) inserts <- with spaces on both sides.
4.5 Reassigning and Updating Variables
Reassignment simply binds the name to a new value; the old value is discarded. The variable can even change type across reassignments because R is dynamically typed.
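A short sketch of reassignment, using a variable named score with made-up values:

```r
score <- 82
class(score)            # "numeric"

# An update reads the old value, computes a new one, and rebinds the name.
score <- score + 5
score                   # 87

# The same name can later be bound to a value of a different type.
score <- "eighty-seven"
class(score)            # "character"
```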
R letting you rebind score from number to string is convenient in interactive work and treacherous in larger scripts. A common source of bugs is reusing a variable name for something of a different type mid-way through a script. Pick fresh names when the meaning changes.
4.6 Rules for Naming Variables
| Rule | Example of Legal Name | Example of Illegal Name |
|---|---|---|
| Must start with a letter or a dot . (not followed by a digit). | income, .private | 2ndRound, .3rd |
| May contain letters, digits, underscore _, and dot .. | mean_score_2024, price.usd | mean-score, price usd |
| Cannot contain operators or spaces. | total_cost | total cost, total+cost |
| Cannot be a reserved word. | result, count | TRUE, if, function |
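The rules in the table can be checked programmatically. The sketch below relies on base R's make.names(), which rewrites any string into a syntactically valid name; a string that comes back unchanged was already valid. The helper is_valid_name() is a hypothetical convenience written for this illustration, not a built-in function.

```r
# Hypothetical helper: a name is valid if make.names() leaves it unchanged.
is_valid_name <- function(name) make.names(name) == name

candidates <- c("income", ".private", "mean_score_2024", "price.usd",
                "2ndRound", ".3rd", "mean-score", "total cost", "if")

# One row per candidate, with TRUE for legal names and FALSE for illegal ones.
data.frame(name = candidates, valid = is_valid_name(candidates))

# Backticks let you use a non-syntactic name, at the cost of readability.
`total cost` <- 99
`total cost`
```

Backtick-quoted names mostly appear when imported data has awkward column headers; for your own variables, a plain legal name is easier to work with.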
R will not let you assign a value to any of the reserved words below. Trying to use one as a variable name produces an error.
| Category | Reserved Words |
|---|---|
| Logical constants | TRUE, FALSE |
| Missing and special values | NA, NA_integer_, NA_real_, NA_character_, NA_complex_, NULL, NaN, Inf |
| Control flow | if, else, for, in, while, repeat, break, next |
| Declaration | function |
The letters T and F are shortcuts for TRUE and FALSE. Unlike TRUE and FALSE themselves, they are ordinary variables that happen to be pre-assigned. You can overwrite them with T <- 0, and disaster follows for every piece of code that assumed T meant TRUE. Write TRUE and FALSE in full and never reassign T or F.
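The difference between TRUE and T can be demonstrated safely in a scratch session. In this sketch the failing assignment is wrapped in tryCatch() so that the script keeps running; the variable name attempt is hypothetical.

```r
# TRUE is reserved: trying to assign to it raises an error.
attempt <- tryCatch(
  eval(parse(text = "TRUE <- 0")),
  error = function(e) "error"
)
attempt        # "error"

# T is an ordinary variable, so this assignment succeeds and masks the built-in.
T <- 0
isTRUE(T)      # FALSE: code that relied on T meaning TRUE is now wrong

# Removing your own T uncovers the built-in value again.
rm(T)
isTRUE(T)      # TRUE
```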
4.7 Naming Conventions: What the R Community Uses
| Style | Example | Who Uses It |
|---|---|---|
| snake_case | mean_income, customer_id | Tidyverse, modern R, this book. |
| camelCase | meanIncome, customerId | Some older R packages, developers from a Java background. |
| dot.case | mean.income, customer.id | Base R (e.g. data.frame, read.csv), older code. |
| PascalCase | MeanIncome | Rare in R; more common for function objects in some codebases. |
| UPPER_SNAKE | MAX_SCORE, N_RUNS | Constants and configuration values. |
Consistency matters more than the style you pick. A script that mixes meanIncome, mean.income, and mean_income is much harder to scan than a script that picks any one style and uses it everywhere. This book and the Tidyverse both use snake_case throughout.
Names like data, df, c, t, mean, and sum already exist as built-in functions in R. If you reuse them for your own objects, you shadow the original and confuse your future self and any collaborator. Use customer_df instead of df, avg_score instead of mean, and so on.
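A small sketch of what shadowing looks like in practice. It is meant to be run in a scratch session, and the result names are hypothetical.

```r
# Binding a number to the name sum shadows the built-in when sum is read as a value.
sum <- 10
sum                                   # 10
call_result <- sum(1, 2, 3)           # still 6: a call skips non-function bindings
rm(sum)

# Binding a function to a built-in name changes what every later call does.
mean <- function(x) "not the real mean"
shadowed_result <- mean(c(1, 2, 3))   # "not the real mean"
real_result <- base::mean(c(1, 2, 3)) # 2: base:: reaches the original
rm(mean)

restored_result <- mean(c(1, 2, 3))   # 2 again once the shadow is removed
```

The first case often goes unnoticed because calls keep working, which is part of why it is confusing; the second case silently changes results.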
4.8 Inspecting and Managing Your Variables
R provides a handful of built-in functions that let you see, check, and remove the variables in your current session’s environment.
| Function | Purpose |
|---|---|
| ls() | List all objects in the current environment. |
| exists("name") | Return TRUE if a variable with that name exists. |
| class(x) | Report the class of the value bound to x. |
| str(x) | Show the structure and type of x compactly. |
| rm(x) | Remove the binding for x from the environment. |
| rm(list = ls()) | Remove every user-created object. Use with caution. |
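The functions in the table can be tried on a couple of throwaway variables. The names customer_count and city_name are hypothetical.

```r
customer_count <- 42L
city_name <- "Pune"

ls()                               # names of the objects in the current environment
exists("customer_count")           # TRUE
class(customer_count)              # "integer"
str(city_name)                     # compact summary: type and value

# Remove one binding and confirm that it is gone.
rm(customer_count)
still_there <- exists("customer_count")
still_there                        # FALSE
```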
A common pattern in old R tutorials is to start every script with rm(list = ls()). That clears your workspace but not the packages that are already attached, and it does not reset random seeds or option settings. A truly clean start comes from restarting R itself (in RStudio: Session → Restart R, or the keyboard shortcut Ctrl+Shift+F10). Restarting R is the modern, reproducible way to start fresh.
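To see what rm(list = ls()) leaves behind, the sketch below changes one option, clears the workspace, and checks the option again. It assumes a scratch session, since it deletes every object in the current environment; scratch_value and digits_after are hypothetical names.

```r
options(digits = 4)            # change a session-wide setting
scratch_value <- pi

rm(list = ls())                # removes scratch_value and every other object

exists("scratch_value")        # FALSE: the variable is gone
digits_after <- getOption("digits")
digits_after                   # 4: the option survived the clean-up

search()                       # attached packages are still listed as well

options(digits = 7)            # put the option back to R's default
```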
4.9 A Small Worked Example
The snippet below applies every idea from this chapter: three assignment operators, readable names, comments that explain the why, a reassignment, and a workspace inspection at the end.
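A minimal sketch of such a script is given here. The scenario (exam marks) and every variable name are hypothetical, chosen only to exercise the ideas listed above.

```r
# Marks are out of 50; percentages make results comparable across exams.
max_marks <- 50                               # assignment with <-
raw_marks = 41                                # assignment with =
raw_marks * 100 / max_marks -> percent_score  # assignment with ->
percent_score                                 # 82

# A regrade added two marks, so the earlier values are replaced.
raw_marks <- raw_marks + 2
percent_score <- raw_marks * 100 / max_marks
percent_score                                 # 86

# Inspect the workspace, then remove a variable that is no longer needed.
ls()
class(percent_score)                          # "numeric"
str(percent_score)
rm(raw_marks)
raw_marks_exists <- exists("raw_marks")
raw_marks_exists                              # FALSE
```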
Every one of those three assignment operators works. In production code, the entire block would be written with <- for consistency.
4.10 Summary
| Concept | Key Takeaway |
|---|---|
| Expression-driven | Almost every line in R is an expression that returns a value. |
| Case sensitive | Age, age, and AGE are three different names. |
| Whitespace | Spaces are mostly ignored; use them for readability, and always around <-. |
| Three assignment operators | <-, =, and -> all assign. The R community prefers <-. |
| Naming rules | Start with a letter or dot; use letters, digits, _, and .. Never use a reserved word. |
| Naming conventions | Pick one style (snake_case is idiomatic) and apply it consistently. |
| Avoid shadowing | Do not name variables mean, sum, data, df, c, or t. |
| Workspace tools | ls(), exists(), class(), str(), and rm() manage the current environment. |
| Reproducible starts | Prefer restarting R over rm(list = ls()) for a truly clean session. |
The habits you build in this chapter will repeat themselves thousands of times across every R project. Commit to <- for assignment, snake_case for names, one statement per line, and a fresh R session at the start of every meaningful piece of work. In the next chapter you will start reading input into your programs and writing output back out, using R’s core I/O functions.