25  Functions: Purpose, Types, and Creation

Note: What this chapter covers

A function wraps a block of logic behind a name and a clear interface: what goes in (arguments), what comes out (return value). Functions let you name an idea, reuse it, test it, and hide the details. This chapter covers why functions matter, how to write them, the difference between positional, named, and default arguments, variable-length arguments with ..., and the rules R uses to find variables inside a function (scoping). You’ll learn to turn a rough snippet of analysis code into a clean, reusable function.

25.1 Why functions?

Take any script longer than a page and you’ll find the same three or four lines of logic appearing in slightly different form over and over. A function is how you name that logic once and use the name afterwards.

Functions buy you four things:

  1. Names. pct(82, 100) tells the reader what’s happening; round(82/100*100, 1) makes them decode it.
  2. Reuse. One edit updates every caller.
  3. Isolation. Variables inside a function don’t leak out, so you can experiment without polluting the workspace.
  4. Testing. A function with clear inputs and outputs is something you can hand examples to and verify.

25.2 The anatomy of a function

Every R function has the same skeleton:

name <- function(arg1, arg2, ...) {
  body                     # one or more expressions
  return_value             # the last expression is the result
}

A live example:
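A minimal sketch, assuming the pct() helper named in Section 25.1 takes a part and a whole; the digits parameter and its default are assumptions made for illustration:

```r
# pct(): express `part` as a percentage of `whole`.
# `digits` is a hypothetical extra argument, not taken from the text.
pct <- function(part, whole, digits = 1) {
  ratio <- part / whole          # body: an intermediate step
  round(ratio * 100, digits)     # last expression: this is the result
}

pct(82, 100)   # 82
pct(1, 3)      # 33.3
```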

Three things to notice:

  1. function(…) { … } is itself an expression. We assign it to a name with <-.
  2. The braces {} wrap the body. You can omit them when the body is a single expression, but including them is never wrong.
  3. The last expression is the return value; no explicit return() call is required.

25.3 return(): explicit vs implicit

Both forms below work identically:
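As a small illustration (the function names area_implicit and area_explicit are made up for this sketch), the same computation can be written with and without return():

```r
# Implicit: the value of the last expression is returned.
area_implicit <- function(w, h) {
  w * h
}

# Explicit: return() hands back the same value.
area_explicit <- function(w, h) {
  return(w * h)
}

identical(area_implicit(3, 4), area_explicit(3, 4))  # TRUE
```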

R convention is to rely on the implicit return for the normal exit, and use return() only for early returns, jumping out before the end when a short-circuit condition fires.
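An early return might look like the sketch below. The function safe_mean() is hypothetical; it assumes that an empty input should yield NA rather than the NaN that mean() would produce.

```r
# safe_mean(): hypothetical helper that guards against empty input.
safe_mean <- function(x) {
  if (length(x) == 0) {
    return(NA_real_)   # early exit: nothing to average
  }
  mean(x)              # normal exit: implicit return
}

safe_mean(numeric(0))  # NA
safe_mean(c(2, 4, 6))  # 4
```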

25.4 Arguments: positional, named, default

Arguments can be passed by position or by name. Named passing is more explicit and safer once a function has more than two or three arguments.
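A short sketch of the two calling styles, using a hypothetical power() function invented for this example:

```r
# power(): hypothetical two-argument function.
power <- function(base, exponent) {
  base^exponent
}

power(2, 3)                    # by position: 8
power(base = 2, exponent = 3)  # by name: 8
power(exponent = 3, base = 2)  # names make the order irrelevant: 8
power(3, 2)                    # positions swapped by mistake: 9
```

The last call shows why names are safer: a positional mix-up runs without complaint and quietly gives a different answer.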

Default values let you omit arguments that usually take the same value:
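For instance, a hypothetical round_to() wrapper (the name and the default of 2 are assumptions) could be sketched as:

```r
# round_to(): `digits` defaults to 2, so most callers can leave it out.
round_to <- function(x, digits = 2) {
  round(x, digits)
}

round_to(3.14159)              # 3.14
round_to(3.14159, digits = 4)  # 3.1416
```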

Defaults can reference earlier arguments, handy for computed defaults:
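One way this might look, using a hypothetical rescale() function whose lower and upper bounds default to values computed from x:

```r
# rescale(): map x onto [0, 1]. The defaults for `lower` and `upper`
# are computed from the earlier argument `x` at call time.
rescale <- function(x, lower = min(x), upper = max(x)) {
  (x - lower) / (upper - lower)
}

rescale(c(10, 15, 20))             # 0.0 0.5 1.0
rescale(c(10, 15, 20), lower = 0)  # 0.50 0.75 1.00
```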

Tip: Argument-order discipline

A healthy convention: put the data first, then required parameters, then optional parameters with defaults. This matches R’s own functions (mean(x, na.rm = FALSE)) and plays well with the pipe |>.

25.5 Variable arguments: ...

... lets a function accept an unknown number of extra arguments. It collects them and passes them through to another function unchanged.
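A minimal pass-through sketch: the wrapper mean_of_logs() is hypothetical, and it forwards whatever extra arguments it receives to base R's mean().

```r
# mean_of_logs(): anything supplied after `x` travels through `...`
# to mean() unchanged, for example na.rm or trim.
mean_of_logs <- function(x, ...) {
  mean(log(x), ...)
}

mean_of_logs(c(1, exp(2)))                    # 1
mean_of_logs(c(1, exp(2), NA))                # NA
mean_of_logs(c(1, exp(2), NA), na.rm = TRUE)  # 1
```

The wrapper never mentions na.rm, yet callers can still use it.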

Inside the function, ... can be inspected by wrapping it in list(...):
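A small sketch of that inspection, with a hypothetical count_args() function that reports how many extra arguments arrived and what they were called:

```r
# count_args(): capture the dots as an ordinary list and look inside.
count_args <- function(...) {
  args <- list(...)
  list(n = length(args), names = names(args))
}

res <- count_args(a = 1, b = "two", 3)
res$n      # 3
res$names  # "a" "b" ""   (unnamed arguments get an empty name)
```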

... is how most plotting and summary functions let you pass through graphical or statistical options you didn’t anticipate.

25.6 Returning multiple values: return a list

R functions return exactly one object. To return several things, bundle them in a named list.
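For example, a hypothetical describe() function could bundle three summaries into one named list:

```r
# describe(): return several results as one named list.
describe <- function(x) {
  list(
    n     = length(x),
    mean  = mean(x),
    range = range(x)
  )
}

d <- describe(c(4, 8, 15))
d$n           # 3
d$mean        # 9
d[["range"]]  # 4 15
```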

The caller pulls values out with $ or [[. This idiom is everywhere: model-fitting functions such as lm() return a list built this way, with a class attached.

25.7 Scoping: where does a name come from?

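Inside a function, R looks up a name in a fixed order: the function’s own local variables first, then the environment in which the function was defined (its enclosing environment), then that environment’s own enclosing environments in turn, ending at the global workspace and, beyond it, the attached packages.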
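The lookup order can be sketched as follows. The names scale_it, make_scaler, and scale3 are hypothetical; multiplier is the workspace variable the warning below refers to, and its value of 10 is an assumption.

```r
multiplier <- 10             # defined in the workspace

scale_it <- function(x) {
  x * multiplier             # not local, so R falls back to the workspace copy
}
scale_it(2)                  # 20

make_scaler <- function() {
  multiplier <- 3            # local to make_scaler()
  function(x) x * multiplier # inner function: its enclosing environment is searched first
}
scale3 <- make_scaler()
scale3(2)                    # 6: the enclosing value wins over the workspace value
```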

Variables created inside a function are local; they disappear when the function returns:
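A brief sketch, using a hypothetical add_tax() function with an assumed 20% rate:

```r
# add_tax(): `tax_amount` exists only while the function is running.
add_tax <- function(price) {
  tax_amount <- price * 0.2   # local variable
  price + tax_amount
}

add_tax(100)           # 120
exists("tax_amount")   # FALSE, assuming nothing else defined that name
```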

Warning: Avoid reaching out for inputs

A function that picks up a value such as multiplier from the workspace works but is fragile: if multiplier changes, the function’s behaviour changes silently. Prefer passing values in as arguments; functions should read their inputs from arguments, not from the surrounding workspace.

25.8 Functions are first-class

A function is an ordinary R object. You can store it in a variable, pass it to another function, return it from a function, or put it in a list.
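Each of those four uses can be sketched in a line or two. All names here (square, apply_twice, make_adder, ops) are made up for the illustration:

```r
square <- function(x) x^2

f <- square                                   # store it under another name
f(4)                                          # 16

apply_twice <- function(fn, x) fn(fn(x))      # pass it as an argument
apply_twice(square, 3)                        # 81

make_adder <- function(n) function(x) x + n   # return it from a function
add5 <- make_adder(5)
add5(10)                                      # 15

ops <- list(sq = square, root = sqrt)         # keep several in a list
ops$root(49)                                  # 7
```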

This is the property that makes the apply family (Chapter 24) possible: sapply(x, mean) passes the function mean as a value.

25.9 Types of functions you’ll meet

Four categories worth naming, even though there’s nothing syntactically different between them:

  1. Built-in functions ship with R: mean(), sum(), paste(), lm().
  2. Package functions become available once the package is loaded with library(): dplyr::mutate(), stringr::str_detect().
  3. User-defined functions are the kind we’re writing in this chapter.
  4. Anonymous functions are defined on the spot without a name: \(x) x^2 used inside an apply call. The \(x) shorthand needs R 4.1 or later; function(x) x^2 works in every version.

The rules are identical for all four. The distinction is social, not technical.

25.10 Pure vs side-effect functions

A pure function returns a value and does nothing else: no printing, no plotting, no writing to files. A side-effect function changes the outside world.
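To make the contrast concrete, here is a sketch with two hypothetical functions: celsius_to_f() only computes, while report_temp() exists to emit a message.

```r
# Pure: same input, same output, nothing else happens.
celsius_to_f <- function(c) {
  c * 9 / 5 + 32
}

# Side effect: the point of the call is the message, not the value.
report_temp <- function(c) {
  message("Temperature: ", celsius_to_f(c), " F")
  invisible(NULL)   # nothing useful to return
}

celsius_to_f(100)   # 212
report_temp(100)    # emits "Temperature: 212 F" as a message
```

Keeping the calculation in its own pure function means it can be tested without capturing any output.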

Rule of thumb: pure functions are easier to test and combine. Reserve printing, messaging, and file I/O for functions whose job is precisely that.

25.11 Worked example: a reusable grading function

Package the grading logic from Chapter 21 as a proper function: explicit arguments, default rules, a clean return value, and support for a vector input via ifelse.
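One possible shape for such a function is sketched below. The name grade(), the argument names, and the cutoff values (90/80/70/60) are assumptions for illustration; the actual thresholds from Chapter 21 may differ.

```r
# grade(): map numeric scores to letter grades.
# `cutoffs` holds the minimum score for each letter (assumed values).
grade <- function(score, cutoffs = c(A = 90, B = 80, C = 70, D = 60)) {
  # Nested ifelse() is vectorised, so scalars and vectors both work.
  ifelse(score >= cutoffs[["A"]], "A",
  ifelse(score >= cutoffs[["B"]], "B",
  ifelse(score >= cutoffs[["C"]], "C",
  ifelse(score >= cutoffs[["D"]], "D", "F"))))
}

grade(85)                                            # "B"
grade(c(95, 85, 72, 61, 40))                         # "A" "B" "C" "D" "F"
grade(85, cutoffs = c(A = 85, B = 75, C = 65, D = 55))  # "A"
```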

Look at what the function gained:

  • A default cutoff vector matches the common case; callers override for a custom scheme.
  • The function accepts scalar or vector input: one implementation, two use cases.
  • The behaviour is documented through the parameter names, not a paragraph of comments.

25.12 Summary

Concept                    Syntax
Define a function          name <- function(args) { body }
Implicit return            last expression in the body
Explicit early return      return(value)
Named argument             fn(arg_name = value)
Default value              function(x, y = 1)
Variable arguments         function(x, ...) { ... }
Return multiple values     list(a = …, b = …)
Scope lookup               local → enclosing → parent → global

Three habits separate working code from re-usable code:

  1. One function, one job. If you can’t name the function in five words, split it.
  2. Inputs via arguments, not the workspace. Pure arguments make the function portable.
  3. Defaults for the common case, arguments for the flexible bits. Callers should pay complexity only for the flexibility they actually need.

With the basics in place, the next chapter tackles three specialised forms: recursive functions that call themselves, closures that carry state, and nested functions that help organise larger logic.