Computing on the R language
Metin Yazici
2019-06-15

Recently, I have been spending my free time developing the supreme package, which helps developers better understand Shiny applications built with modules.

During that time, I compiled some notes for myself about how metaprogramming works in R, because the package does static analysis on the code. Later, I decided to organize my notes and write a blog post for the general public and for my future self.

The next challenge could be to read the famous SICP book, which has been waiting on my bookshelf for a while, to understand how R inherited its metaprogramming features from the LISP/Scheme world.


R allows "computing on the language", which means that one can write "code that writes code". Since everything entered as valid code in R is an "expression", R has great capabilities for metaprogramming [1]. However, things can get pretty complicated and fragile with metaprogramming. If you choose to use it in your code, be sure that you have a valid reason.

Structure

Expressions fall into four categories:

  1. Constants: NULL or length-1 atomic vectors [2], e.g. "a" or 1L.

  2. Symbols (names): e.g. var in var <- 1. Test for one with is.name() or is.symbol() (the latter is better for consistency).

  3. Calls: unevaluated function calls, stored in a special form whose first element is the function to call (usually its symbol). Test for one with is.call().

  4. Pairlists: in practice you meet them only in the formal arguments of functions.
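
The four categories can be told apart with base predicates. Here is a minimal sketch; the helper name expr_category is made up for this post and is not part of any package:

```r
# Hypothetical helper: classify an already-quoted object into one of
# the four categories listed above.
expr_category <- function(x) {
  if (is.symbol(x)) {
    "symbol"
  } else if (is.call(x)) {
    "call"
  } else if (is.pairlist(x) && !is.null(x)) {
    # is.pairlist(NULL) is TRUE, so NULL is handled as a constant below
    "pairlist"
  } else if (is.null(x) || (is.atomic(x) && length(x) == 1L)) {
    "constant"
  } else {
    "other"
  }
}

expr_category(quote(var))                     # "symbol"
expr_category(quote(var <- 1))                # "call"
expr_category(formals(function(x, y = 2) x))  # "pairlist"
expr_category(1L)                             # "constant"
expr_category(NULL)                           # "constant"
```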

Expressions

  • Expressions are the statements that form the R language.

  • Create expressions with expression() or vector("expression").

expression()
is.expression()
as.expression()

  • An expression object is also a list under the hood, so it can be subsetted with the standard indexing operators [, [[ and $. The replacement forms of these operators ([<-, [[<- and $<-) replace or remove elements.
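
A short sketch of what list-like subsetting looks like on an expression vector (the variable names are only for illustration):

```r
e <- expression(a <- 1, b <- 2, a + b)

length(e)   # 3
e[[3]]      # a + b, a single call
e[1:2]      # still an expression vector, now of length 2

# Replace the third element, then remove the second one
e[[3]] <- quote(a * b)
e[[2]] <- NULL

length(e)   # 2
e[[2]]      # a * b
```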

Substitute

substitute() replaces variables in an expression with the values bound to them. It can be thought of as templating for expressions.

substitute(x * y, list(x = 2, y = 5))

  • deparse(substitute(x)) is an old trick to get the name of an argument as a character string inside a function.
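
Both uses can be sketched in a few lines. The function arg_name and the variable my_data are hypothetical names chosen for this example:

```r
# Hypothetical helper: report the expression that was supplied for `x`.
arg_name <- function(x) {
  deparse(substitute(x))
}

my_data <- c(1, 2, 3)
arg_name(my_data)       # "my_data"
arg_name(my_data + 1)   # "my_data + 1"

# substitute() as a template: x is filled in, y stays a symbol
# because the list has no value for it.
partial <- substitute(x * y, list(x = 2))
partial                 # 2 * y
```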

Quotes

Expressions, the kind you create using the quote() function, come in four flavors: … a primitive value, a name, a function call or a control structure, and a pairlist.

Special question: What’s the difference between quote() and expression()?

quo <- quote(x <- 2 + 3)
expr <- expression(y <- 5 * 8)
mmy::object_types(quo, expr)

They behave pretty much the same when you evaluate them with eval(). The difference is that expression() wraps its arguments in an expression object, i.e. it returns a vector of unevaluated expressions, whereas quote() returns a single unevaluated expression.

as.list(quo)
as.list(expr)
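
Since mmy is a personal package, the same comparison can be sketched with base R only:

```r
quo  <- quote(x <- 2 + 3)
expr <- expression(y <- 5 * 8)

typeof(quo)    # "language": a single call
typeof(expr)   # "expression": a vector that holds calls
length(expr)   # 1
expr[[1]]      # y <- 5 * 8, the call stored inside the vector

# Evaluating either one performs the assignment
eval(quo)
eval(expr)
x              # 5
y              # 40
```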

bquote() is just like quote(), but it allows partial substitution in expressions. Only the parts wrapped in .() are evaluated.

This example is from the ?bquote help:

a <- 2
bquote(a == a)
quote(a == a)
bquote(a == .(a))

N.B. bquote() is the only form of "quasiquotation" available in base R (Wickham, 2019).

Using bquote can sometimes be more flexible than using substitute(). For example:

n <- 5
substitute(p + x, list(x = n))
bquote(p + .(n))
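
One reason for the extra flexibility, sketched below: when called at top level, substitute() leaves symbols untouched unless it is given an explicit list or environment, while bquote() evaluates whatever sits inside .() where it is called.

```r
n <- 5

# At top level (the global environment), nothing is substituted
substitute(p + n)                # p + n

# An explicit list is needed to fill in the value
with_list <- substitute(p + n, list(n = n))
with_list                        # p + 5

# bquote() picks up n directly, and can evaluate any expression in .()
direct  <- bquote(p + .(n))
doubled <- bquote(p + .(n * 2))
direct                           # p + 5
doubled                          # p + 10
```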

enquote() evaluates its argument and wraps the result in a call to quote(). This is how it works:

z <- 5
enquote(z == 1)
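
To make the result concrete, here is the same call with its output, plus an equivalent construction by hand:

```r
z <- 5
q <- enquote(z == 1)

q         # quote(FALSE): z == 1 was evaluated first, then quoted
eval(q)   # FALSE

# The same call object built by hand
by_hand <- as.call(list(as.name("quote"), z == 1))
identical(q, by_hand)   # TRUE
```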

If you want the quote() call itself rather than its result, wrap it inside substitute().

substitute(quote(a = 2))

Names and symbols

as.symbol()
is.symbol()

as.name()
is.name()

  • name and symbol mean the same thing: both refer to the name of an R object.

e <- expression(fun <- function(x) x)
e[[1]]
# fun <- function(x) x
e[[1]][[1]]
# `<-`
e[[1]][[2]]
# fun
mmy::object_types(e[[1]][[2]])
#       __type__ __value__
# 1        class      name
# 2       typeof    symbol
# 3         mode      name
# 4 storage.mode    symbol
# 5    sexp.type    SYMSXP

Symbols have a "name" mode, "symbol" storage mode and a "symbol" type.

There's a note in the ?name documentation:

The term ‘symbol’ is from the LISP background of R, whereas ‘name’ has been the standard S term for this.

I'd prefer to stick to the term "symbol", as it seems more common among other programming languages.
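
A quick check that the two families of functions are interchangeable, and that evaluating a symbol looks up the object it names:

```r
s1 <- as.symbol("fun")
s2 <- as.name("fun")

identical(s1, s2)   # TRUE
is.symbol(s2)       # TRUE
is.name(s1)         # TRUE
as.character(s1)    # "fun"

# Evaluating a symbol returns the object bound to that name
fun <- function(x) x
identical(eval(s1), fun)   # TRUE
```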

Calls

call(name, ...)
is.call(x)
as.call(x)

  • call() constructs a call object, that is, an unevaluated function call:

call("convolve")
call("convolve", x = 3, y = 5)

  • Wrapping an assignment in parentheses makes R print the assigned value, because ( is itself a function that returns its argument visibly (see ?Paren).

(cconv <- call("convolve", x = 3, y = 5))
as.list(cconv)
eval(cconv)

N.B. do.call() calls a function, given either by name or as a function object, with a list of arguments.
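
The difference between building a call and calling right away can be sketched with sum() and prod():

```r
# call() only builds the call; nothing is evaluated yet
cl <- call("sum", 1, 2, 3)
cl                              # sum(1, 2, 3)
eval(cl)                        # 6

# do.call() builds and evaluates in one step
do.call("sum", list(1, 2, 3))   # 6, function given by name
do.call(sum, list(1, 2, 3))     # 6, function given as an object

# A call object can be modified before it is evaluated
cl[[1]] <- as.name("prod")
cl                              # prod(1, 2, 3)
eval(cl)                        # 6
```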

N.B. There are a number of functions to access and manipulate the call stack. See the ?sys.parent documentation for more information.

Function

square <- function(x) {
    x ^ 2
}

  • Functions (or closures) have three components:

    • Formals (via formals(square))

      A formal argument is either a symbol or the special dot-dot-dot (...)

    • Body (via body(square))

    • Environment (via environment(square))
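
The three components can be inspected and also replaced. A small sketch that rewrites square() piece by piece (the power argument is added only for illustration):

```r
square <- function(x) {
  x ^ 2
}

names(formals(square))   # "x"
body(square)             # the `{` call that wraps x^2
environment(square)      # the environment where square was defined

# Replace the body
body(square) <- quote(x * x)
square(4)                # 16

# Replace the formals and the body together
formals(square) <- alist(x = , power = 2)
body(square) <- quote(x ^ power)
square(3)                # 9
square(3, power = 3)     # 27
```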

Language

Language objects in R come in three types:

  1. Calls (call())

  2. Expressions (expression())

  3. Symbols or names (as.symbol() or as.name())

e <- expression(x <- 1)
is.language(e)
# [1] TRUE
mmy::object_types(e)
#       __type__  __value__
# 1        class expression
# 2       typeof expression
# 3         mode expression
# 4 storage.mode expression
# 5    sexp.type    EXPRSXP
e[[1]][[1]]
# `<-`
mmy::object_types(e[[1]][[1]])
#       __type__ __value__
# 1        class      name
# 2       typeof    symbol
# 3         mode      name
# 4 storage.mode    symbol
# 5    sexp.type    SYMSXP

Note that quote() does not always return a language object. Quoting a constant returns the constant itself, which is not considered part of the language:

is.language(quote(1))
# [1] FALSE
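
For comparison, a quoted symbol or call is a language object; only a quoted constant is not. A quick check with base R:

```r
is.language(quote(1))        # FALSE: quote(1) is just the number 1
typeof(quote(1))             # "double"

is.language(quote(x))        # TRUE: a symbol
is.language(quote(x + 1))    # TRUE: a call
is.language(expression(1))   # TRUE: an expression vector
```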

Eval

As the name suggests, eval() evaluates expressions. Its first argument is the expression to evaluate. The second one (envir) says where variables are looked up, and it can be an environment, a list, or a data frame.

eval(body(square), list(x = 4))

Evaluate calls:

eval(call("square", 2))
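
A sketch of the second argument in both of its common forms, a list and an environment (env is just a name picked for this example):

```r
square <- function(x) x ^ 2

# Variables are looked up in the list first
from_list <- eval(quote(x + y), list(x = 1, y = 2))
from_list   # 3

# Variables are looked up in an environment; square() is found
# through the environment's parent
env <- new.env()
assign("x", 10, envir = env)
from_env <- eval(quote(square(x)), env)
from_env    # 100
```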

Parsing

Parse tree

e <- expression({
  x <- 2L
  y <- 5L
  convolve(x, y)
})
l <- as.list(e)

Calling as.list() on an expression is a convenient way to list its components as a parse tree.

ewh <- quote({
  while (x < 5) {
    rnorm(x)
    mean(x)
    median(x)
    x <- x + 1
  }
})
as.list(ewh)
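
Because every call can be turned into a list, a parse tree can be walked recursively. The sketch below collects the names of the functions called in an expression. The helper called_functions is hypothetical, and it does not handle calls with empty arguments such as x[1, ]:

```r
# Hypothetical helper: names of all functions called inside `x`.
called_functions <- function(x) {
  if (!is.call(x)) {
    return(character(0))
  }
  fn_head <- x[[1]]
  fn <- if (is.symbol(fn_head)) as.character(fn_head) else character(0)
  args <- as.list(x)[-1]
  # Recurse into every argument of the call
  c(fn, unlist(lapply(args, called_functions), use.names = FALSE))
}

ewh <- quote({
  while (x < 5) {
    rnorm(x)
    x <- x + 1
  }
})

unique(called_functions(ewh))
# "{" "while" "<" "rnorm" "<-" "+"
```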

Here are the functions that can help you construct trees:

pryr::call_tree(e)

mmy::expr_tree(e)

N.B. The environment is lost when an expression is converted into a list (by as.list()). That is why it is advised not to create functions from lists, e.g. with as.function().

parse and deparse

The names are self-explanatory: parse() turns text into expressions, and deparse() turns expressions back into text. For example:

deparse(quote(1 + 3))
parse(text = "1 + 3")
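
A round trip between text and code, sketched with the same 1 + 3 example:

```r
txt <- "1 + 3"

# parse() returns an expression vector, one element per statement
p <- parse(text = txt, keep.source = FALSE)
length(p)         # 1
p[[1]]            # 1 + 3
eval(p[[1]])      # 4

# deparse() goes back from the call to a character string
deparse(p[[1]])   # "1 + 3"
identical(p[[1]], quote(1 + 3))   # TRUE
```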

getParseData

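  • utils::getParseData() returns the parser's token-level data for parsed R code. It relies on source references, so the code must be parsed with keep.source = TRUE (the default in interactive sessions).

e <- expression({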
  x <- 10
  y <- "text"
  z <<- 2
  ## some comment here..
  lapply(mtcars, function(i) {
    pnorm(mtcars[i, i], log.p = TRUE)
  }) -> res
  paste(y, res, sep = ":")
})
prs <- parse(text = e)
parsed <- getParseData(prs)
head(parsed)
#     line1 col1 line2 col2  id parent       token terminal text
# 127     1    1     9    1 127      0        expr    FALSE
# 1       1    1     1    1   1    127         '{'     TRUE    {
# 9       2    5     2   11   9    127        expr    FALSE
# 3       2    5     2    5   3      5      SYMBOL     TRUE    x
# 5       2    5     2    5   5      9        expr    FALSE
# 4       2    7     2    8   4      9 LEFT_ASSIGN     TRUE   <-

| Token | Example use | Notes |
|:----------|:----------------|:----------|
| COMMENT | # | |
| LEFT_ASSIGN | <-, <<- | right assign -> turned into left assign |
| SYMBOL | mtcars, x, ... | |
| FUNCTION | function | |
| SYMBOL_FORMALS | i | |
| SYMBOL_FUNCTION_CALL | lapply, pnorm, ... | |
| SYMBOL_SUB | log.p | specified arg. names in function calls |
| EQ_ASSIGN | = | equality assignment, e.g. x = 2 |
| EQ_SUB | = | in log.p = TRUE |
| STR_CONST | "text" | |
| NUM_CONST | 10 | |

There are also some tokens such as '{', '(' and ','.
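
A smaller sketch that parses a string directly and keeps only the terminal tokens. The code string is made up for this example, and keep.source = TRUE is set explicitly so it also works outside interactive sessions:

```r
code <- '
x <- 10
y <- "text"
## some comment here..
res <- pnorm(x, log.p = TRUE)
'

prs <- parse(text = code, keep.source = TRUE)
pd <- utils::getParseData(prs)

# Terminal tokens are the leaves of the parse tree
leaves <- pd[pd$terminal, c("token", "text")]

leaves$text[leaves$token == "SYMBOL_FUNCTION_CALL"]   # "pnorm"
leaves$text[leaves$token == "SYMBOL_SUB"]             # "log.p"
table(leaves$token)
```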

Notes

The right assign operator -> is turned into the more commonly used left assign operator <- when R parses expressions.

expression(lapply(mtcars, mean) -> res)
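
The same thing can be checked on quoted calls: after parsing, both spellings are the same object.

```r
right <- quote(a -> b)
left  <- quote(b <- a)

identical(right, left)   # TRUE
right[[1]]               # `<-`

# The same holds for the super-assignment operators
identical(quote(a ->> b), quote(b <<- a))   # TRUE
```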

Resources

  • Chambers, J. (2008). Software for data analysis: programming with R. Springer Science & Business Media.

  • Wickham, H. (2019). Advanced R (second edition). CRC Press.

  • The R language definition documents the language per se.

  • Mailund, T. (2017). Metaprogramming in R. DOI 10.1007/978-1-4842-2881-4_1

  • Kalibera, T., Maj, P., Morandat, F., & Vitek, J. (2014, March). A fast abstract syntax tree interpreter for R. In ACM SIGPLAN Notices (Vol. 49, No. 7, pp. 89-102). ACM.


  1. The most famous DSL in R could be ggplot2.
  2. R does not have scalar values per se.