Recently, I have been spending my free time developing the supreme package, which helps developers better understand Shiny applications built with modules.
During that time, I compiled some notes for myself about how metaprogramming works in R, because the package does static analysis on the code. Later, I decided to organize my notes and write a blog post for the general public and for my future self as well.
The next challenge could be to read the famous SICP book, which has been waiting on my bookshelf for a while, to understand how R inherited its metaprogramming features from the LISP/Scheme world.
R allows "computing on the language", which means that one can write code that writes code. Since everything entered as valid code in R is an "expression", R has strong metaprogramming capabilities. However, things can get complicated and fragile quickly with metaprogramming. If you choose to use it in your code, make sure that you have a valid reason.
Structure
Expressions fall into four categories:
- Constants, e.g. `NULL` or length-1 atomic vectors such as `"a"` or `1L`.
- Symbols (names), e.g. `var` in `var <- 1`. Test for them with `is.name()` or `is.symbol()` (the latter is better for consistency).
- Calls, such as function calls, which have a special form where the first element is the name of the function. Test for them with `is.call()`.
- Pairlists. In practice you only meet them as the formal arguments of functions.
Expressions
- Expressions are the statements that make up the R language.
- Create expressions with `expression()` or `vector("expression")`.
expression()
is.expression()
as.expression()
- An expression object is also a list under the hood, so it can be subsetted with the standard indexing operators, namely `[`, `[[` and `$`. The replacement forms of these operators, `[[<-` and `$<-`, are used to replace or remove elements.
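A minimal sketch of subsetting and replacing elements. The object `ex` and its contents are made up for illustration:

```r
# A two-element expression vector; the second element is named "b"
ex <- expression(a <- 1, b = a + 2)

length(ex)   # number of elements in the expression vector
ex[1]        # `[` keeps the expression wrapper
ex[[1]]      # `[[` extracts the call itself: a <- 1
ex$b         # `$` extracts by name: a + 2

# Replace the element named "b" with another unevaluated call
ex[["b"]] <- quote(a * 10)
ex$b
```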
Substitute
`substitute()` replaces variables with values in expressions. It can be thought of as templating for expressions.
substitute(x * y, list(x = 2, y = 5))
`deparse(substitute(x))` is an old trick to get the name of an argument as a character string inside a function.
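Both uses can be sketched in a few lines. The helper name `arg_name()` is hypothetical and only serves the illustration:

```r
# Templating: fill in x and y, but do not evaluate the result
templ <- substitute(x * y, list(x = 2, y = 5))
templ         # 2 * 5
eval(templ)   # 10

# The deparse(substitute()) trick: capture what the caller typed
arg_name <- function(x) {
  deparse(substitute(x))
}
arg_name(mtcars)   # "mtcars"
```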
Quotes
Expressions, the kind you create using the `quote()` function, come in four flavors: … a primitive value, a name, a function call or a control structure, and a pairlist.
Special question: What's the difference between `quote()` and `expression()`?
quo <- quote(x <- 2 + 3)
expr <- expression(y <- 5 * 8)
mmy::object_types(quo, expr)
They behave pretty much the same when you evaluate them with `eval()`.
However, the difference is that `expression()` wraps the statements in an expression object and therefore returns a vector of unevaluated expressions, whereas `quote()` just returns a single unevaluated expression.
as.list(quo)
as.list(expr)
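The difference is also visible with base R alone. This is a small sketch with made-up object names (`q1`, `e1`):

```r
q1 <- quote(x <- 2 + 3)
e1 <- expression(y <- 5 * 8)

class(q1)    # "<-": a single call
class(e1)    # "expression": a vector that contains calls
typeof(q1)   # "language"
typeof(e1)   # "expression"

# The element inside the expression vector is the same kind of object
class(e1[[1]])   # "<-"

# Both evaluate to the assigned value
res_q <- eval(q1)   # 5
res_e <- eval(e1)   # 40
```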
`bquote()` is just like `quote()`, but it allows partial substitution in expressions. Only the expressions wrapped in `.()` are evaluated.
This example is from the `?bquote` help:
a <- 2
bquote(a == a)
quote(a == a)
bquote(a == .(a))
N.B. `bquote()` is the only form of "quasiquotation" available in base R (Wickham, 2019).
Using `bquote()` can sometimes be more flexible than using `substitute()`. For example:
n <- 5
substitute(p + x, list(x = n))
bquote(p + .(n))
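To make the comparison concrete, here is a small sketch. The helper `make_filter()` is hypothetical and not part of any package:

```r
n <- 5
via_substitute <- substitute(p + x, list(x = n))
via_bquote <- bquote(p + .(n))

# Both build the same unevaluated call: p + 5
identical(via_substitute, via_bquote)

# .() can wrap any expression, which is evaluated before insertion
bquote(p + .(n * 2))   # p + 10

# Handy inside functions: splice an argument value into a call
make_filter <- function(threshold) {
  bquote(x > .(threshold))
}
flt <- make_filter(3)                   # x > 3
flt_res <- eval(flt, list(x = 1:5))     # FALSE FALSE FALSE TRUE TRUE
```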
`enquote()` evaluates its argument and wraps the result in a `quote()` call:
z <- 5
enquote(z == 1)
If you want to return the `quote()` call itself, wrap it inside `substitute()`.
substitute(quote(a = 2))
Names and symbols
as.symbol()
is.symbol()
as.name()
is.name()
`name` and `symbol` mean the same thing: both refer to the name of an R object.
e <- expression(fun <- function(x) x)
e[[1]]
# fun <- function(x) x
e[[1]][[1]]
# `<-`
e[[1]][[2]]
# fun
mmy::object_types(e[[1]][[2]])
# __type__ __value__
# 1 class name
# 2 typeof symbol
# 3 mode name
# 4 storage.mode symbol
# 5 sexp.type SYMSXP
Symbols have a "name" mode, "symbol" storage mode and a "symbol" type.
There's a note in the `?name` documentation:
The term ‘symbol’ is from the LISP background of R, whereas ‘name’ has been the standard S term for this.
I prefer to stick to the term "symbol", as it seems to be more common in other programming languages.
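The same facts can be checked with base R only. A minimal sketch:

```r
s1 <- as.symbol("fun")
s2 <- as.name("fun")

# Both constructors create the same object
identical(s1, s2)   # TRUE
is.symbol(s1)       # TRUE
is.name(s1)         # TRUE

# The different "type" functions report name or symbol
sym_info <- c(
  class = class(s1),
  typeof = typeof(s1),
  mode = mode(s1),
  storage.mode = storage.mode(s1)
)
sym_info

# A symbol converts back to a string, and evaluates to the object it names
as.character(s1)        # "fun"
eval(as.symbol("pi"))   # the value of pi
```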
Calls
call(name, ...)
is.call(x)
as.call(x)
`call()` constructs a call object, that is, an unevaluated function call:
call("convolve")
call("convolve", x = 3, y = 5)
- In R, wrapping an assignment in parentheses prints the assigned value, because `(` simply returns its argument visibly (see `?Paren`). This creates a call and shows it in one step:
(cconv <- call("convolve", x = 3, y = 5))
as.list(cconv)
eval(cconv)
N.B. `do.call()` calls a function, given by name or as a function object, with a list of arguments.
N.B. There are a bunch of functions to access and manipulate the call stack. See the `?sys.parent` documentation for more information.
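A short sketch of constructing, modifying and evaluating calls. The calls to `sum`, `max` and `paste` are arbitrary examples:

```r
cl <- call("sum", 1, 2, 3)
is.call(cl)    # TRUE
cl[[1]]        # the function name as a symbol: sum
eval(cl)       # 6

# Swap the function while keeping the arguments
cl[[1]] <- as.symbol("max")
eval(cl)       # 3

# as.call() builds a call from a list: function first, then arguments
cl2 <- as.call(list(as.symbol("paste"), "a", "b", sep = "-"))
eval(cl2)      # "a-b"

# do.call() builds and evaluates the call in one step
dc <- do.call("paste", list("a", "b", sep = "-"))
```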
Function
square <- function(x) {
  x ^ 2
}
Functions (or closures) have three components:
- Formals (via `formals(square)`). Each formal argument is named by a symbol or by the special dot-dot-dot (`...`).
- Body (via `body(square)`)
- Environment (via `environment(square)`)
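The three components can be inspected and modified directly. The function `power_fn()` below is hypothetical and exists only for this sketch:

```r
power_fn <- function(x, p = 2) {
  x ^ p
}

names(formals(power_fn))   # argument names: "x" "p"
formals(power_fn)$p        # default value of p: 2
body(power_fn)             # the unevaluated body
environment(power_fn)      # the environment where it was defined

# All three components have replacement forms
formals(power_fn)$p <- 3
body(power_fn) <- quote(x ^ p + 1)
power_fn(2)                # 2 ^ 3 + 1 = 9
```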
Language
The R language consists of three types of objects:
- Calls (`call()`)
- Expressions (`expression()`)
- Symbols or names (`as.symbol()` or `as.name()`)
e <- expression(x <- 1)
is.language(e)
# [1] TRUE
mmy::object_types(e)
# __type__ __value__
# 1 class expression
# 2 typeof expression
# 3 mode expression
# 4 storage.mode expression
# 5 sexp.type EXPRSXP
e[[1]][[1]]
# `<-`
mmy::object_types(e[[1]][[1]])
# __type__ __value__
# 1 class name
# 2 typeof symbol
# 3 mode name
# 4 storage.mode symbol
# 5 sexp.type SYMSXP
Note that `quote()` does not always return a language object. Quoting a constant returns the constant itself, which is "not" considered part of the language:
is.language(quote(1))
## [1] FALSE
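A quick sketch of what `is.language()` reports for the different kinds of quoted objects:

```r
lang_check <- c(
  symbol     = is.language(quote(x)),        # a name
  call       = is.language(quote(x + 1)),    # a call
  expression = is.language(expression(x)),   # an expression vector
  number     = is.language(quote(1)),        # a numeric constant
  string     = is.language(quote("a")),      # a character constant
  null       = is.language(quote(NULL))      # NULL
)
lang_check
# Only the symbol, the call and the expression are TRUE
```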
Eval
As the name reveals, `eval()` evaluates expressions. Its first argument is the expression to evaluate. The second argument (`envir`) is the environment to evaluate it in, and it can also be a list or a data frame that supplies values for the variables in the expression.
eval(body(square), list(x = 4))
Evaluate calls:
eval(call("square", 2))
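The second argument of `eval()` can take several forms. A minimal sketch with made-up objects (`e1`, `env`, `df`):

```r
e1 <- quote(a + b)

# Values supplied through a list
from_list <- eval(e1, list(a = 1, b = 2))   # 3

# Values supplied through an environment
env <- new.env()
assign("a", 10, envir = env)
assign("b", 20, envir = env)
from_env <- eval(e1, env)                   # 30

# Values supplied through the columns of a data frame
df <- data.frame(v = c(1, 2, 3))
from_df <- eval(quote(sum(v)), df)          # 6
```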
Parsing
Parse tree
e <- expression({
  x <- 2L
  y <- 5L
  convolve(x, y)
})
l <- as.list(e)
`as.list()` is a convenient way to list the components of an expression as a parse tree.
ewh <- quote({
  while (x < 5) {
    rnorm(x)
    mean(x)
    median(x)
    x <- x + 1
  }
})
as.list(ewh)
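Because every call can be turned into a list, a tree can be walked with a short recursive function. The sketch below uses a hypothetical helper, `count_calls()`, and assumes the code has no empty arguments (such as `x[1, ]`), which would need extra handling:

```r
# Count how many calls (function calls, operators, control structures)
# a quoted piece of code contains
count_calls <- function(x) {
  if (is.call(x)) {
    1L + sum(vapply(as.list(x), count_calls, integer(1)))
  } else {
    0L
  }
}

loop <- quote(while (x < 5) { x <- x + 1 })
n_calls <- count_calls(loop)   # while, <, {, <-, + : 5 calls

# Base R also ships ready-made walkers for names and variables
all.names(loop)   # every symbol, including operators
all.vars(loop)    # variables only: "x"
```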
Here are the functions that can help you construct trees:
pryr::call_tree(e)
mmy::expr_tree(e)
N.B. The environment is lost when a function is converted into a list (by `as.list()`). That is why it is advised not to use lists to create functions, e.g. with `as.function()`.
parse and deparse
The names speak for themselves: `parse()` turns text into expressions and `deparse()` turns expressions back into text. For example:
deparse(quote(1 + 3))
parse(text = "1 + 3")
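A round trip between text and code can be sketched as follows. The variable names are made up:

```r
txt <- "1 + 3"

# parse() returns an expression vector, even for a single statement
p <- parse(text = txt)
class(p)      # "expression"
length(p)     # 1

# The first element is the call itself
p[[1]]        # 1 + 3
eval(p[[1]])  # 4

# deparse() goes the other way and gives back a character string
back <- deparse(p[[1]])   # "1 + 3"
```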
getParseData
`utils::getParseData()` returns the low-level parse data of R code, that is, the tokens and their positions.
e <- expression({
  x <- 10
  y <- "text"
  z <<- 2
  ## some comment here..
  lapply(mtcars, function(i) {
    pnorm(mtcars[i, i], log.p = TRUE)
  }) -> res
  paste(y, res, sep = ":")
})
prs <- parse(text = e)
parsed <- getParseData(prs)
head(parsed)
# line1 col1 line2 col2 id parent token terminal text
# 127 1 1 9 1 127 0 expr FALSE
# 1 1 1 1 1 1 127 '{' TRUE {
# 9 2 5 2 11 9 127 expr FALSE
# 3 2 5 2 5 3 5 SYMBOL TRUE x
# 5 2 5 2 5 5 9 expr FALSE
# 4 2 7 2 8 4 9 LEFT_ASSIGN TRUE <-
| Token | Example use | Notes |
|:----------|:----------------|:----------|
| `COMMENT` | `#` | |
| `LEFT_ASSIGN` | `<-`, `<<-` | right assign `->` turned into left assign |
| `SYMBOL` | `mtcars`, `x`, ... | |
| `FUNCTION` | `function` | |
| `SYMBOL_FORMALS` | `i` | |
| `SYMBOL_FUNCTION_CALL` | `lapply`, `pnorm`, ... | |
| `SYMBOL_SUB` | `log.p` | specified arg. names in function calls |
| `EQ_ASSIGN` | `=` | (equality assignment e.g. `x = 2`) |
| `EQ_SUB` | `=` | (in `log.p = TRUE`) |
| `STR_CONST` | `"text"` | |
| `NUM_CONST` | `10` | |
There are also some tokens such as `'{'`, `'('` and `','`.
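Parse data can also be obtained straight from source text. The sketch below is my own small example and assumes `keep.source = TRUE` is passed explicitly, since source references are not kept by default in non-interactive sessions:

```r
# Two lines of made-up source code, including a comment
src <- 'x <- 10 # a comment\ny <- paste("text", x, sep = ":")'

parsed_src <- parse(text = src, keep.source = TRUE)
pd <- utils::getParseData(parsed_src)

# Terminal rows are the actual tokens found in the source
pd[pd$terminal, c("token", "text")]

# The distinct token types that occur in this snippet
unique(pd$token)
```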
Notes
The right assign operator `->` is turned into the commonly used left assign operator `<-` when R parses expressions.
expression(lapply(mtcars, mean) -> res)
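This is easy to verify on a quoted call. A minimal sketch:

```r
ra <- quote(lapply(mtcars, mean) -> res)

# The call head is already the left assign operator
ra[[1]]       # `<-`

# The quoted right assignment is the same object as the left assignment
identical(ra, quote(res <- lapply(mtcars, mean)))   # TRUE

# Deparsing therefore prints the left assign form
deparse(ra)   # "res <- lapply(mtcars, mean)"
```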
Resources
- Chambers, J. (2008). Software for data analysis: programming with R. Springer Science & Business Media.
- Wickham, H. (2019). Advanced R (second edition). CRC Press.
- The R language definition documents the language per se.
- Mailund, T. (2017). Metaprogramming in R. DOI 10.1007/978-1-4842-2881-4_1
- Kalibera, T., Maj, P., Morandat, F., & Vitek, J. (2014, March). A fast abstract syntax tree interpreter for R. In ACM SIGPLAN Notices (Vol. 49, No. 7, pp. 89-102). ACM.