The content here is compiled from various resources, which are referenced where applicable.
This post is mainly about using native C code in R. C++ and Rcpp [1] are also discussed and listed as resources; however, they are not the primary focus. Other languages such as Fortran, Java, Rust, etc. are not covered in this post at all.
General resources and strategies to find new resources
- In the very beginning, it is wise to read Hadley’s documentation for R’s internal C API. The documentation doesn’t cover every detail, but it’s a really great start.
- The infamous Writing R Extensions manual is quite extensive. Consult it at every opportunity once you are more confident about native code in R.
- Read the R source code. wch/r-source is the read-only mirror of the R source code from https://svn.r-project.org/R/, updated hourly. jimhester/lookup provides an easy way to look up the source code by call name. It also works great on .Call, .C, .Internal and .External.
- Sign up for R mailing lists (e.g. R-devel) or follow the discussions through nabble or The Mail Archive.
Header paths
- For R internals (residing in R.h and Rinternals.h), you can run the code below to find the path:
Rscript -e 'cat(R.home("include"))'
which returns /Library/Frameworks/R.framework/Resources/include on macOS and typically /usr/share/R/include on Debian/Ubuntu.
- For specific package headers such as Rcpp, use the call:
pkg_headers <- function(package) {
# installed.packages() returns a matrix whose row names are package names
ip <- installed.packages()
file.path(ip[package, "LibPath"], package, "include")
}
pkg_headers("Rcpp")
returning ~/R/x86_64-pc-linux-gnu-library/4.0/Rcpp/include on Ubuntu 18.04.5 LTS.
R packages
Some R packages that help with writing native code are listed below.
profmem
Profile the memory usage of R expressions.
profmem::profmem(rnorm(1e4))
# library(mmy)
# profmem::profmem(ht(iris))
The profmem package provides a friendlier alternative to utils::Rprofmem(), which the package actually uses under the hood, to profile the memory usage of R expressions.
The introductory vignette explains more.
Bengtsson H (2020). profmem: Simple Memory Profiling for R. R package version 0.6.0, https://cran.r-project.org/package=profmem.
microbenchmark
For benchmarking R code by timing expressions.
library(dplyr) # provides %>%
microbenchmark::microbenchmark(
base = stats::aggregate(list(mean = iris$Sepal.Length), by = iris["Species"], mean),
dplyr = iris %>% dplyr::group_by(Species) %>% dplyr::summarise(mean = mean(Sepal.Length)) %>% as.data.frame(),
check = "equal"
)
The check argument of this function, which compares the results of the supplied expressions, is NULL by default. Here it is "equal", and the dplyr result is converted with as.data.frame() because a tibble and a data frame would not compare as equal otherwise. It can also take the stricter value "identical". See the documentation of the check argument in ?microbenchmark::microbenchmark for more information.
The benchmark result object can be visualized by using ggplot2::autoplot().
A more basic alternative to this package in base R is the system.time() function.
Mersmann O (2019). microbenchmark: Accurate Timing Functions. R package version 1.4-7, https://cran.r-project.org/package=microbenchmark.
bench
bench is a modern alternative to microbenchmark and profmem.
n <- 1e5
# check = FALSE because the two expressions return different results
bench::mark(sample(n), rnorm(n), check = FALSE, iterations = 100000)
Hester J (2020). bench: High Precision Timing of R Expressions. R package version 1.1.1, https://cran.r-project.org/package=bench.
inline
The inline package is for running compiled code.
The package has the useful cfunction(), cxxfunction() and rcpp() calls for running C, C++ and Rcpp code respectively.
The example code below is taken from the Writing R Extensions manual, from its 5.10.1 Calling .Call section.
convolve2 <- inline::cfunction(c(a = "numeric", b = "numeric"), "
int na, nb, nab;
double *xa, *xb, *xab;
SEXP ab;
a = PROTECT(coerceVector(a, REALSXP));
b = PROTECT(coerceVector(b, REALSXP));
na = length(a); nb = length(b); nab = na + nb - 1;
ab = PROTECT(allocVector(REALSXP, nab));
xa = REAL(a); xb = REAL(b); xab = REAL(ab);
for(int i = 0; i < nab; i++) xab[i] = 0.0;
for(int i = 0; i < na; i++)
for(int j = 0; j < nb; j++) xab[i + j] += xa[i] * xb[j];
UNPROTECT(3);
return ab;
")
convolve2(3, 8)
Sklyar O, Murdoch D, Smith M, Eddelbuettel D, Francois R, Soetaert K, Ranke J (2020). inline: Functions to Inline C, C++, Fortran Function Calls from R. R package version 0.3.17, https://cran.r-project.org/package=inline.
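To see what the double loop in convolve2 computes without any R machinery, the same arithmetic can be sketched in plain C. This is only an illustration under assumptions: the function name convolve and its argument layout are made up for this sketch, and the SEXP handling (coerceVector, PROTECT, allocVector) from the example above is deliberately left out.

```c
#include <assert.h>

/* Plain-C sketch of the convolution loop from convolve2.
 * xab must have room for na + nb - 1 doubles.
 * The name and signature are hypothetical, not part of R's API. */
void convolve(const double *xa, int na, const double *xb, int nb, double *xab) {
    assert(na > 0 && nb > 0);
    int nab = na + nb - 1;

    /* Zero the output first, as the R version does. */
    for (int i = 0; i < nab; i++) xab[i] = 0.0;

    /* Each pair (i, j) contributes to output position i + j. */
    for (int i = 0; i < na; i++)
        for (int j = 0; j < nb; j++)
            xab[i + j] += xa[i] * xb[j];
}
```

With single-element inputs 3 and 8, as in convolve2(3, 8), the output has one element, the product 24.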
cbuild
A modern version of the inline package that allows using multiple functions in the same place and picking what to export.
library(cbuild)
fns <- source_code("
static SEXP helper(SEXP x) {
return x;
}
// [[ export() ]]
SEXP fn1(SEXP x) {
return helper(x);
}
// [[ export() ]]
SEXP fn2(SEXP x, SEXP y) {
double result = REAL(x)[0] + REAL(y)[0];
return Rf_ScalarReal(result);
}
")
fns$fn1(1)
#> [1] 1
fns$fn2(1, 2)
#> [1] 3
Vaughan D (2021). cbuild: Tools to Make Developing R Packages Interfacing with 'C' Easier. R package version 0.0.0.9000, https://github.com/DavisVaughan/cbuild.
lookup
Look up full R function definitions, including compiled code, S3 and S4 methods.
Search GitHub for usages, for example under the cran user, a bot account that automatically mirrors CRAN packages to GitHub:
lookup::lookup_usage("grep")
lookup::lookup_usage("length", language = "C++")
lookup::lookup_usage("R_NamesSymbol", user = NULL, language = "C")
Hester J, Wickham H, Csárdi G (2021). lookup: Lookup R function definitions, including compiled code, S3 and S4 methods. R package version 0.0.0.9000.
pkgbuild
The pkgbuild package has tools for working with compiled code.
See default compiler flags used by devtools
pkgbuild::compiler_flags()
Register native routines
pkgbuild::compile_dll(force = TRUE, register_routines = TRUE)
This function wraps tools::package_native_routine_registration_skeleton() under the hood but provides a cleaner result.
Temporarily set debugging compilation flags
pkgbuild::with_debug(mmy::ht(iris, 2), debug = TRUE)
When debugging, it’s annoying not to be able to step through the code line by line because of compiler optimizations. Therefore, it's a good idea to use debug builds (as Kevin Ushey suggests here) by adding the -g -O0 flags to the Makevars.
Jim Hester says that with_debug() is "just a thin wrapper around withr::with_makevars()" (here).
On the other hand, it is also useful to create a ~/.R/Makevars file and add the debug flags there: PKG_CFLAGS=-g -O0 for C and CXXFLAGS=-g -O0 for C++.
Wickham H, Hester J (2020). pkgbuild: Find Tools Needed to Build R Packages. R package version 1.2.0, https://cran.r-project.org/package=pkgbuild.
pryr
pryr provides tools to pry back the surface of R and dig into the details.
pryr has a lot of nice features to inspect what is going on inside R.
L <- list(a = 1, b = 2)
pryr::sexp_type(L)
pryr::mem_used() is an alternative to gc(). The call below returns the total amount of memory in use (printed in megabytes).
pryr::mem_used()
The examples below are taken from ?pryr::sexp_type.
x <- 1:10
pryr::typename(x)
pryr::refs(x)
pryr::address(x)
y <- 1L
pryr::typename(y)
z <- list(1:10)
delayedAssign("a", 1 + 2)
pryr::typename(a)
a
pryr::typename(a)
x <- 1:5
pryr::address(x)
x[1] <- 3L
pryr::address(x)
Although pryr::inspect() can be considered the modernized version of .Internal(inspect(x)), they don’t produce the same output. The internal inspect call gives more detailed results, especially regarding the memory details of the input object(s).
There’s a great post, The Secret Lives of R Objects, that digs deeper into .Internal(inspect(x)).
See also Notes on Reference Counting in R by Luke Tierney http://developer.r-project.org/Refcnt.html
L <- list(a = 1, b = list(c = 2, d = 3))
.Internal(inspect(1))
.Internal(inspect(L))
pryr::inspect(1)
pryr::inspect(L)
Wickham H (2018). pryr: Tools for Computing on the Language. R package version 0.1.4, https://cran.r-project.org/package=pryr.
R-hub
The R-hub builder is a multi-platform build and check service for R packages.
R-hub provides a set of images for different architectures that you can check your package against.
Normally, you’ll need to register there (it’s free) to run your package on the R-hub servers. It’s also possible to pull the images and run the checks locally by using Docker.
Here's a tutorial on how to do it: https://r-hub.github.io/rhub/articles/local-debugging.html
- Run rchk locally:
Rscript --vanilla -e 'rhub::local_check_linux(path = ".", image = "rhub/ubuntu-rchk")'
Csárdi G, Salmon M (2019). rhub: Connect to 'R-hub'. R package version 1.1.1, https://cran.r-project.org/package=rhub.
compiler
Byte Code Compiler for R.
- Since R 3.4.0, functions are byte-compiled just in time, and since R 3.5.0, packages are also byte-compiled on installation. You don’t need to worry about it.
- Set ByteCompile: true in the DESCRIPTION file of the package. That will make the package functions byte-compiled. However, since R 3.5.0 this is the default.
- Disable the JIT byte-code compiler with compiler::enableJIT(0) before profiling (such as using profvis), as it clutters the stack frame.
norm_sum <- function(i) sum(rnorm(i))
cmp <- compiler::cmpfun(norm_sum)
## a minimal disassembler primarily useful for debugging the compiler.
ds <- compiler::disassemble(cmp)
Resources:
- Tierney, L. (2019). A Byte Code Compiler for R. http://homepage.cs.uiowa.edu/~luke/R/compiler/compiler.pdf
- Gillespie, C., Lovelace, R. (2020). Efficient R Programming. https://csgillespie.github.io/efficientR/programming.html#the-byte-compiler
Other packages
Some of these packages don’t yet seem to have a stable API, or have decided to stay experimental.
- mrc-ide/odin: a DSL that generates C code.
- seven31: discover the IEEE 754 double-precision binary floating-point format in R, see FAQ 7.31.
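FAQ 7.31 is about why R reports two numbers that look equal as different. Since R's numeric type is a C double, the effect can be reproduced directly in C. The sketch below is only an illustration: double_bits and nearly_equal are hypothetical helper names, not functions from seven31 or from R's API, and it assumes the usual 64-bit IEEE 754 double.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the raw 64-bit pattern of a double (hypothetical helper). */
uint64_t double_bits(double x) {
    uint64_t bits;
    assert(sizeof bits == sizeof x);
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Compare with a relative tolerance instead of ==, similar in spirit
 * to using all.equal() rather than == in R (hypothetical helper). */
int nearly_equal(double x, double y, double tol) {
    double diff = x > y ? x - y : y - x;
    double ax = x < 0 ? -x : x;
    double ay = y < 0 ? -y : y;
    double scale = ax > ay ? ax : ay;
    return diff <= tol * scale;
}
```

Comparing double_bits(0.1 + 0.2) with double_bits(0.3) shows that the two values differ in their last bit, which is why == fails while a tolerance-based comparison succeeds.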
Native packages
RInside
The RInside package provides C++ classes that make it easier to embed R in C++ code (…)
eddelbuettel/rinside allows using R code inside C/C++ code.
Some resources about RInside:
- Using Rcpp in Xcode. (2019, March 13). Retrieved May 12, 2019, from https://www.gormanalysis.com/blog/using-rcpp-in-xcode/
Memory
Memory management
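UNPROTECT can also come right after the object is initialized, not necessarily at the end of the call. This is safe only as long as nothing that can allocate memory, and so trigger a garbage collection, runs before the object is returned or otherwise protected. For instance: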
SEXP out = PROTECT(Rf_allocVector(INTSXP, x_len));
UNPROTECT(1);
Memory checking
The two most popular tools for checking memory are:
- valgrind: checks for memory leaks and helps with memory management.
- rchk: keeps track of the stack of PROTECT/UNPROTECT calls.
See the R-hub section for an easy way to use these tools.
valgrind
If you’re on a platform where valgrind isn’t supported yet, such as macOS Mojave 10.14.3 (as of May 2019), you need to set up a virtual machine or a server.
Also take a look at R-hub, which has a valgrind image.
Use valgrind with gctorture():
$ R --debugger=valgrind --silent
Enable gctorture, which forces a garbage collection on (nearly) every memory allocation.
gctorture(TRUE)
<your_problematic_call>(...)
Since gctorture makes R code very slow, it may be better to use a structure like the one below, which switches it off again on exit and so prevents accidentally leaving it enabled:
torture <- function(...) {
gctorture(TRUE)
on.exit(gctorture(FALSE), add = TRUE)
# <your_problematic_call>(...)
}
See also:
rchk
- Tame the R dragon: rchk is the best way to catch memory protection errors in code using R’s C API, which are often caused by an imbalance of PROTECT/UNPROTECT calls.
- rchk has a Docker and a Vagrant image (see here). Also check R-hub as an alternative solution.
- rchk may not work well with C++ and Rcpp. Rcpp takes care of protecting objects from the garbage collector and releasing them automatically.
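What an imbalance of PROTECT/UNPROTECT calls means can be illustrated with a toy model in plain C. This is not R's implementation and not how rchk works internally: every name below (toy_protect, toy_unprotect, toy_imbalance and so on) is hypothetical. It only assumes the general idea that protected pointers live on a stack whose depth should be the same before and after a call.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_STACK_SIZE 64

/* Toy protection stack: hypothetical names, not R's real data structures. */
static const void *toy_stack[TOY_STACK_SIZE];
static int toy_top = 0;

/* Push a pointer, in the spirit of PROTECT(x). */
const void *toy_protect(const void *p) {
    assert(toy_top < TOY_STACK_SIZE);
    toy_stack[toy_top++] = p;
    return p;
}

/* Pop the n most recent pointers, in the spirit of UNPROTECT(n). */
void toy_unprotect(int n) {
    assert(n >= 0 && n <= toy_top);
    toy_top -= n;
}

int toy_depth(void) { return toy_top; }

static const int obj_a = 1, obj_b = 2;

/* Balanced: two pushes, two pops. */
void balanced_call(void) {
    toy_protect(&obj_a);
    toy_protect(&obj_b);
    toy_unprotect(2);
}

/* Unbalanced: two pushes but only one pop. */
void unbalanced_call(void) {
    toy_protect(&obj_a);
    toy_protect(&obj_b);
    toy_unprotect(1);
}

/* Stack depth after the call minus depth before; 0 means balanced. */
int toy_imbalance(void (*fn)(void)) {
    int before = toy_depth();
    fn();
    return toy_depth() - before;
}
```

Here toy_imbalance(balanced_call) is 0, while toy_imbalance(unbalanced_call) is 1 because one entry is left on the stack. R itself reports this kind of mismatch as a stack imbalance warning when a .Call returns, and rchk aims to find such paths statically.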
Debugging & Profiling
GDB and LLDB
Start a debugging session with:
R -d gdb # or lldb
Resources:
- Debugging compiled code in R packages https://r-pkgs.org/src.html#src-debugging
- Debugging R package with LLDB https://blog.metinyazici.org/posts/debugging-r-lldb
- Docker image for debugging R memory problems, with a set of tools that help you find memory errors in your native code https://github.com/wch/r-debug
- Debugging an R Package with C++ https://blog.davisvaughan.com/2019/04/05/debug-r-package-with-cpp/
- Debugging and Fixing CRAN's 'Additional Checks' errors https://reside-ic.github.io/blog/debugging-and-fixing-crans-additional-checks-errors/
- heckendorfc/lldbR: R interface for the LLDB debugger (still looks experimental)
callgrind
You can run R under callgrind with the debugger flag -d, then open the resulting output file (its suffix is the process ID) in kcachegrind.
R -d "valgrind --tool=callgrind" -f file.R
kcachegrind callgrind.out.18133
jointprof
Joint profiling of native and R code
profvis should be the go-to package for profiling R code (but not native code).
(the project is funded by R Consortium)
- Introduction vignette https://r-prof.github.io/jointprof/articles/jointprof.html
Other:
- Profiling C code in R (with Google’s gperftools) https://stackoverflow.com/questions/40343773/profiling-c-code-r/40360645#40360645 (referencing this presentation: http://dirk.eddelbuettel.com/papers/ismNov2009introHPCwithR.pdf)
- Profiling Rcpp packages http://minimallysufficient.github.io/r/programming/c++/2018/02/16/profiling-rcpp-packages.html
Tooling
- An Autoconf Primer for R Package Authors https://unconj.ca/blog/an-autoconf-primer-for-r-package-authors.html
Performance
- Persistent R Objects in C https://blog.davisvaughan.com/2019/08/13/persistant-r-objects-in-c/
Create shared R objects that persist for the duration of an R session. Here is an excerpt from the post showing the use of shared_int_one:
SEXP result = PROTECT(Rf_allocVector(VECSXP, 1));
// can use `shared_int_one` without creating a new one
SET_VECTOR_ELT(result, 0, shared_int_one);
UNPROTECT(1); // only have to care about `result` protection
return result;
Calling the native routines between R packages
This is basically done via R_RegisterCCallable() in the package providing the routine and R_GetCCallable() in the package calling it.
R’s C API notes and tips
These are some notes of mine about the C API. They came into existence while I was trying to dance with R internals.
- Use R_alloc() instead of malloc(), because memory obtained from R_alloc() is managed by R and reclaimed automatically at the end of the .C/.Call/.External call, or on error. See the discussion in the R alloc vs malloc thread and the Transient storage allocation section.
- Thanks to R’s memory reuse, no multiple memory allocations are done here, even though R semantically copies objects by value:
profmem::profmem({
x <- rnorm(1e5)
y <- x
z <- y
})
See R_allocOrReuseVector here. The information is from here.
- Rboolean R_compute_identical(SEXP, SEXP, int): the C version of the identical() function in R.
Other resources
- The R C API. Notes from a MORON. https://raw-r.org/R_API.php
- Falcon, Seth. (20-21 May, 2010) Native Interfaces for R. https://www.bioconductor.org/help/course-materials/2010/AdvancedR/NativeInterfaces.pdf
- Søren Højsgaard. (November 9, 2012) Interfacing C code from R. Aalborg University, Denmark. http://people.math.aau.dk/~sorenh/teaching/2012-ASC/day4/interfaceC-notes.pdf
- R.M. Ripley. (2008/9). Calling other languages from R. University of Oxford, Department of Statistics. https://www.stat.purdue.edu/~liu105/STAT598G_lab/Rcourse94up.pdf
- Vanderbilt University, Department of Statistics. (2006). R Internals: SEXP and Memory Allocation. https://biostat.app.vumc.org/wiki/Main/RInternals
- Creating a data.frame in C https://coolbutuseless.github.io/2020/09/16/creating-a-data.frame-in-c/
Good sources about R internals; some of them may be dated, but many are still relevant:
- Dalgaard, Peter. (May 2004). Language Interfaces: .Call and .External. First UseR! Conference, Vienna. http://www.ci.tuwien.ac.at/Conferences/useR-2004/Keynotes/Dalgaard.pdf
Rcpp
Advantages:
- RAII over manual memory management (with PROTECT/UNPROTECT)
Resources:
- Source for the Unofficial Rcpp API Documentation coatless/rcpp-api
- CppCon 2015: Matt P. Dziubinski “Rcpp: Seamless R and C++ Integration”
- fdrennan/dcpp - Where I learn to build R code with C++
- Rcpp for everyone https://teuder.github.io/rcpp4everyone_en/
C for R users
- C for R Users. Get “Closer to the Machine…” https://raw-r.org/C_R.php
- Object-oriented programming with ANSI-C: https://www.cs.rit.edu/~ats/books/ooc.pdf
[1] Rcpp package: https://cran.r-project.org/package=Rcpp