Resources for using native code in R
Metin Yazici
2019-03-21

The content here is compiled from different resources, which are referenced where applicable.

This post is mainly about using native C code in R. C++ and Rcpp [1] are also discussed and listed here as resources; however, they are not the primary focus. Other languages such as Fortran, Java, Rust etc. are not covered in this post at all.

General resources and strategies to find new resources

Header paths

  • For R internals (residing in R.h and Rinternals.h), you can run the code below to find the paths:
Rscript -e 'R.home("include")'

which returns /Library/Frameworks/R.framework/Resources/include on macOS and typically /usr/share/R/include on Debian/Ubuntu (where R.home() itself is /usr/lib/R).

  • For specific package headers such as Rcpp, use the call:
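pkg_headers <- function(package) {
  ip <- installed.packages()
  # look the package up in the "Package" column and build <library>/<package>/include
  file.path(ip[match(package, ip[, "Package"]), "LibPath"], package, "include")
}
pkg_headers("Rcpp")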

returning ~/R/x86_64-pc-linux-gnu-library/4.0/Rcpp/include on Ubuntu 18.04.5 LTS.
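As a complementary sketch using base R only, the snippet below checks whether the two main headers exist in the include directory and looks up a package's header directory with system.file(). The helper name pkg_include is hypothetical and only used for illustration; Rcpp is assumed to be installed for the last call to return a path.

```r
# Directory holding R.h and Rinternals.h for the running R installation.
include_dir <- R.home("include")
r_headers <- file.path(include_dir, c("R.h", "Rinternals.h"))
headers_found <- file.exists(r_headers)
names(headers_found) <- basename(r_headers)
print(headers_found)

# Hypothetical helper: header directory of an installed package,
# or NA if the package is missing or ships no headers.
pkg_include <- function(package) {
  path <- system.file("include", package = package)
  if (nzchar(path)) path else NA_character_
}
pkg_include("Rcpp")
```

Returning NA rather than a constructed path makes it obvious when a package ships no headers at all.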

R packages

Below are some R packages that help with writing native code.

profmem

Profile the memory usage of R expressions.

profmem::profmem(rnorm(1e4))
# library(mmy)
# profmem::profmem(ht(iris))

The profmem package provides a friendlier alternative to utils::Rprofmem(), which is actually used by the package under the hood, to profile the memory usage of R expressions.

The introductory vignette explains more.

Bengtsson H (2020). profmem: Simple Memory Profiling for R. R package version 0.6.0, https://cran.r-project.org/package=profmem.

microbenchmark

For benchmarking R code by timing expressions.

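library(magrittr) # provides %>%

# With `check` set, microbenchmark stops if the results differ, so the tibble is
# converted to a data.frame; relax to check = "equal" if attributes still differ.
microbenchmark::microbenchmark(
  base = stats::aggregate(list(mean = iris$Sepal.Length), by = iris["Species"], mean),
  dplyr = iris %>% dplyr::group_by(Species) %>% dplyr::summarise(mean = mean(Sepal.Length)) %>% as.data.frame(),
  check = "identical"
)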

The check argument of this function, which verifies that the supplied expressions return consistent results, is NULL by default. It can also take the value "equal". See the documentation of the check argument in ?microbenchmark::microbenchmark for more information.

The benchmark result object can be visualized by using ggplot2::autoplot().

A more basic alternative to this package in base R can be the system.time() function.
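A minimal sketch of the base R route, assuming nothing beyond a stock R installation. Unlike microbenchmark, system.time() evaluates the expression only once, so the numbers are noisier.

```r
# Time a single evaluation of an expression with base R only.
timing <- system.time({
  x <- rnorm(1e5)
  s <- sort(x)
})
print(timing)        # user, system and elapsed times in seconds
timing[["elapsed"]]  # wall-clock time as a plain number
```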

Mersmann O (2019). microbenchmark: Accurate Timing Functions. R package version 1.4-7, https://cran.r-project.org/package=microbenchmark.

bench

bench is a modern alternative to microbenchmark and profmem.

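n <- 1e5
# check = FALSE because the two expressions intentionally return different values;
# bench::mark() otherwise stops when the results are not equal
bench::mark(sample(n), rnorm(n), check = FALSE, iterations = 100)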

Hester J (2020). bench: High Precision Timing of R Expressions. R package version 1.1.1, https://cran.r-project.org/package=bench.

inline

The inline package is for running compiled code. It provides cfunction(), cxxfunction() and rcpp() for running C, C++ and Rcpp code, respectively.

The example code below is taken from the Writing R Extensions manual, from its section 5.10.1, Calling .Call.

convolve2 <- inline::cfunction(c(a = "numeric", b = "numeric"), "
    int na, nb, nab;
    double *xa, *xb, *xab;
    SEXP ab;
    a = PROTECT(coerceVector(a, REALSXP));
    b = PROTECT(coerceVector(b, REALSXP));
    na = length(a); nb = length(b); nab = na + nb - 1;
    ab = PROTECT(allocVector(REALSXP, nab));
    xa = REAL(a); xb = REAL(b); xab = REAL(ab);
    for(int i = 0; i < nab; i++) xab[i] = 0.0;
    for(int i = 0; i < na; i++)
        for(int j = 0; j < nb; j++) xab[i + j] += xa[i] * xb[j];
    UNPROTECT(3);
    return ab;
")
convolve2(3, 8)

Sklyar O, Murdoch D, Smith M, Eddelbuettel D, Francois R, Soetaert K, Ranke J (2020). inline: Functions to Inline C, C++, Fortran Function Calls from R. R package version 0.3.17, https://cran.r-project.org/package=inline.

cbuild

A modern version of the inline package that allows using multiple functions in the same place and picking what to export.

library(cbuild)
fns <- source_code("
  static SEXP helper(SEXP x) {
    return x;
  }

  // [[ export() ]]
  SEXP fn1(SEXP x) {
    return helper(x);
  }

  // [[ export() ]]
  SEXP fn2(SEXP x, SEXP y) {
    double result = REAL(x)[0] + REAL(y)[0];
    return Rf_ScalarReal(result);
  }
")

fns$fn1(1)
#> [1] 1

fns$fn2(1, 2)
#> [1] 3

Vaughan D (2021). cbuild: Tools to Make Developing R Packages Interfacing with 'C' Easier. R package version 0.0.0.9000, https://github.com/DavisVaughan/cbuild.

lookup

Look up full R function definitions, including compiled code, S3 and S4 methods.

Search for usages on GitHub under the cran user, which is a bot that automatically mirrors packages from CRAN to GitHub:

lookup::lookup_usage("grep")
lookup::lookup_usage("length", language = "C++")
lookup::lookup_usage("R_NamesSymbol", user = NULL, language = "C")

Hester J, Wickham H, Csárdi G (2021). lookup: Lookup R function definitions, including compiled code, S3 and S4 methods. R package version 0.0.0.9000.

pkgbuild

The pkgbuild package has tools for building packages that contain compiled code.

See default compiler flags used by devtools

pkgbuild::compiler_flags()

Register native routines

pkgbuild::compile_dll(force = TRUE, register_routines = TRUE)

This function wraps tools::package_native_routine_registration_skeleton() under the hood but provides a cleaner result.

Temporarily set debugging compilation flags

# `code` is an expression evaluated with the debug flags set, not a string
pkgbuild::with_debug(code = mmy::ht(iris, 2), debug = TRUE)

When debugging, it's annoying not to be able to step through the code because of compiler optimizations. Therefore, it's a good idea to use debug builds (as Kevin Ushey suggests here) by adding the -g -O0 flags to the Makevars.

Jim Hester says "it's just a thin wrapper around withr::with_makevars()", here.

On the other hand, it is also useful to create a ~/.R/Makevars file and add the debug flags there: CFLAGS=-g -O0 for C and CXXFLAGS=-g -O0 for C++.

Wickham H, Hester J (2020). pkgbuild: Find Tools Needed to Build R Packages. R package version 1.2.0, https://cran.r-project.org/package=pkgbuild.

pryr

pryr provides tools to pry back the surface of R and dig into the details.

pryr has a lot of nice features to inspect what is going on inside R.

L <- list(a = 1, b = 2)
pryr::sexp_type(L)

pryr::mem_used() is an alternative to gc(). The call below returns the total amount of memory in use (in megabytes).

pryr::mem_used()
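For comparison, a rough base R sketch of the same idea, assuming only gc(): it sums the "used" megabyte column of the matrix that gc() returns. Note that calling gc() also triggers a garbage collection, and the figure may differ slightly from what pryr reports.

```r
mem <- gc()  # triggers a collection and returns a usage matrix
print(mem)
# Column 2 holds the "used" figures in Mb for the Ncells and Vcells rows.
used_mb <- sum(mem[, 2])
cat("Memory in use:", used_mb, "Mb\n")
```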

The examples below are taken from ?pryr::sexp_type.

x <- 1:10
pryr::typename(x)
pryr::refs(x)
pryr::address(x)
y <- 1L
pryr::typename(y)
z <- list(1:10)
delayedAssign("a", 1 + 2)
pryr::typename(a)
a
pryr::typename(a)
x <- 1:5
pryr::address(x)
x[1] <- 3L
pryr::address(x)

Although pryr::inspect can be regarded as a modernized version of .Internal(inspect(x)), they don't produce the same output. The internal inspect call gives more detailed results, especially regarding the memory details of the input object(s).

There's a great post, The Secret Lives of R Objects, that dives deep into .Internal(inspect(x)).

See also Notes on Reference Counting in R by Luke Tierney http://developer.r-project.org/Refcnt.html

L <- list(a = 1, b = list(c = 2, d = 3))
.Internal(inspect(1))
.Internal(inspect(L))
pryr::inspect(1)
pryr::inspect(L)

Wickham H (2018). pryr: Tools for Computing on the Language. R package version 0.1.4, https://cran.r-project.org/package=pryr.

R-hub

The R-hub builder is a multi-platform build and check service for R packages.

R-hub contains a set of images for different platforms and architectures that you can check your package against.

Normally, you'll need to register there (it's free) to run your package on the R-hub servers. It's also possible to pull the Docker images and run the checks locally.

Here's a tutorial on how to do it: https://r-hub.github.io/rhub/articles/local-debugging.html

  • Run rchk locally:
Rscript --vanilla -e 'rhub::local_check_linux(path = ".", image = "rhub/ubuntu-rchk")'

Csárdi G, Salmon M (2019). rhub: Connect to 'R-hub'. R package version 1.1.1, https://cran.r-project.org/package=rhub.

compiler

Byte Code Compiler for R.

  • Since R 3.4.0, functions are byte-compiled by the JIT compiler by default, and since R 3.5.0 packages are also byte-compiled on installation. You don't need to worry about it.

  • Setting ByteCompile: true in the DESCRIPTION file of a package makes its functions byte-compiled. However, since R 3.5.0 that is the default.

  • Disable the JIT byte-code compiler with compiler::enableJIT(0) before profiling (such as with profvis), as it clutters the stack frames.

norm_sum <- function(i) sum(rnorm(i))
cmp <- compiler::cmpfun(norm_sum)
## a minimal disassembler primarily useful for debugging the compiler.
ds <- compiler::disassemble(cmp)
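Disabling the JIT for a profiling run and restoring it afterwards can be sketched as below. The wrapper name with_jit_disabled is hypothetical; the sketch relies on compiler::enableJIT() returning the previous level, and on a negative argument returning the current level without changing it.

```r
level_before <- compiler::enableJIT(-1)  # query the current JIT level

# Hypothetical helper: evaluate `code` with the JIT switched off,
# then restore whatever level was active before.
with_jit_disabled <- function(code) {
  old <- compiler::enableJIT(0)  # returns the previous level
  on.exit(compiler::enableJIT(old), add = TRUE)
  force(code)  # `code` is a promise, so it is evaluated here
}

level_inside <- with_jit_disabled(compiler::enableJIT(-1))
level_after <- compiler::enableJIT(-1)
c(before = level_before, inside = level_inside, after = level_after)
```

Restoring through on.exit() keeps the session in its original state even if the profiled code fails.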

Other packages

Some of the packages don’t yet seem to have a stable API, or decided to stay experimental.

Native packages

RInside

The RInside package provides C++ classes that make it easier to embed R in C++ code (…)

eddelbuettel/rinside allows using R code inside C/C++ code.

Some resources about RInside:

Memory

Memory management

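  • UNPROTECT doesn't have to come at the end of the call. It can come as soon as the object no longer needs protection, that is, when nothing that may allocate (and so trigger the garbage collector) runs before the object is returned or stored in a protected container. For instance:
SEXP out = PROTECT(Rf_allocVector(INTSXP, x_len));
/* ... fill `out`; allocations made here are safe because `out` is protected ... */
UNPROTECT(1); /* fine as long as nothing allocates between here and `return out;` */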

Memory checking

Two popular tools for checking memory are:

  • valgrind: detects memory leaks and invalid memory accesses.

  • rchk: keeps track of the stack of PROTECT/UNPROTECT calls.

See the R-hub section for an easy way to use these tools.

valgrind

If you're on macOS Mojave 10.14.3, which valgrind doesn't support yet (as of May 2019), you need to set up a virtual machine or a server. Also take a look at R-hub, which has a valgrind image.

Use valgrind with gctorture():

$ R --debugger=valgrind --silent

Enable gctorture, which forces a garbage collection on (nearly) every memory allocation.

gctorture(TRUE)
<your_problematic_call>(...)

Since gctorture makes R code very slow, it may be better to use a structure like the one below, which prevents accidentally leaving garbage collector torture switched on:

torture <- function(...) {
  gctorture(TRUE)
  on.exit(gctorture(FALSE), add = TRUE)
  # <your_problematic_call>(...)
}
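The same pattern can be sketched so that it takes the problematic call as an argument. The name with_gctorture is hypothetical, and sum(as.numeric(1:10)) merely stands in for a call into your own native code.

```r
# Hypothetical helper: evaluate `expr` under gctorture and always switch it off again.
with_gctorture <- function(expr) {
  gctorture(TRUE)
  on.exit(gctorture(FALSE), add = TRUE)
  force(expr)  # `expr` is a promise, so it is evaluated while torture is on
}

res <- with_gctorture(sum(as.numeric(1:10)))
res
```

Because the expression is passed lazily, only its evaluation is slowed down, not the rest of the session.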

rchk

kalibera/rchk

  • Tames the R dragon: best for catching memory protection errors in code using R's C API, which are often caused by an imbalance of PROTECT/UNPROTECT calls.

  • rchk has a Docker and a Vagrant image (see here). Also check R-hub as an alternative solution.

  • rchk may not work well with C++ and Rcpp. Rcpp takes care of protecting objects from the garbage collector and releasing them automatically.

Debugging & Profiling

GDB and LLDB

Start a debugging session with:

R -d gdb # or lldb

callgrind

You can run callgrind within R with the -d (debugger) flag.

R -d "valgrind --tool=callgrind" -f file.R
kcachegrind callgrind.out.18133

jointprof

Joint profiling of native and R code

profvis should be the go-to package for profiling R code (but not native code).

(the project is funded by R Consortium)

Tooling

Performance

Create shared R objects that persist during a single R session. Here is an excerpt from the post showing an example that uses shared_int_one:

SEXP result = PROTECT(Rf_allocVector(VECSXP, 1));

// can use `shared_int_one` without creating a new one
SET_VECTOR_ELT(result, 0, shared_int_one);

UNPROTECT(1); // only have to care about `result` protection
return result;

Calling the native routines between R packages

This is basically done via R_RegisterCCallable() in the package that provides the routine and R_GetCCallable() in the package that calls it.


R’s C API notes and tips

These are some notes I have recorded about the C API. They came into existence while I was trying to dance with R internals.

  • Use R_alloc() instead of malloc().

    Memory obtained from R_alloc() is managed by R and reclaimed automatically at the end of the .C/.Call call (also when an error occurs), whereas malloc() memory must be freed manually. See the R alloc vs malloc discussion thread and the Transient storage allocation section.

  • Because R reuses memory, only one vector allocation is made here, even though R has copy-by-value semantics.

profmem::profmem({
  x <- rnorm(1e5)
  y <- x
  z <- y
})

See R_allocOrReuseVector here. The information is from here.

  • Rboolean R_compute_identical(SEXP, SEXP, int): C version of the identical() function in R

Other resources

Good sources on R internals; some of them may be dated, but many of them are still relevant:

Rcpp

Advantages:

  • RAII over manual memory management (with PROTECT/UNPROTECT)

C for R users


  1. Rcpp package: https://cran.r-project.org/package=Rcpp