
C++ is Often Used in R Packages

The recent r-project article “Use of C++ in Packages” stated as its own summary of recommendation:

don’t use C++ to interface with R.

A careful reading of the article exposes at least two possible meanings of this:

  1. Don’t use C++ to directly call R or directly manipulate R structures. A technical point directly argued (for right or wrong) in the article.
  2. Don’t use C++/Rcpp to write R packages. A point implicit in the article. C++ and Rcpp (a package designed to allow the use of C++ from R) are not the same thing, but both are mentioned in the note.

One could claim the article is “all about point 1, which we can argue on its technical merits.” The technicalities involve discussion of C’s setjmp/longjmp and how this differs from C++’s treatment of RAII, destructors, and exceptions.

(edit: It has been pointed out to me that, as there is no C++ interface to R, the point-1 interpretation is in some sense not technically possible. All C++ is forced to go through the C interface. Yes, things can go wrong, but in a strict technical sense you can’t directly “use C++ to interface with R”; C++ code is reached through .C() or .Call() just as C code is.)
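As a rough illustration of the setjmp/longjmp versus RAII difference, here is a small self-contained C++ sketch that does not touch R at all. `Guard`, `c_style_call`, and the resource counter are hypothetical names invented for this example; `c_style_call` merely stands in for any C routine that reports an error by long-jumping. The long-jump path deliberately uses a plain counter rather than an object with a destructor, since jumping over a live C++ destructor is undefined behavior.

```cpp
#include <cassert>
#include <csetjmp>
#include <stdexcept>

// Counts "resources" currently held; stands in for open files, allocations, etc.
static int open_resources = 0;

// RAII guard: acquires in the constructor, releases in the destructor.
struct Guard {
    Guard() { ++open_resources; }
    ~Guard() { --open_resources; }
};

// C++-style failure: the exception unwinds the stack, so ~Guard() runs.
int leaked_after_exception() {
    open_resources = 0;
    try {
        Guard g;
        throw std::runtime_error("failure");
    } catch (const std::runtime_error&) {
        // By the time we get here, g has already been destroyed.
    }
    return open_resources;
}

static std::jmp_buf jump_target;

// Hypothetical stand-in for a C API call that reports errors by long-jumping.
void c_style_call(bool fail) {
    if (fail) std::longjmp(jump_target, 1);
}

// C-style manual cleanup: the release line is skipped if the call long-jumps.
void manual_cleanup_work(bool fail) {
    ++open_resources;    // acquire (plain int, no destructor involved)
    c_style_call(fail);
    --open_resources;    // release; never reached after a long jump
}

int leaked_after_longjmp(bool fail) {
    open_resources = 0;
    if (setjmp(jump_target) == 0) {
        manual_cleanup_work(fail);
    }
    return open_resources;
}

// Self-check of the behavior described in the surrounding text.
void check_leaks() {
    assert(leaked_after_exception() == 0);
    assert(leaked_after_longjmp(false) == 0);
    assert(leaked_after_longjmp(true) == 1);
}
```

Under these assumptions the exception path leaves nothing held, while the long-jump path leaves one resource unreleased. That asymmetry is the heart of the technical argument.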

However, in my opinion the overall tone of the article unfortunately reads as being about point 2. In fact, after multiple readings of the article I remain uncomfortable saying whether the article is attempting to make point 2 or attempting to avoid point 2. Statements such as “Packages that are already using C++ would best be carefully reviewed and fixed by their authors” seem to accuse all existing C++ packages. But statements such as “one could use some of the tricks I’ve described here” seem to imply there are in fact correct ways to interface C++ with R (which, for all we know, many C++ packages may already be using).

I think a point 2 interpretation of the article does the R community a disservice. So I hope the note is not in fact about point 2. And if it isn’t about point 2, I wish that had been more strongly emphasized and made clearer.

For context, Rcpp is the most popular package on CRAN. Based on CRAN data downloaded 2019/03/31: Rcpp is directly used in 1605 CRAN packages (or about 11% of CRAN packages), and indirectly used (brought in through Imports/Depends/LinkingTo) by 6337 packages (or about 45% of CRAN packages). It has the highest reach of any CRAN package under each of those measures (calculation shared here), and even under a PageRank-style measure.

Rcpp is something R users should be appreciative of and grateful for. Rcpp should not become the subject of fear, uncertainty, and doubt.

I apologize if I am merely criticizing my own misreading of the note. However, others have also written about discomfort with this note, and the original note comes from a position of authority (and so has a greater responsibility to be careful about how it might plausibly be read).

Categories: Opinion


jmount

Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.

6 replies

  1. Wow, that post was amazingly inflammatory. I have the same reading as you. He is essentially saying that Rcpp should never be used and to only ever use .C with C++. It never even mentions Rcpp, which is crazy for an article about interfacing C++ and R. The article is on r-project and written by a core member. What the heck is going on.

    1. Technically it uses the word Rcpp five times, but to your point: never a positive mention. The whole “I found calling the C++ based LLVM from C difficult” aside also seems contrary to the point of the article.

      1. I’m just really confused by this. I use Rcpp extensively in packages, but am much more of an algorithmic programmer and thus don’t have a deep understanding of RAII or longjmp. When exactly do I need to worry about these issues given most of the interfaces with the R API are hidden behind syntactic sugar?

        The one concrete example that he gives is RNGScope potentially destroying an unprotected SEXP. This is similar to an issue that I’ve encountered before, and my policy is to now only ever return RObject objects from functions or methods. I think there is some recognition generally that you have to be super careful about your raw SEXPs as they could be collected any time you ask R to do anything.

        He also says the main issue is resolvable via conversion to RAII:

        “This conversion is possible using R_UnwindProtect, see Writing R Extensions 6.12, but requires some verbose coding/boiler-plate. Rcpp uses this API.”

        Since Rcpp converts, is this actually a problem, or just something that comes up if you hit the R API directly?

        I hope Dirk writes something up to help clarify.

      2. I don’t have the C++/Rcpp background to say anything definitive about how C++/Rcpp truly interact with R.

        But, the original note is incredibly confusing.

        For instance it says what you quote.

        “This conversion is possible using R_UnwindProtect, see Writing R Extensions 6.12, but requires some verbose coding/boiler-plate. Rcpp uses this API.”

        Is the above meant to imply one is largely safe if one uses Rcpp or the opposite? I think the facts are one is largely safe if one uses Rcpp sensibly (keeping R operations clean, and C++ operations among themselves), but it is hard to tell from the article.

        The original article goes on to say:

        “Note that using Rcpp does not release package authors from thinking about these problems: indeed with Rcpp one can still call R API directly, but even if that is avoided, one can introduce PROTECT errors by incorrectly using existing objects (like the RNGScope example), by introducing complicated destructors of their own objects (allocating R API call from a destructor) or cause a memory leak by allocating memory dynamically without thinking about exceptions.”

        Does this mean one isn’t safe when using Rcpp, or one has to work hard to dig a hole when using Rcpp?

        And this one:

        “One cannot easily guess which R API functions may long jump, also this may change between R versions without notice.”

        My wild guess as to a hard-to-guess situation that may trigger a long jump would be: reading values without making an index error. Prior to ALTREP this probably could not long jump, and after the introduction of ALTREP it may be more subtle than that. In any case, one would think which R APIs can long jump would be something one would want documented at the API (lowering the need to guess; exceptions are simpler than this and they are often declared/documented).

        Is the original article just genuine frustration with helping others debug C++/R interactions? Or is it “negging” to get some future concession from Rcpp? Or is it preparatory marketing for some sort of upcoming “tidyc” package? I can’t honestly tell. I feel forced to make wild speculative guesses, as I just can’t get a clear intent from the article that is consistent with my experiences and what I have heard from others. Honestly, I like Rcpp and respect Dirk Eddelbuettel and the other developers, and that frankly colors my opinions.

      3. I’ll start by saying that I’ve never really used Rcpp (never could get it to work on Windows). So please point out (and forgive) anywhere I’m wrong.

        After reading the article a few times, I think his summary should be “don’t use C++ to directly interface with R. (And not even indirectly in destructors).”

        The two problems in this layman’s nutshell:

        Problem #1: C++ code called the R API directly without making sure destructors always ran, even if a C error happened. This led to memory leaks.
        Problem #2: C++ destructors called the R API directly or indirectly. This sometimes triggered garbage collection and wiped a value before it could be protected from garbage collection.

        Rcpp solves problem #1, but only if you always use it to access the R API. Rcpp can’t help with problem #2. It’s up to every package author to know about and avoid this. From how I read it, it’s never safe to use the R API within a destructor. R will garbage collect when it feels like it. From the documentation (https://cran.r-project.org/doc/manuals/r-release/R-ints.html#The-write-barrier), this is intended to be “opaque.”

        “In any case, one would think which R APIs can long jump would be something one would want documented at the API”

        C doesn’t natively have destructors like C++. So problem #1 doesn’t necessarily occur for C. R Core probably doesn’t document which functions use setjmp/longjmp because, in C, knowing there could be a warning or error is enough. Documenting how it happens would be cementing the internals for no benefit.

      4. Thanks for the note. You bring up some interesting points.

        I am certainly not the expert, mostly I am writing up my experience and opinion that I would much rather research how to use C++ and Rcpp correctly than to not have the benefits of array index bounds checking.

        Roughly I think you are saying the original article is probably summarized as “point 1.” But people have written me that that is not technically interesting as there is no “C++ interface to R”, all C++ code is already going through C interfaces.

        Also, I think the setjmp/longjmp issue is a bit more pervasive than one might initially think (and not C++’s fault). As Kevin Ushey pointed out, the following C code is not safe in the presence of R-initiated long jumps:

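        FILE* f = fopen(...);
        Rf_eval(...);   /* may long-jump out of this function on an R error */
        fclose(f);      /* skipped if that jump happens, so the file handle leaks */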
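Here is a toy model of one way around that hazard, written in C++ and assuming nothing about R itself. `fake_eval` is a hypothetical stand-in for a call such as `Rf_eval` that may long-jump, and `eval_or_throw` and `FileCloser` are names invented for this sketch. The idea is to catch the jump in a frame that owns no C++ objects and rethrow it as an exception, so that an RAII guard further up the stack still runs. The real R_UnwindProtect mechanism described in Writing R Extensions differs in detail; treat this only as a sketch of the idea.

```cpp
#include <cassert>
#include <csetjmp>
#include <cstdio>
#include <stdexcept>

static std::jmp_buf error_target;
static int closers_run = 0;   // how many FileCloser destructors have run

// Hypothetical stand-in for Rf_eval(): signals an error by long-jumping.
void fake_eval(bool fail) {
    if (fail) std::longjmp(error_target, 1);
}

// Boundary wrapper: the long jump lands in this frame, which holds no objects
// with destructors, and is then turned into an ordinary C++ exception.
void eval_or_throw(bool fail) {
    if (setjmp(error_target) == 0) {
        fake_eval(fail);
    } else {
        throw std::runtime_error("error raised inside fake_eval");
    }
}

// RAII wrapper around a FILE*: closes the file when it goes out of scope.
struct FileCloser {
    std::FILE* f;
    explicit FileCloser(std::FILE* file) : f(file) {}
    FileCloser(const FileCloser&) = delete;
    FileCloser& operator=(const FileCloser&) = delete;
    ~FileCloser() {
        if (f) std::fclose(f);   // tmpfile() may have failed and returned null
        ++closers_run;
    }
};

// Returns true on success, false if the evaluation failed.
// Either way the FileCloser destructor runs exactly once.
bool process(bool fail) {
    closers_run = 0;
    try {
        FileCloser guard(std::tmpfile());
        eval_or_throw(fail);
    } catch (const std::runtime_error&) {
        return false;
    }
    return true;
}

// Self-check: the file is closed on both the success and the failure path.
void check_process() {
    assert(process(false) && closers_run == 1);
    assert(!process(true) && closers_run == 1);
}
```

The design choice worth noting is that the jump never crosses a frame that owns a C++ object. Only `fake_eval` is skipped, and everything above the wrapper sees a normal exception.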
        