
Why to use wrapr::let()

I have written about referential transparency before. In this article I would like to discuss “leaky abstractions” and why wrapr::let() supplies a useful (but leaky) abstraction for R programmers.


Abstractions

A common definition of an abstraction is (from the OSX dictionary):

the process of considering something independently of its associations, attributes, or concrete accompaniments.

In computer science this is commonly taken to mean “what something can be thought to do independent of caveats and implementation details.”

The magrittr abstraction

In R one traditionally thinks of the magrittr "%>%" pipe abstractly in the following way:

 Once "library(magrittr)" is loaded we can treat the expression:

   7 %>% sqrt()

 as if the programmer had written:

   sqrt(7)
 .

That is the abstraction of magrittr into terms one can reason about and plan over. You think of x %>% f() as a synonym for f(x). This is an abstraction because magrittr is not in fact implemented as a macro source-code re-write, but in terms of function argument capture and delayed evaluation. And as Joel Spolsky famously wrote:

All non-trivial abstractions, to some degree, are leaky.

The magrittr pipe is non-trivial (in the sense of doing interesting work) because it works as if it were a syntax replacement even though you can use it in more places than you could ask for such a syntax replacement. The upside is: magrittr makes two statements behave nearly equivalently. The downside is: we expect this to fail in some corner cases. This is not a criticism; it is as Bjarne Stroustrup wrote:

There are only two kinds of languages: the ones people complain about and the ones nobody uses.
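To make the magrittr abstraction and one of its corner cases concrete, here is a small sketch. The helper showArg() is hypothetical (it is not part of magrittr); it simply reports the unevaluated text of its argument, which is exactly the kind of thing a pure source re-write would not disturb.

```r
library(magrittr)

# The abstraction holds for ordinary values: both forms compute the same thing.
stopifnot(identical(7 %>% sqrt(), sqrt(7)))

# Hypothetical helper: returns the source text of its (unevaluated) argument.
showArg <- function(x) deparse(substitute(x))

direct <- showArg(7)      # called as f(x)
piped <- 7 %>% showArg()  # called as x %>% f()

print(direct)
print(piped)
```

The direct call sees the literal 7, while the piped call sees whatever the pipe actually passed in (its internal placeholder), so the two forms are distinguishable by any function that captures its argument expression.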

The tidyeval/rlang abstraction

The package dplyr 0.5.0.9004 brings in a new package called rlang to supply a capability called tidyeval. Among the abstractions it supplies are: operators for quoting and un-quoting variable names. This allows code like the following, where a dplyr::select() takes a variable name from a user supplied variable (instead of the usual explicit take from the text of the dplyr::select() statement).

# devtools::install_github('tidyverse/dplyr')
library("dplyr")
packageVersion("dplyr")
 # [1] ‘0.5.0.9004’
varName = quo(disp)
mtcars %>% select(!!varName) %>% head()
 #                   disp
 # Mazda RX4          160
 # Mazda RX4 Wag      160
 # Datsun 710         108
 # Hornet 4 Drive     258
 # Hornet Sportabout  360
 # Valiant            225

Notice in the above example we had to specify the abstract varName by calling quo() on a free variable name (disp) and did not take the value from a string. [updated 2017-05-03] To work with a string contained in another variable the syntax is:

varName <- as.name(colnames(mtcars)[[1]])
mtcars %>% select(!!varName) %>% head()

or:

varName <- rlang::sym(colnames(mtcars)[[1]])
mtcars %>% select(!!varName) %>% head()
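As a quick check on the two string-based forms above, the following sketch (assuming the rlang package is installed) confirms that as.name() and rlang::sym() build the same symbol from a string, so either can be unquoted with !!.

```r
# Take a column name held as a string.
colName <- colnames(mtcars)[[1]]

# Convert the string to a symbol two ways.
viaBase <- as.name(colName)
viaRlang <- rlang::sym(colName)

# Both conversions yield the same R symbol.
identical(viaBase, viaRlang)
# [1] TRUE
```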

The wrapr::let() abstraction

Our wrapr package can abstract the recent example (working over strings instead of “quosure” classes) as follows.

The (leaky) abstraction is:

“varName <- 'var'; wrapr::let(c(VAR = varName), expr(VAR))” is treated as if the user had written expr(var).

This can also be thought of as a form of unquoting, as you do see one set of quotes disappear.

Let’s try it:

library("wrapr")
x <- 5
varName <- 'x'
VAR <- NULL # make sure macro target does not look like an unbound reference
let(c(VAR=varName), VAR)
 # [1] 5

The NULL assignment is not needed, but adding something like that prevents CRAN style checks from thinking the macro replacement target VAR is an unbound variable in the let block. I’ll leave this out of the later examples for conciseness.
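The re-writing view of let() can be imitated in a few lines of base R. This is only a sketch of the idea under the assumption that name-for-name substitution followed by evaluation is all we want; it is not wrapr's actual implementation, and letSketch() is a hypothetical helper that takes an already quoted expression.

```r
# Hypothetical helper, not part of wrapr: swap symbols by name, then evaluate.
letSketch <- function(alias, expr, envir = parent.frame()) {
  # Turn each replacement string into a symbol, e.g. "x" becomes the name x.
  replacements <- lapply(alias, as.name)
  # substitute() replaces the target symbols inside the quoted expression.
  rewritten <- do.call(substitute, list(expr, replacements))
  # Evaluate the re-written expression where the caller's variables live.
  eval(rewritten, envir)
}

x <- 5
varName <- "x"
letSketch(c(VAR = varName), quote(VAR))
# [1] 5
letSketch(c(VAR = varName), quote(VAR + 1))
# [1] 6
```

Unlike this sketch, wrapr::let() captures its expression argument for you and checks that the alias values are strings or names.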

Or moving back to our dplyr::select() example:

varName <- 'disp'
let(
  c(VARNAME = varName),
  mtcars %>% select(VARNAME) %>% head()
)
 #                    disp
 # Mazda RX4          160
 # Mazda RX4 Wag      160
 # Datsun 710         108
 # Hornet 4 Drive     258
 # Hornet Sportabout  360
 # Valiant            225

And wrapr::let() can also conveniently handle the “varName <- colnames(mtcars)[[1]]” case.
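A minimal sketch of that case, assuming wrapr and dplyr are both installed (firstCol is just an illustrative variable name):

```r
library("wrapr")
library("dplyr")

# The column name arrives as a string computed at runtime.
varName <- colnames(mtcars)[[1]]  # the first column of mtcars, "mpg"

# let() re-writes VARNAME to the named column before the pipeline runs.
firstCol <- let(
  c(VARNAME = varName),
  mtcars %>% select(VARNAME) %>% head()
)
print(firstCol)
```

No conversion to a symbol or quosure is needed; the string is used directly as the alias value.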

An issue

dplyr issue 2726 (reproduced below) discusses a very important and interesting issue.


At a cursory glance the two discussed expressions and the work-around may seem alien, artificial, or even silly:

  1. (function(x) select(mtcars, !!enquo(x)))(disp)
  2. (function(x) mtcars %>% select(!!enquo(x)))(disp)
  3. (function(x) { x <- enquo(x); mtcars %>% select(!!x)})(disp)

However, this is actually a very crisp and incisive example. In fact, if rlang/tidyeval were a system up for public revision (such as an RFC or some such proposal) you would expect the equivalence of the above to be part of an acceptance suite.

The first expression looks very much like rlang/tidyeval package examples and is the “right way” in rlang/tidyeval to send in a column name parametrically. It is in the style preferred by the new package, so by the package standards it cannot be considered complicated, perverse, or verbose. The second expression differs from the first only by the application of the “magrittr invariant” of “x %>% f() is to be considered equivalent to f(x)”.

The outcome is that the first expression currently executes as expected, and the second expression errors out. This can be considered surprising, as it is not something anticipated in the documentation or recipes for building up tidy expressions. This is a leak in the combined abstractions, something we are told to back away from as it doesn’t work.

The proposed work-around (expression 3) is helpful, but itself demonstrates another leak in the mutual abstractions. Think of it this way: suppose we had started with expression 3 as working code. We would by referential transparency expect to be able to refactor the code and replace x with its value and move from this third working example to the second expression (which happens to fail).

To summarize: expressions 1 and 3 are equivalent. They differ by two refactoring steps (introduction/removal of pipes, and introduction/removal of a temporary variable). But we cannot demonstrate the equivalence by interpolating in two named transformations (going from 1 to 2 to 3, or from 3 to 2 to 1) as the intermediate expression 2 is apparently not valid.

The wrapr::let version of the issue author’s desired expression 2 is:

  (function(x) let(c(X = x), mtcars %>% select(X)))('disp')

Conclusion

wrapr::let() is a useful abstraction:

  • It directly takes strings as variable names (the most common source of parametric variable names).
  • It is a macro-like replacement and easy to teach as a code re-writing abstraction.
  • It has a small interaction surface, and plays well with delayed evaluation packages such as magrittr and dplyr 0.5.0.
  • It is “future proof” in the sense it should work with both dplyr 0.5.0 and the coming dplyr 0.6.*.


jmount

Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.

6 replies

  1. And the Bizarro Pipe version of expression 2 is:

       (function(x) { mtcars ->.; select(., !!enquo(x)) })(disp)
    

    (which works correctly).

    What is going on is the packages magrittr and rlang both currently consume too much referential transparency to be currently considered fully compatible with each other.

    This is related to situations such as dplyr issue 2080 where “dplyr and purrr (or magrittr?) are fighting over what . means”, and magrittr issue 141 which seems to be asking for more deliberate cooperation between the packages (implying coordination problems without such explicit accommodations).

    1. I assume you mean equivalent to the new NSE documentation?

      We do have articles and documentation on wrapr::let() including:

      A vignette: vignette('let', package='wrapr').
      The method help: help('let', package='wrapr')
      Examples: Using replyr::let to Parameterize dplyr Expressions.
      A video lecture: My recent BARUG talk: Parametric Programming in R with replyr.
      Some notes: Parametric Programming in R.
      The package introduction: The wrapr introduction.
      Comparison to the soon-to-be-deprecated SE underbar/underscore methods: Comparative examples using replyr::let.

      And many more examples on our blog.

  2. Also I think we should assume the original issue reporter knew of one_of(). Issue reports have to be taken with some trust that they do in fact originate from a meaningful use case, and that prior to being simplified down to an issue report many of the “you could just do x instead” options may not have been available.

  3. I know that I wasn’t the one asked the following question:

    I would say my counter answer is as follows.

    wrapr::let (née replyr::let) is primarily designed to satisfy a single common use case: substituting a to-be-determined name for a column of a data.frame at runtime. It deliberately has more restricted power and helps document user intent.

    Designing top-down from a use case (instead of designing bottom-up from desired capabilities or systems) keeps the code simple and gives us answers to a lot of design decisions (such as: should column names carry environments? The answer being “no”, as the user has no intent to use the environment that happens to be present when they specify the column name).

    I actually have allowed the system to be a bit more generic than just working with column names, and that is largely to make everything more orthogonal or regular (and hence easier for the user to reason about). Hence you can re-map arbitrary variables to other variables as follows:

    x <- 7
    varName <- 'x' # quote(x) would also work
    wrapr::let(c(VAR = varName), VAR+1)
    

    wrapr::let also prohibits a large number of things to deliberately limit the scope of the system and help users find errors much closer to causes. For example wrapr::let only binds names to names, not names to values. For example the following is not allowed:

    newValue <- 7
    wrapr::let(c(VAL = newValue), VAL+1)
    
    #  Error in prepareAlias(alias, FALSE, strict) : 
    #   wrapr:let alias values must all be strings or names 
    

    And this is because R already has much better ways to map names to values (i.e. its standard execution environments) and we don’t want to needlessly displace R’s built-in execution semantics where we do not need to do so.

    If you have the time I suggest watching my screencast on let-substitution. I spend a lot of time defining the use case and what wrapr::let does, so you can quickly tell whether it solves a problem you have. Since wrapr::let doesn’t try to do everything, it is often clear what one is using it for (i.e., use can be somewhat self-documenting).

  4. A nice article about dplyr 0.6 rlang/tidyeval can be found here. My quibble is this: while one may rightly be uncomfortable with expression capture followed by substitution (as this is not how other languages implement macros), expression capture is already used a lot in R (and a great deal in rlang/tidyeval). Also, wrapr has been very reliable and stable in production, whereas rlang/tidyeval has a lot of “expression A works, but A’ does not” issues, which are collecting as what are commonly called “closed won’t fix” issues in the dplyr and rlang/tidyeval repositories (I know from earlier interactions that these teams don’t seem to use this terminology themselves).

    The take-away: wrapr is reliable.
