
In defense of wrapr::let()

Saw this the other day:

[Image: wrapr vs. tidyeval]

In defense of wrapr::let() (originally part of replyr, and still re-exported by that package) I would say:

  • let() was deliberately designed for a single real-world use case: working with data when you don’t know the column names when you are writing the code (i.e., the column names will come later in a variable). We can re-phrase that as: there is deliberately less to learn as let() is adapted to a need (instead of one having to adapt to let()).
  • The R community already has months of experience confirming that let() works reliably in production while interacting with a number of different packages.
  • let() will continue to be a very specific, consistent, reliable, and relevant tool even after dplyr 0.6.* is released and the community gains experience with rlang/tidyeval in production.

If rlang/tidyeval is your thing, by all means please use and teach it. But please also continue to consider wrapr::let(). If you are trying to get something done quickly, or trying to share work with others, a “deeper theory” may not be the best choice.

An example follows.

In “base R” one can write:

d <- data.frame(x = 1:3)

If we know the column name we wish to add to this data frame we write:

d$newColumn <- 1

The above is “non-parameterized” evaluation in that the column name is taken from the source code, and not from a variable carrying the information. This is great for ad-hoc analysis, but it would be hard to write functions, scripts, and packages if this were the only way we had to manipulate columns.

This isn’t a problem as R supplies an additional parameterized notation as we show here. When we don’t know the name of the column (but expect it to be in a variable at run time) we write:

# code written very far away from us
variableHoldingColumnName <- 'newColumn' 

# our code
d[[variableHoldingColumnName]] <- 1

The purpose of wrapr::let() is to allow one to use the non-parameterized form as if it were parameterized. Obviously this is only useful when there is no convenient parameterized alternative (which is the situation with some packages). But for teaching purposes: how would wrapr::let() let us use “$” as if it were parameterized (which we would have to do if [[]] and [] did not exist)?

With wrapr::let() we can re-write the “dollar-sign form” as:

wrapr::let(
   c(NEWCOL = variableHoldingColumnName),
   {
     d$NEWCOL <- 1
   }
)

The name “NEWCOL” is a stand-in name that we write all our code in terms of. The expression “c(NEWCOL = variableHoldingColumnName)” is saying: “NEWCOL is to be interpreted as being whatever name is stored in the variable variableHoldingColumnName.” Notice we can’t tell from the code what value is in variableHoldingColumnName, as that won’t be known until execution time. The alias list or vector can be arbitrarily long and built wherever you like (it is just data). The expression block can be arbitrarily large and complex (so you need only pay the mapping notation cost once per function, not once per line of code).
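To make the name-for-name idea concrete, here is a minimal sketch built only on base R's substitute() and eval(). The function name mini_let() is hypothetical, and this is not wrapr's implementation (wrapr adds the safety checks discussed in the comments below); it only shows the core move of rewriting stand-in names into run-time names and then evaluating the block once in the caller's environment.

```r
# Hypothetical mini_let(): NOT wrapr's code, only an illustration of
# replacing stand-in names (NEWCOL) with names known at run time.
mini_let <- function(alias, expr) {
  # Capture the expression block without evaluating it.
  code <- substitute(expr)
  # Map each stand-in name to a symbol: NEWCOL -> newColumn.
  # lapply() keeps the names of the alias vector.
  name_map <- lapply(alias, as.name)
  # Rewrite the captured code, swapping names for names (never values).
  code <- do.call("substitute", list(code, name_map))
  # Evaluate the rewritten block once, in the caller's environment.
  eval(code, envir = parent.frame())
}

d <- data.frame(x = 1:3)
variableHoldingColumnName <- "newColumn"
mini_let(c(NEWCOL = variableHoldingColumnName), {
  d$NEWCOL <- 1
})

# The alias vector is just data, so it can be built programmatically.
aliases <- setNames(c("x", "y2"), c("OLDCOL", "NEWCOL"))
mini_let(aliases, {
  d$NEWCOL <- d$OLDCOL * 2
})
```

After both calls d has the columns x, newColumn, and y2, even though the block itself only ever mentions NEWCOL and OLDCOL.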

And that is wrapr::let().

If you don’t need parametric names, you don’t need wrapr::let(). If you do need parametric names, wrapr::let() is one of the most reliable and easiest-to-learn ways to get them.

A nice article from a user recording their experience trying different ways to parameterize their code (including wrapr::let()) can be found here.

If you wish to learn more we have a lot of resources available:

And many more examples on our blog.

Categories: Coding Opinion


jmount

Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.

7 replies

  1. I’m glad to hear that you’re going to keep supporting wrapr::let. Personally, I strongly suspect that I will stick to it in most cases, as I feel it is much easier to grok. With that said, I appreciate the fact that there are several good solutions to this problem.

    1. I think “several good solutions” is how the R community is supposed to work (not monolithic, take-it-or-leave-it super packages). So I definitely will support wrapr::let() going forward, and I have no problem with other solutions.

  2. wrapr::let() is in fact based on string substitution (help(let, package = 'wrapr') admits this). I agree with the worry that this is a potentially dangerous situation. That is why the package takes many mitigating steps to try to minimize risk.

    In its default mode wrapr::let() accepts only substitution targets (the tokens being replaced) and substitution values (the tokens showing up as a replacement) that are strictly valid R variable names. In addition we do not allow “.” (as it is a bit special) or funny quoted names. We are trying to make the “Bobby Tables” situation very hard to get into. This is one of the reasons we map names to names, never names to values.
    wrapr::let() substitution targets (the tokens being replaced) are all picked by the programmer at coding time. We suggest using the convention of ALL_CAPS_SIMPLE_NAMES to make those targets easy to spot (both for the human and the machine). Only the substitution values change at run time, not the targets. We also insist on word boundaries for substitution to try to improve substitution safety.
    The substituted expression is of both limited lifetime and visibility. It is evaluated immediately after formation and disposed of. This may not seem important but limited lifetime and visibility are standard ways to increase safety (for example: object oriented languages depend critically on privacy of members).
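    A sketch of what two of these mitigations can look like, name validation and word-boundary-only string substitution, follows. The helpers is_valid_name(), is_target(), and string_let() are hypothetical and are not wrapr's actual code. As an extra assumption beyond what is described above, targets are restricted to letters, digits, and “_” so they can be spliced into a regular expression safely.

```r
# Hypothetical helpers, NOT wrapr's code: a string-substitution let with
# name checks and word-boundary matching.

# Accept only a single, syntactically valid R name; reject "." outright.
# make.names() alters invalid names and reserved words, so they fail here.
is_valid_name <- function(s) {
  is.character(s) && length(s) == 1L && !is.na(s) && nzchar(s) &&
    s != "." && identical(make.names(s), s)
}

# Assumption: stand-in targets (e.g. ALL_CAPS names) are further limited to
# letters, digits and "_", so they contain no regex metacharacters.
is_target <- function(s) is_valid_name(s) && grepl("^[A-Za-z][A-Za-z0-9_]*$", s)

string_let <- function(alias, expr) {
  targets <- names(alias)
  values <- unname(alias)
  if (is.null(targets)) stop("alias must be a named vector")
  if (!all(vapply(targets, is_target, logical(1))) ||
      !all(vapply(values, is_valid_name, logical(1)))) {
    stop("alias targets and values must be plain R names")
  }
  # Turn the captured block into text and rewrite whole words only, so
  # NEWCOL is replaced but NEWCOL2 is left alone.
  txt <- paste(deparse(substitute(expr)), collapse = "\n")
  for (i in seq_along(targets)) {
    pattern <- paste0("\\b", targets[[i]], "\\b")
    txt <- gsub(pattern, values[[i]], txt, perl = TRUE)
  }
  # The rewritten code is evaluated once in the caller's environment and
  # then discarded (limited lifetime and visibility).
  eval(parse(text = txt), envir = parent.frame())
}

d <- data.frame(x = 1:3)
string_let(c(NEWCOL = "newColumn"), {
  d$NEWCOL <- 1
  d$NEWCOL2 <- 2
})

# A value that is not a plain name is refused before any code is built.
attempt <- tryCatch(
  string_let(c(NEWCOL = "x; rm(list = ls())"), { d$NEWCOL }),
  error = function(e) "rejected"
)
```

    Here d ends up with the columns x, newColumn, and NEWCOL2, and attempt is "rejected".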

    Beyond substitution, another risk area for wrapr::let() is directly picking execution environments. wrapr::let() attempts to execute in the caller’s environment. wrapr::let() does far less explicit environment munging than many other packages considered “safe” and deliberately avoids capturing, saving, and transporting environments (its use of environments is again private and transient).

    Roughly: wrapr::let() consumes some referential transparency by directly messing with code and environments. For code to work we have to not run out of referential transparency (some of the code has to really mean what it appears to say at some point). We have tried very hard to have wrapr::let() consume as little referential transparency as possible to achieve the needed effects. However, it is always possible to run out of referential transparency in the presence of other code- or environment-controlling packages that consume referential transparency (though it is then a philosophical question which package is at fault if two packages don’t work well together).

    We don’t go into all of this in the introduction to the package as wrapr::let() is designed to be beginner friendly, and explaining all of these risks and the engineering principles used to mitigate them would be quite off-putting.

    1. Actually our group is very pro-dplyr, especially when used to control an external system like PostgreSQL or Spark/Sparklyr.
