While going over some of the discussion related to my last post, I came up with a really neat way to use `wrapr::let()` and `rlang`/`tidyeval` together.

Please read on to see the situation and example.

Suppose we want to parameterize over a couple of names, one denoting a variable coming from the current environment and one denoting a column name. Further suppose we are worried the two names may be the same.
We can actually handle this quite neatly, using `rlang`/`tidyeval` to denote intent (in this case using “`!!`” to specify “take from environment instead of the data frame”) and allowing `wrapr::let()` to perform the substitutions.
    suppressPackageStartupMessages(library("dplyr"))
    library("wrapr")

    mass_col_name = 'mass'
    mass_const_name = 'mass'

    mass <- 100

    let(
      c(MASS_COL = mass_col_name,
        MASS_CONST = mass_const_name),

      starwars %>%
        transmute(height,
                  (!! MASS_CONST), # `mass` from environment
                  MASS_COL,        # `mass` from data.frame
                  h100 = height * (!! MASS_CONST),  # env
                  hm = height * MASS_COL            # data
        ) %>%
        head()
    )
    #> # A tibble: 6 x 5
    #>   height `(100)`  mass  h100    hm
    #> 1    172     100    77 17200 13244
    #> 2    167     100    75 16700 12525
    #> 3     96     100    32  9600  3072
    #> 4    202     100   136 20200 27472
    #> 5    150     100    49 15000  7350
    #> 6    178     100   120 17800 21360
All in all, that is pretty neat.
(Note: `rlang`/`tidyeval` uses the “`(!! )`” dereference notation in a number of ways; here we are only using it to specify the environment, not for substitution.)
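To isolate the "environment versus data frame" distinction from the `wrapr::let()` substitution, here is a minimal sketch using `dplyr` alone. The two-row data frame `d` is made up for illustration (its values are copied from the first two rows of the output above), and the sketch assumes a `dplyr` version with `rlang`/`tidyeval` support (0.7.0 or later).

```r
library(dplyr)

# Tiny stand-in data frame (hypothetical, for illustration only).
d <- data.frame(height = c(172, 167), mass = c(77, 75))

# A variable in the calling environment that shares its name with a column.
mass <- 100

res <- d %>%
  transmute(
    height,
    h100 = height * (!! mass),  # `mass` taken from the environment (100)
    hm   = height * mass        # `mass` taken from the data frame column
  )

print(res)
```

Here `h100` is built from the environment value and `hm` from the column, even though both are spelled `mass`.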
jmount
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.
Some new `wrapr::let()` features (found in the development version) include `eval=FALSE` and `debugPrint=TRUE` modes. With these options you can see what would be executed or what is being executed. These are great for learning `wrapr::let()` and for debugging.

For example, in our example above we could run:
This results in:
which is exactly the code that `wrapr::let()` has prepared for execution (one can even pass it to `eval()` for execution). This is an excellent way to see what `wrapr::let()` does, and to work out whether it does what you want. The “`(!(!mass))`” is just how `R` represents `(!! mass)`, and as you see it executes the same.

With `debugPrint=TRUE`, `wrapr::let()` both prints the replaced expression and then executes as usual.

If you want to be very strict (and completely unambiguous) you can use the `.data$` pronoun form to force references to the `data.frame`. We show this below.

We do not currently recommend using the pronoun in the form `.data[[my_var]]`. If you use `rlang`/`tidyeval` to perform substitutions, *always* write something such as `.data[[!!my_var]]` (some details here). This is due to complications described in `dplyr` issues 2904 and 2916.

This is one of the reasons we advise using `wrapr::let()` for substitution, even if you are using `rlang`/`tidyeval` (which is why you might end up using them together).
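As a rough sketch of the two pronoun forms discussed above (this is not the original example, which is not reproduced here), the following uses a small made-up data frame `d` and a hypothetical variable `my_var` holding the column name. It assumes a reasonably recent `dplyr`; behavior in the mid-2017 development versions may have differed, as the issues cited above suggest.

```r
library(dplyr)

# Tiny stand-in data frame (hypothetical, for illustration only).
d <- data.frame(height = c(172, 167), mass = c(77, 75))

mass   <- 100     # environment variable that collides with the column name
my_var <- "mass"  # column name held in a variable

res <- d %>%
  transmute(
    height,
    hm_dollar  = height * .data$mass,          # pronoun forces the data frame column
    hm_bracket = height * .data[[!! my_var]]   # column name unquoted explicitly
  )

print(res)
```

Both new columns are computed from the `mass` column of `d`, not from the environment value `100`.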
The `rlang`/`tidyeval` substitution issues can be subtle, and are possibly why the data-pronoun example in the actual June 13, 2017 `dplyr` 0.7.0 announcement is not correct, even using the development versions of `dplyr` and `rlang`/`tidyeval` as of June 30, 2017.
Notice that when we re-run the start of the example, the `data.frame` is altered in an unexpected way (an extra column named “`my_var`” is added) and the data is grouped by the column “`my_var`”, not by the column “`homeworld`” as in the earlier non-pronoun example (which presumably this example was supposed to match). This will be an issue if one tries to use or join this data after a `summarize()` step, as only named variables and the grouping variable survive `summarize()` (so the “`homeworld`” column will not be present for downstream code expecting to use it).
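One way to guard against this kind of surprise, in the spirit of the `wrapr::let()` approach advocated above, is to substitute the column name with `let()` and inspect the prepared code with `eval=FALSE` before running it. The sketch below is illustrative only: the data frame `d` and the alias `GROUP_COL` are made up, and it assumes a `wrapr` version in which `let()` accepts the `eval` argument.

```r
library(dplyr)
library(wrapr)

# Tiny stand-in data frame (hypothetical, for illustration only).
d <- data.frame(
  name      = c("a", "b", "c"),
  homeworld = c("Tatooine", "Tatooine", "Naboo"),
  stringsAsFactors = FALSE
)

my_var <- "homeworld"  # name of the grouping column, held in a variable

# Ask let() for the substituted code without executing it.
prepared <- let(
  c(GROUP_COL = my_var),
  d %>% group_by(GROUP_COL) %>% summarize(n = n()),
  eval = FALSE
)
print(prepared)  # inspect what would be run

# Execute the prepared code; the grouping column keeps its real name.
res <- eval(prepared)
print(names(res))
```

Because the substitution happens before `dplyr` sees the expression, the result is grouped by `homeworld` itself, so the column is available to downstream code after `summarize()`.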