While going over some of the discussion related to my last post, I came up with a really neat way to use `wrapr::let()` and `rlang`/`tidyeval` together.

Please read on to see the situation and example.

Suppose we want to parameterize over a couple of names, one denoting a variable coming from the current environment and one denoting a column name. Further suppose we are worried the two names may be the same.
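The name collision is easy to reproduce without any packages. In the base-R sketch below, `with()` stands in for the data masking that `dplyr` verbs perform (an analogy, not `dplyr`'s actual mechanism), and the two-row data frame is toy data. Once an expression is evaluated in the scope of the data frame, the column `mass` hides the environment variable `mass`.

```r
mass <- 100   # a constant living in the environment

# Toy data with a column that happens to share the name `mass`
d <- data.frame(height = c(172, 167), mass = c(77, 75))

env_mass  <- mass                    # plain lookup: the environment value
data_mass <- with(d, mass)           # data-masked lookup: the column wins
hm        <- with(d, height * mass)  # every bare `mass` here means the column
```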
We can actually handle this quite neatly, using `tidyeval` to denote intent (in this case using `!!` to specify “take from environment instead of the data frame”) and allowing `wrapr::let()` to perform the substitutions.
```r
suppressPackageStartupMessages(library("dplyr"))
library("wrapr")

mass_col_name = 'mass'
mass_const_name = 'mass'

mass <- 100

let(
  c(MASS_COL = mass_col_name,
    MASS_CONST = mass_const_name),
  starwars %>%
    transmute(height,
              (!! MASS_CONST), # `mass` from environment
              MASS_COL,        # `mass` from data.frame
              h100 = height * (!! MASS_CONST), # env
              hm = height * MASS_COL           # data
    ) %>%
    head()
)
#> # A tibble: 6 x 5
#>   height `(100)`  mass  h100    hm
#> 1    172     100    77 17200 13244
#> 2    167     100    75 16700 12525
#> 3     96     100    32  9600  3072
#> 4    202     100   136 20200 27472
#> 5    150     100    49 15000  7350
#> 6    178     100   120 17800 21360
```
All in all, that is pretty neat. (`tidyeval` uses the `(!! )` unquote notation in a number of ways; here we are only using it to specify the environment, not for substitution.)
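To see roughly what the substitution step does, here is a minimal base-R sketch. `sketch_let_expr()` is a hypothetical helper, not how `wrapr::let()` is implemented; it only mimics the idea of replacing the placeholder names `MASS_COL` and `MASS_CONST` with the concrete name `mass` before anything is evaluated. The expression is only quoted, never run, so `dplyr` is not needed.

```r
# Hypothetical helper, NOT wrapr's implementation: replace placeholder
# symbols in a quoted expression with the names given (as strings) in `alias`.
sketch_let_expr <- function(alias, expr) {
  subs <- lapply(alias, as.name)           # "mass" -> the symbol `mass`
  do.call("substitute", list(expr, subs))  # rewrite every matching symbol
}

mass_col_name   <- "mass"
mass_const_name <- "mass"

rewritten <- sketch_let_expr(
  c(MASS_COL = mass_col_name, MASS_CONST = mass_const_name),
  quote(transmute(starwars,
                  h100 = height * (!! MASS_CONST),  # `!!`: take from environment
                  hm   = height * MASS_COL))        # bare name: data column
)
print(rewritten)
```

Both placeholders become the same symbol `mass`; the only thing left to tell `dplyr` which `mass` is meant is the `!!` wrapper, which is exactly the division of labor between the two tools.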
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.
Newer `wrapr::let()` features (found in the development version) include debugging modes such as `debugPrint=TRUE`. With these options you can see what would be executed or what is being executed. These are great for learning.
For example, with the example above we could run:
This results in:
which is exactly the code `wrapr::let()` has prepared for execution (one can even pass it to `eval()` for execution). This is an excellent way to see what `wrapr::let()` does, and to work out whether it does what you want. The `(!(!mass))` is just how R prints `(!! mass)`, and as you see it executes the same.
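The double negation in that printout is not something `wrapr::let()` introduces. Base R has no `!!` operator: the parser reads `!!` as two nested calls to `!`, and it is `rlang`'s quasiquotation that recognizes the pattern inside `dplyr` verbs. The short base-R check below (no packages needed) shows that structure, and that outside of tidyeval the same expression is plain logical negation applied twice.

```r
e <- quote((!! mass))       # the form written in the example

outer_paren <- e[[1]]       # `(`: parentheses are kept as a call
first_bang  <- e[[2]][[1]]  # the first `!`
inner_call  <- e[[2]][[2]]  # `!mass`, i.e. the second `!`
deparse(e)                  # R's own printed form of the nested calls

# Evaluated directly by base R (no tidyeval), it is just double negation:
mass  <- 100
plain <- eval(e)            # !100 is FALSE, !FALSE is TRUE
```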
With `debugPrint=TRUE`, `wrapr::let()` both prints the replaced expression and then executes it as usual.
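The "print the rewritten expression, then run it" behavior can be mimicked in a few lines of base R. `sketch_let_debug()` below is a hypothetical helper, not wrapr's code: it substitutes names, prints the result, and then hands the expression to `eval()` with a toy data frame as the lookup scope, which is also roughly what passing a returned expression to `eval()` amounts to.

```r
# Hypothetical helper, NOT wrapr's implementation: substitute names,
# print the rewritten expression, then evaluate it against `data`.
sketch_let_debug <- function(alias, expr, data) {
  caller    <- parent.frame()                     # where non-column names live
  rewritten <- do.call("substitute", list(expr, lapply(alias, as.name)))
  print(rewritten)                                # show what is about to run
  eval(rewritten, envir = data, enclos = caller)  # columns first, then caller
}

d  <- data.frame(height = c(172, 167), mass = c(77, 75))  # toy data
hm <- sketch_let_debug(c(MASS_COL = "mass"), quote(height * MASS_COL), d)
```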
If you want to be very strict (and completely unambiguous) you can use the `.data$` pronoun form to force references to the `data.frame`.
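Because `.data$MASS_COL` contains `MASS_COL` as an ordinary symbol (the right-hand side of `$`), a name-substitution step rewrites it like any other name. The base-R sketch below uses a hypothetical helper (again not wrapr's implementation) on a quoted, never-evaluated expression, so `.data`, `transmute` and `starwars` are just symbols here and no packages are needed.

```r
# Hypothetical helper, NOT wrapr's implementation: swap placeholder symbols.
sketch_let_expr <- function(alias, expr) {
  do.call("substitute", list(expr, lapply(alias, as.name)))
}

strict <- sketch_let_expr(
  c(MASS_COL = "mass"),
  quote(transmute(starwars, hm = height * .data$MASS_COL))  # pronoun form
)
print(strict)
```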
We do not currently recommend using the pronoun in the form `.data[[my_var]]`. If you use `rlang`/`tidyeval` to perform substitutions, *always* write something such as `.data[[!!my_var]]` instead. This is due to complications described in `dplyr` issues 2904 and 2916.
This is one of the reasons we advise using `wrapr::let()` for substitution even if you are using `rlang`/`tidyeval` (which is why you might end up using them together).
The `rlang`/`tidyeval` substitution issues can be subtle, and they are possibly why the data-pronoun example in the June 13, 2017 `dplyr` 0.7.0 announcement does not work correctly, even with the development versions of `dplyr` and `rlang`/`tidyeval` as of June 30, 2017.
Notice that when we re-run the start of the example, the `data.frame` is altered in an unexpected way (an extra column named `my_var` is added) and the data is grouped by the column `my_var`, not by the column `homeworld` as in the earlier non-pronoun example (which this example presumably was supposed to match). This will be an issue if one tries to use or join this data after a `summarize()` step, as only the newly named variables and the grouping variable survive `summarize()` (so `homeworld` will not be present for downstream code expecting to use it).
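The downstream consequence can be seen with base R's `aggregate()`, used here as a stand-in for `group_by()` plus `summarize()` (an analogy, not `dplyr` itself) on toy data. The stray column `my_var` imitates the extra column described above; its contents are hypothetical, since only its name matters for this point. Grouping by it leaves no `homeworld` column for later joins.

```r
d <- data.frame(homeworld = c("Tatooine", "Tatooine", "Naboo"),
                mass      = c(77, 75, 32))  # toy data

# Intended: group by homeworld; only the group and the summary survive.
by_world <- aggregate(mass ~ homeworld, data = d, FUN = mean)

# Unintended: an extra column named "my_var" is added and grouped on.
d$my_var <- d$homeworld   # hypothetical contents; only the name matters
by_stray <- aggregate(mass ~ my_var, data = d, FUN = mean)

names(by_world)  # grouping column plus summary
names(by_stray)  # `homeworld` is gone
```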