From dplyr issue 2916.
The following appears to work.
suppressPackageStartupMessages(library("dplyr"))
COL <- "homeworld"
starwars %>%
  group_by(.data[[COL]]) %>%
  head(n=1)
## # A tibble: 1 x 14
## # Groups: COL [1]
## name height mass hair_color skin_color eye_color birth_year
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl>
## 1 Luke Skywalker 172 77 blond fair blue 19
## # ... with 7 more variables: gender <chr>, homeworld <chr>, species <chr>,
## # films <list>, vehicles <list>, starships <list>, COL <chr>
Though notice it reports the grouping is by "COL", not by "homeworld". Also the data set now has 14 columns, not the original 13 from the starwars data set.
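The two symptoms described above (an unexpected grouping name and an extra column) can be checked programmatically rather than read off the printed tibble. The following is a minimal sketch; the variable name `grouped` is introduced here for illustration, and the values it reports may depend on the installed dplyr version, so no expected output is shown.

```r
suppressPackageStartupMessages(library("dplyr"))

COL <- "homeworld"
grouped <- starwars %>%
  group_by(.data[[COL]])

# Name(s) dplyr recorded for the grouping; compare against "homeworld".
group_vars(grouped)

# Columns present after grouping that were not in the original data.
setdiff(colnames(grouped), colnames(starwars))

# Difference in column count (0 means no column was added).
ncol(grouped) - ncol(starwars)
```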
And this seemingly similar variation (currently) throws an exception:
homeworld <- "homeworld"
starwars %>%
  group_by(.data[[homeworld]]) %>%
  head(n=1)
## Error in mutate_impl(.data, dots): Evaluation error: Must subset with a string.
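One possible reading of this error (an assumption, not something confirmed in the issue): inside `group_by()` the bare name `homeworld` may be looked up in the data first, yielding the whole column rather than the single string held in the outer variable. Under that assumption, forcing the outer value in with `!!` before dplyr evaluates the expression should sidestep the clash. This is the `.data[[!!columnvar]]` form mentioned in the notes further down; the sketch below is illustrative only, and `res` is a name introduced here.

```r
suppressPackageStartupMessages(library("dplyr"))

# The variable name deliberately matches a column name.
homeworld <- "homeworld"

res <- starwars %>%
  group_by(.data[[!!homeworld]]) %>%  # !! inserts the string value up front
  head(n = 1)

nrow(res)
```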
I know this will cost me what little community good-will I might have left (after already having raised this, unsolicited, many times), but please consider using wrapr::let() from our wrapr package for tasks such as the above.
library("wrapr")
let(
  c(COL = "homeworld"),
  starwars %>%
    group_by(COL) %>%
    head(n=1)
)
## # A tibble: 1 x 13
## # Groups: homeworld [1]
## name height mass hair_color skin_color eye_color birth_year
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl>
## 1 Luke Skywalker 172 77 blond fair blue 19
## # ... with 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
## # films <list>, vehicles <list>, starships <list>
let(
  c(homeworld = "homeworld"),
  starwars %>%
    group_by(homeworld) %>%
    head(n=1)
)
## # A tibble: 1 x 13
## # Groups: homeworld [1]
## name height mass hair_color skin_color eye_color birth_year
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl>
## 1 Luke Skywalker 172 77 blond fair blue 19
## # ... with 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
## # films <list>, vehicles <list>, starships <list>
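Since the column name is just a string in the `let()` examples above, the same pattern can be wrapped in a function that receives the name as an argument. The following is a minimal sketch; `first_row_grouped_by` is a hypothetical helper written for this illustration and is not part of dplyr or wrapr.

```r
suppressPackageStartupMessages(library("dplyr"))
library("wrapr")

# Hypothetical helper: group `d` by the column named in the string
# `colname`, then keep the first row.
first_row_grouped_by <- function(d, colname) {
  let(
    c(COL = colname),  # COL in the block below stands for the named column
    d %>%
      group_by(COL) %>%
      head(n = 1)
  )
}

res <- first_row_grouped_by(starwars, "homeworld")
group_vars(res)
```

The caller never writes a bare column name, so the string could just as well come from a function argument or a configuration value.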
Some explanation can be found here.
jmount
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.
Some variations in notation:
The notations being most commonly taught by the package authors are: verb(!!columnvar), verb(.data[[columnvar]]), verb(.data$columnvar) (also here) and verb(.data[columnvar]). None of these are the working notation verb(.data[[!!columnvar]]).

Note it is possible to remember the form that works (though it isn’t the one being commonly taught). First, I already teach to always use [[]] where possible, as it is stricter than []. So let's assume we always remember to do that. Then just remember that you should always have a !! in rlang/tidyeval situations.

The example comes from here which also suggests using the quo() notation (in addition to using string notation). That does appear to work:

However that does not address the application I am actually interested in: wrapping a column name that comes from an external string (perhaps even coming from an external configuration file). In my applications I not only manipulate column names as strings, they often are first available in that form. I feel if you are close enough to write “`quo(eye_color)`” you are likely close enough to re-code the pipeline directly.
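To make the distinction above concrete, here is a minimal sketch contrasting the two situations: a column written as a bare name via `rlang::quo()`, and a column name that first arrives as a string and is converted with `rlang::sym()` before unquoting. The variable names (`col_quo`, `col_string`, `col_sym`, `by_quo`, `by_sym`) are introduced here for illustration, and this is only one plausible way to handle the string case.

```r
suppressPackageStartupMessages(library("dplyr"))

# quo() form: the column appears as a bare name in the source code.
col_quo <- rlang::quo(eye_color)
by_quo <- starwars %>%
  group_by(!!col_quo) %>%
  head(n = 1)

# String form: the name arrives as data (for example from a config file),
# so it is converted to a symbol first and then unquoted.
col_string <- "eye_color"
col_sym <- rlang::sym(col_string)
by_sym <- starwars %>%
  group_by(!!col_sym) %>%
  head(n = 1)

group_vars(by_quo)
group_vars(by_sym)
```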
To get the quo()-like notation to work in that case I assume you must do something like the following:

I do think wrapr::let is clearer and easier to understand than the dplyr approach. I have already used it in production code.
Any performance test?
Thanks! I really appreciate it. Our group will make sure wrapr::let() remains stable and production worthy.

As for timings: keep in mind that both substitution systems should take very little time compared to any substantial calculation task, so timing should not be that important. But it is a fun question, so I worked up a quick report here.
The report compares 3 substitution methods: wrapr::let() (labeled as `fWrapr*`), `rlang::eval_tidy()` working from a name holding a string (closest to wrapr::let() in behavior, labeled as `fTidyN*`), and `rlang::eval_tidy()` working from a `quo()` symbol (the case the `rlang`/`tidyeval` package authors seem to discuss the most, labeled as `fTidyQ*`). I plotted the timing distributions a few ways and drew some conclusions while substituting 1 to 10 variables. Below is one of the plots (more context is given in the report):

Thank you for the benchmark.
Recently, I have been doing many data manipulations using a combination of dplyr + purrr + wrapr. When the project is finished, I think I will have some experience to discuss.
Keep up the great work
Hi John – I have also started using wrapr::let in analysis code. It really helps to lift the mental burden of constantly thinking about quoting, unquoting, strings vs. symbols, and other metaprogramming issues which are largely peripheral to getting things done.
I’ve really tried to do it the idiomatic dplyr way out of respect for Mr. Wickham’s amazing work… but, as I’ve said before, I don’t like constantly thinking about metaprogramming. Things like quasiquotes may be “powerful” but I don’t want to be worrying about them with every line I write. If I did… I’d probably be a LISP programmer :)