Another R tip. Need to replace a name in some R code or make R code re-usable? Use wrapr::let().
Here is an example involving dplyr.
Let’s look at some example data:
library("dplyr")
library("wrapr")

starwars %>%
  select(., name, homeworld, species) %>%
  head(.)
# # A tibble: 6 x 3
#   name           homeworld species
#   <chr>          <chr>     <chr>
# 1 Luke Skywalker Tatooine  Human
# 2 C-3PO          Tatooine  Droid
# 3 R2-D2          Naboo     Droid
# 4 Darth Vader    Tatooine  Human
# 5 Leia Organa    Alderaan  Human
# 6 Owen Lars      Tatooine  Human
For why we write the explicit dot argument “.”, please see R Tip: Make Arguments Explicit in dplyr Pipelines. Also, though we will not use it here, we feel that separating argument types (data versus columns) in select() is much more comprehensible, and is made easy by qc() notation such as “select(., qc(name, homeworld, species))”.
Now let’s change the name of one column. The challenge will be: the name of the old column and the new name will not be known at the time of writing the code (a common problem when writing re-usable functions or code).
Suppose the remapping is specified in variables, as below.
newname <- "genus"
oldname <- "species"
We could prepare to work with column names as values using wrapr::let(), as we show here.
let(
  alias = c(NEWNAME = newname, OLDNAME = oldname),
  starwars %>%
    rename(., NEWNAME = OLDNAME) %>%
    select(., name, homeworld, NEWNAME) %>%
    head(.)
)
#   name           homeworld genus
#   <chr>          <chr>     <chr>
# 1 Luke Skywalker Tatooine  Human
# 2 C-3PO          Tatooine  Droid
# 3 R2-D2          Naboo     Droid
# 4 Darth Vader    Tatooine  Human
# 5 Leia Organa    Alderaan  Human
# 6 Owen Lars      Tatooine  Human
The merit of the above notation is that the exact column names (here "species" and "genus") may come from variables, and do not need to be known to the programmer writing the let()-block. There are other methods to attempt such substitution (which were actually publicly pre-announced only after let() had already been publicly announced and was in CRAN distribution; so let() is in fact known prior art despite apparently not being cited). In our experience (and opinion) wrapr::let() is by far the most legible, teachable, and reliable code-rewriting (or meta-programming) tool for this task in R. It is a good choice for part-time R users, and we are working on formal documentation for expert users.
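As a minimal sketch of the re-usable function case mentioned earlier, let() can be placed inside a function whose arguments carry the column names. The helper name rename_column and the small data frame d are hypothetical, introduced only for illustration. The second call uses the eval = FALSE argument of let() to return the rewritten code instead of running it, which is a convenient way to inspect what the substitution did.

```r
library("dplyr")
library("wrapr")

# Hypothetical helper (not part of wrapr or dplyr): rename one column,
# with both names supplied as character values at run time.
rename_column <- function(d, oldname, newname) {
  let(
    c(OLDNAME = oldname, NEWNAME = newname),
    d %>% rename(., NEWNAME = OLDNAME)
  )
}

# Illustrative data, made up for this sketch.
d <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)
res <- rename_column(d, "y", "label")
print(colnames(res))

# With eval = FALSE, let() returns the substituted code without evaluating it.
code <- let(
  c(NEWNAME = "genus", OLDNAME = "species"),
  starwars %>% rename(., NEWNAME = OLDNAME),
  eval = FALSE
)
print(code)
```

The function body is ordinary dplyr code written against the stand-in names OLDNAME and NEWNAME; only the alias vector connects those stand-ins to the run-time values.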
Another alternative is to use the seplyr package, which wraps dplyr operators into more standard value-oriented notation. The above example in seplyr is as follows.
library("seplyr")

starwars %>%
  rename_se(., newname := oldname) %>%
  select_se(., c("name", "homeworld", newname)) %>%
  head(.)
(For details on “:=”, please see here.)
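A short sketch of what the “:=” expression hands to rename_se(), assuming the := operator exported by wrapr (the "named map builder") with no other package masking it: the left side supplies the names and the right side the values, so the result is an ordinary named character vector.

```r
library("wrapr")

newname <- "genus"
oldname <- "species"

# Build the new-name -> old-name mapping as a plain named vector.
mapping <- newname := oldname
print(mapping)

# The same mapping written out by hand, for comparison.
by_hand <- c(genus = "species")
print(identical(mapping, by_hand))
```

Because the mapping is just a value, it can be built elsewhere in a program, stored, and passed around before it is used.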
Let’s finish with an example from the dplyr 0.7.0 announcement. The following is code from that announcement:
my_var <- "homeworld"

starwars %>%
  group_by(.data[[my_var]]) %>%
  summarise_at(vars(height:mass), mean, na.rm = TRUE)
# # A tibble: 49 x 3
#   my_var      height  mass
#   <chr>        <dbl> <dbl>
# 1 Alderaan     176.   64.0
# 2 Aleen Minor   79.0  15.0
# ...
Notice the grouping column is incorrectly named “my_var” (some other places this was noticed: 1, 2, 3). This is not harmless, as code attempting to refer to the original name will fail. The above is possibly not the current preferred rlang notation, which has been iterating through “!!” and “UQ()” (though I think UQ() is already “soft deprecated”). My theory is that the correct form may be the even more cumbersome “.data[[!!my_var]]”, even though this is not commonly taught. However, even if the original code is indeed “malformed rlang/dplyr” (that is, outside the intended variations of the grammar), notice that the problem was not caught or signaled. And at least at some point recently the shorter notation was being taught by the package authors. So it is hard to consider the rlang notation and teaching quite settled.
The let() notation is easy and works correctly.
let(
  c(MY_VAR = my_var),
  starwars %>%
    group_by(MY_VAR) %>%
    summarise_at(vars(height:mass), mean, na.rm = TRUE)
)
# # A tibble: 49 x 3
#   homeworld   height  mass
#   <chr>        <dbl> <dbl>
# 1 Alderaan     176.   64.0
# 2 Aleen Minor   79.0  15.0
The seplyr equivalent is the following:
starwars %>%
  group_by_se(., my_var) %>%
  summarise_at(vars(height:mass), mean, na.rm = TRUE)
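The let() grouping pattern can also be wrapped into a function, so that the grouping column keeps its real name in the result. This is only a sketch: the helper name mean_by, the output column mean_value, and the toy data frame are hypothetical, and summarise() is used in place of summarise_at() to keep the example small.

```r
library("dplyr")
library("wrapr")

# Hypothetical helper: mean of one column per group, with both column
# names supplied as character values.
mean_by <- function(d, group_col, value_col) {
  let(
    c(GROUP = group_col, VALUE = value_col),
    d %>%
      group_by(GROUP) %>%
      summarise(mean_value = mean(VALUE, na.rm = TRUE))
  )
}

# Illustrative data, made up for this sketch.
d <- data.frame(
  g = c("a", "a", "b"),
  v = c(1, 3, 10),
  stringsAsFactors = FALSE
)
res <- mean_by(d, "g", "v")
print(res)
```

Since the substitution happens before dplyr sees the code, the grouping column in the result is named g, so later code that refers to that column by its original name keeps working.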
If you absolutely must have “data pronouns” (such as the “.data” notation), those are actually fairly easy to add to classic base-R pipe-enhanced functions. However, we feel most R users avoid the need for such pronouns through proper use of common R structured environment nesting conventions (just as many programmers do not feel the need for a “goto” statement when they stick to structured coding conventions).
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.
Just to confirm we are using current (as of 2018-03-26) CRAN versions of the relevant packages: