One thing that is sure to get lost in my long note on macros in R is just how concise and powerful macros are. The problem is that macros are concise precisely because they do a lot for you, so you get bogged down when you try to explain the joke.
Let’s try to be concise here.
Below is an extension of an example taken from the Programming with dplyr note.
First, let’s load the package and define symbols holding the names of the columns we wish to work with later.
suppressPackageStartupMessages(library("dplyr"))
# symbols naming the grouping, numerator, and denominator columns
group_nm <- as.name("am")
num_nm <- as.name("hp")
den_nm <- as.name("cyl")
# symbols naming the derived columns we are going to build
derived_nm <- as.name(paste0(num_nm, "_per_", den_nm))
mean_nm <- as.name(paste0("mean_", derived_nm))
count_nm <- as.name("count")
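As a quick check (this snippet is our addition, not part of the original example), the composed names are plain symbols:
print(derived_nm)  # hp_per_cyl
print(mean_nm)     # mean_hp_per_cyl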
Now let’s use rlang to substitute those symbols into a non-trivial dplyr pipeline.
mtcars %>%
  group_by(!!group_nm) %>%
  mutate(!!derived_nm := !!num_nm / !!den_nm) %>%
  summarize(
    !!mean_nm := mean(!!derived_nm),
    !!count_nm := n()
  ) %>%
  ungroup() %>%
  arrange(!!group_nm)
## # A tibble: 2 x 3
## am mean_hp_per_cyl count
## <dbl> <dbl> <int>
## 1 0 22.7 19
## 2 1 23.4 13
The above is very useful: we have just gained programmatic control over all of the symbol names in the pipeline. This is exactly what is needed to wrap such a pipeline in a function and make it parameterized and re-usable.
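For instance (a sketch of our own, not code from the original note; the function name and arguments are illustrative), the pipeline can be wrapped as:
grouped_ratio_summary <- function(data, group_col, num_col, den_col) {
  group_nm <- as.name(group_col)
  num_nm <- as.name(num_col)
  den_nm <- as.name(den_col)
  derived_nm <- as.name(paste0(num_col, "_per_", den_col))
  mean_nm <- as.name(paste0("mean_", derived_nm))
  data %>%
    group_by(!!group_nm) %>%
    mutate(!!derived_nm := !!num_nm / !!den_nm) %>%
    summarize(
      !!mean_nm := mean(!!derived_nm),
      count = n()
    ) %>%
    ungroup() %>%
    arrange(!!group_nm)
}
# e.g. grouped_ratio_summary(mtcars, "am", "hp", "cyl") reproduces the table above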
The thing is, Thomas Lumley’s base::bquote() could already achieve this in 2003, and Gregory R. Warnes’ gtools::strmacro() could further automate specifying the automation in 2005.
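To see the base-R part in isolation, here is a minimal sketch of our own (not from the note): bquote() substitutes .(x) terms into a quoted expression and eval() then runs it, which is enough when we are not programmatically naming new columns.
grp <- as.name("am")  # illustrative local symbol
eval(bquote(
  mtcars %>%
    group_by(.(grp)) %>%
    summarize(count = n()) %>%
    ungroup()
))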
Let’s show that. First we use gtools to build a "bquote() wrapping factory."
library("gtools")
# build a method wrapping macro
bq_wrap <- strmacro(
FN,
expr = {
FN <- function(.data, ...) {
env = parent.frame()
mc <- substitute(dplyr::FN(.data = .data, ...))
mc <- do.call(bquote, list(mc, where = env), envir = env)
eval(mc, envir = env)
}
}
)
Now we use it to wrap some dplyr methods (ignoring, for this demonstration, any options that are not passed through ...).
# wrap some dplyr methods
bq_wrap(mutate)
bq_wrap(summarize)
bq_wrap(group_by)
bq_wrap(arrange)
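To make the macro’s effect concrete: strmacro() works by textual substitution of FN, so bq_wrap(mutate) defines, approximately, the following function in our workspace (this expansion is our own sketch of what the macro generates):
mutate <- function(.data, ...) {
  env <- parent.frame()
  mc <- substitute(dplyr::mutate(.data = .data, ...))
  mc <- do.call(bquote, list(mc, where = env), envir = env)
  eval(mc, envir = env)
}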
At this point we have re-adapted four dplyr methods to use bquote() quasiquotation. This is what we mean when we say strmacro() is a tool for building tools.
And here is the same pipeline again, entirely driven by bquote().
mtcars %>%
  group_by(.(group_nm)) %>%
  mutate(.(derived_nm) := .(num_nm) / .(den_nm)) %>%
  summarize(
    .(mean_nm) := mean(.(derived_nm)),
    .(count_nm) := n()
  ) %>%
  ungroup() %>%
  arrange(.(group_nm))
## # A tibble: 2 x 3
## am mean_hp_per_cyl count
## <dbl> <dbl> <int>
## 1 0 22.7 19
## 2 1 23.4 13
A few follow-up notes. There is, of course, the question of how to perform the same steps using wrapr::let() (a sketch follows below). Much of the difference is that let() has to work hard to get a deliberate side effect (assigning new functions into the user’s environment), whereas strmacro’s macro style gives it such access for free. And if you are not altering the left-hand sides of assignments, you can even use eval() directly on a bquote()-substituted expression.
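Here is a rough sketch of that let() version (our own illustration, assuming the wrapr package is available; the upper-case placeholder names are ours):
library("wrapr")
let(
  c(GROUPCOL = "am", NUMCOL = "hp", DENCOL = "cyl",
    RATIOCOL = "hp_per_cyl", MEANCOL = "mean_hp_per_cyl"),
  mtcars %>%
    group_by(GROUPCOL) %>%
    mutate(RATIOCOL = NUMCOL / DENCOL) %>%
    summarize(MEANCOL = mean(RATIOCOL), count = n()) %>%
    ungroup() %>%
    arrange(GROUPCOL)
)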
I generalized the transform code a bit: one can adapt pretty much all of dplyr to this bquote() abstraction in about 35 lines of code (a sketch of the idea follows below). I’ll probably promote that observation/example to an article at some point. We now have the example worked in parameterized rquery here.
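As a rough illustration of that kind of generalization (this is our own sketch, not the 35-line adapter referred to above), a small function factory plus a loop can install bquote()-enabled wrappers for several dplyr verbs at once:
bquote_adapt <- function(verb_name) {
  # resolve the target now, so each wrapper is bound to its own verb
  verb_qualified <- call("::", as.name("dplyr"), as.name(verb_name))
  function(.data, ...) {
    env <- parent.frame()
    # capture the un-evaluated arguments, then point the call at the real dplyr verb
    mc <- substitute(VERB(.data = .data, ...))
    mc[[1L]] <- verb_qualified
    mc <- do.call(bquote, list(mc, where = env), envir = env)
    eval(mc, envir = env)
  }
}
for (verb in c("group_by", "mutate", "summarize", "arrange", "filter", "select")) {
  assign(verb, bquote_adapt(verb))
}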