A bit more on the `let` wrapper from our `replyr` R package.
library("replyr") help(let, package="replyr")
(Edit: this has been updated to version `0.2.0` of `replyr`, which eliminates some of the `()` notation.)
let {replyr} (R Documentation)
Execute expr with name substitutions specified in alias.
Description
`let` implements a mapping from desired names (names used directly in the `expr` code) to names used in the data. Mnemonic: "expr code symbols are on the left, external data and function argument names are on the right."
Usage
let(alias, expr)
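The mapping direction can be illustrated with a small base-R sketch. The function `let_sketch()` below is hypothetical and is not replyr's implementation. It assumes plain symbol substitution via base R's `substitute()`. The stand-in names on the left of `alias` are rewritten into the data column names on the right, and the result is evaluated in the caller's environment.

```r
# Hypothetical sketch of let-style name substitution (not replyr's code).
# alias: named list mapping stand-in name -> actual column name (strings).
let_sketch <- function(alias, expr) {
  # Capture the unevaluated expression as written by the caller
  e <- substitute(expr)
  # Turn each target name string into a symbol
  syms <- lapply(alias, as.name)
  # Replace the stand-in symbols throughout the expression
  e <- do.call(substitute, list(e, syms))
  # Evaluate the rewritten expression where let_sketch was called
  eval(e, envir = parent.frame())
}

d <- data.frame(rank = c(1, 2), Species = "setosa",
                stringsAsFactors = FALSE)
mapping <- list(RankColumn = "rank", GroupColumn = "Species")

# Written against RankColumn/GroupColumn, evaluated as d$rank / d$Species
ranks0 <- let_sketch(mapping, d$RankColumn - 1)
groups <- let_sketch(mapping, unique(d$GroupColumn))
print(ranks0)  # [1] 0 1
print(groups)  # [1] "setosa"
```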
Arguments
| Argument | Description |
|---|---|
| alias | mapping from free names in expr to target names to use. |
| expr | block to prepare for execution. |
Details
Code adapted from `gtools::strmacro` by Gregory R. Warnes (License: GPL-2; this portion is also available under GPL-2 to respect the gtools license). Please see the replyr vignette for some discussion of `let` and crossing function call boundaries: `vignette('replyr','replyr')`. Transformation is performed by substitution on the expression parse tree, so be wary of name collisions or aliasing.
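To see why name collisions and aliasing matter, here is a base-R sketch that contrasts two hypothetical substitution strategies. The first is symbol substitution on the parse tree via `substitute()`. The second is text substitution on word boundaries, in the spirit of `gtools::strmacro`. Neither function is replyr's code. The help page's own last example, `let(list(x='y'),'x')`, shows `let` rewriting a string, which is the text-level behaviour.

```r
# Two hypothetical substitution strategies, for comparison only.

# 1. Symbol substitution on the parse tree (base substitute()).
let_tree <- function(alias, expr) {
  e <- do.call(substitute, list(substitute(expr), lapply(alias, as.name)))
  eval(e, envir = parent.frame())
}

# 2. Text substitution on word boundaries, in the spirit of gtools::strmacro.
let_text <- function(alias, expr) {
  src <- paste(deparse(substitute(expr)), collapse = "\n")
  for (nm in names(alias)) {
    src <- gsub(paste0("\\b", nm, "\\b"), alias[[nm]], src, perl = TRUE)
  }
  eval(parse(text = src), envir = parent.frame())
}

x <- 1
y <- 10
# Aliasing: once x maps to y, the expression x + y becomes y + y.
collided <- let_tree(list(x = "y"), x + y)               # 20, not 11
# Strings: the tree version leaves them alone, the text version does not.
str_tree <- let_tree(list(x = "y"), "x")                 # "x"
str_text <- let_text(list(x = "y"), "x")                 # "y"
# Argument names are not symbols, so only the text version renames them.
arg_tree <- names(let_tree(list(x = "y"), list(x = 1)))  # "x"
arg_text <- names(let_text(list(x = "y"), list(x = 1)))  # "y"
```

The argument-name case matters for calls like `mutate(RankColumn = ...)`. Plain tree substitution with `substitute()` would leave the left-hand `RankColumn` untouched.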
Something like `let` is only useful for getting control of a function that is parameterized (in the sense that it takes column names) but non-standard (in that it takes column names through non-standard evaluation argument-name capture, and not as simple variables or parameters). So `replyr::let` is not useful for non-parameterized functions (functions that work only over values, such as `base::sum`). It is also not useful for functions that take parameters in a straightforward way (such as the `by` argument of `base::merge`).

`dplyr::mutate` is an example where we can use a `let` helper. It is parameterized (in the sense that it can work over user-supplied columns and expressions), but column names are captured through non-standard evaluation. It also rapidly becomes unwieldy to use complex formulas with the standard-evaluation equivalent `dplyr::mutate_`.

`alias` can not include the symbol ".".
Value
result of expr executed in calling environment
Examples
```r
library('dplyr')
d <- data.frame(Sepal_Length=c(5.8,5.7),
                Sepal_Width=c(4.0,4.4),
                Species='setosa',
                rank=c(1,2))
mapping = list(RankColumn='rank',GroupColumn='Species')
let(alias=mapping,
    expr={
      # Notice code here can be written in terms of
      # known or concrete names "RankColumn" and
      # "GroupColumn", but executes as if we
      # had written mapping specified columns
      # "rank" and "Species".
      # restart ranks at zero.
      d %>% mutate(RankColumn=RankColumn-1) -> dres
      # confirm set of groups.
      unique(d$GroupColumn) -> groups
    })
print(groups)
print(length(groups))
print(dres)

# It is also possible to pipe into let-blocks, but it takes some extra
# notation (notice the extra ". %>%" at the beginning and the extra
# "()" at the end, to signal %>% to treat the let-block as a
# function to evaluate).
d %>% let(alias=mapping,
          expr={
            . %>% mutate(RankColumn=RankColumn-1)
          })()

# Or:
d %>% letp(alias=mapping,
           expr={
             . %>% mutate(RankColumn=RankColumn-1)
           })

# Or:
f <- let(mapping,
         . %>% mutate(RankColumn=RankColumn-1)
         )
d %>% f

# Be wary of using any assignment to attempt
# side-effects in these "delayed pipelines", as
# the assignment tends to happen during the
# let dereference and not (as one would hope) during
# the later pipeline application. Example:
g <- let(alias=mapping,
         expr={
           . %>% mutate(RankColumn=RankColumn-1) -> ZZZ
         })
print(ZZZ)
# Notice ZZZ has captured a copy of the sub-pipeline
# and not waited for application of g. Applying g
# performs a calculation, but does not overwrite ZZZ.

g(d)
print(ZZZ)
# Notice ZZZ is not a copy of g(d), but instead
# still the pipeline fragment.

# let works by string substitution aligning on
# word boundaries, so it does (unfortunately) also
# re-write strings.
let(list(x='y'),'x')
```
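The warning about side effects in delayed pipelines comes down to when the let-block's expression is evaluated. This base-R sketch uses a hypothetical `let_sketch()` helper doing parse-tree substitution. A plain function stands in for the magrittr `. %>%` fragment. The assignment to `ZZZ` runs once, when the let-block is evaluated, and later calls to `g` do not touch it.

```r
# Hypothetical minimal let-style helper: symbol substitution on the parse tree.
let_sketch <- function(alias, expr) {
  e <- do.call(substitute, list(substitute(expr), lapply(alias, as.name)))
  eval(e, envir = parent.frame())
}

mapping <- list(RankColumn = "rank")
d <- data.frame(rank = c(1, 2))

# The let-block evaluates `ZZZ <- function(...) ...` immediately:
# ZZZ is bound to the function, and g receives that same function.
g <- let_sketch(mapping,
                ZZZ <- function(df) df$RankColumn - 1)

res <- g(d)  # applies the rewritten function, i.e. d$rank - 1
# ZZZ still holds the function, not the result of g(d).
```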
jmount
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.
Really? The `_` functions from dplyr (like `mutate_`, `filter_`) aren't enough and we need a whole new package?
What are the `()` and `()()` for after the `let` call in the examples? :< It looks hilarious.
(Edit: this comment and `replyr` were both revised after feedback from Marcin Kosiński (and others) to work with `replyr` version 0.2.0, which eliminates one of the `()` notations that many people had a problem with. I used to think I had a good reason for it, but with less wrapping I am happy to see it gone. I need to practice being thankful for feedback and learn as much from it as fast as practical.)
Having used `mutate_` a bit for various projects, I've found it not at all convenient. The required `stats::setNames` or `lazyeval::interp` forms are hard to read (let alone write or remember). As they stand right now, I think they are not enough. Some changes are coming (please see here for an April 2016 note on `lazyeval` capabilities, but also note the comment at the end: “Currently neither ggplot2 nor dplyr actually use these tools since I’ve only just figured it out. But I’ll be working hard to make sure all my packages are consistent in the near future.”).

If `replyr::let` isn't to your taste, then it isn't something you should use. As for needing one more package, `replyr` adds some useful functionality (`let`, `gapply`, and other functions) and brings in a moderate number of dependencies.

If you are interested, consider the simple problem of creating a column that indicates which rows of another column are `NA`. Both the column to be tested and the column that receives the result are not known until later (i.e., we have to take the column names from variables). Of the three solutions, I dislike my own `replyr::let` solution the least.

I've expanded the above into a vignette.
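Here is a base-R sketch of that NA-indicator problem, with both column names held in variables. The column names `x` and `x_isNA` and the helper `let_sketch()` are hypothetical, and this is not one of the three solutions referred to above. It only contrasts standard evaluation with stand-in names rewritten by parse-tree substitution.

```r
# Hypothetical minimal let-style helper: symbol substitution on the parse tree.
let_sketch <- function(alias, expr) {
  e <- do.call(substitute, list(substitute(expr), lapply(alias, as.name)))
  eval(e, envir = parent.frame())
}

d <- data.frame(x = c(1, NA, 3))
testCol <- "x"          # column to test, known only at run time
resultCol <- "x_isNA"   # where to land the result, also from a variable

# Standard evaluation with [[ ]]
d1 <- d
d1[[resultCol]] <- is.na(d1[[testCol]])

# Code written with stand-in names TEST and RESULT, rewritten before it runs.
# The assignment is evaluated in the calling environment, so d2 is updated there.
d2 <- d
let_sketch(list(TEST = testCol, RESULT = resultCol),
           d2$RESULT <- is.na(d2$TEST))
```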
Experimenting with operator-notation versions of `let` (in the GitHub version of the package):