Another R [Non-]Standard Evaluation Idea

Jonathan Carroll had an interesting R language idea: to use @-notation to request value substitution in a non-standard evaluation environment (inspired by MySQL User-Defined Variables).

He even picked the right image:

[Image: Pandora's box]

The idea is roughly the reverse of some Lisp ideas ("evaled unless ticked"), but it is an interesting possibility. We can play along with it a bit in R as follows. The disadvantages of our simulation include:

  • The user must both call wrapr::ateval and place their code in quotes.
  • The effect is still achieved through string substitution.

But here it is for what it is worth:

# devtools::install_github("WinVector/wrapr")
library("wrapr")
library("dplyr")

The original example function from the Tweet:

f <- function(col1, col2, new_col_name) {
  ateval('mtcars %>% mutate(@new_col_name = @col1 + @col2)')
}

And the requested effect actually realized:

d <- f('gear', 'carb', 'nonsense')
head(d)
##    mpg cyl disp  hp drat    wt  qsec vs am gear carb nonsense
## 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4        8
## 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4        8
## 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1        5
## 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1        4
## 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2        5
## 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1        4

The point is that a standard function (f()) can accept user column names and then itself use convenient non-standard evaluation notation (in this case dplyr) to perform operations. This package-based simulation is not as good as actual language support, but simulations like this are how we collect experience with possible new language features.
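To make the string-substitution idea concrete, here is a minimal base-R sketch. The helper at_eval_sketch() is hypothetical and is not wrapr's actual implementation; it also uses base transform() in place of dplyr so that it runs without extra packages.

```r
# Hypothetical helper (an assumption for illustration, not wrapr::ateval):
# replace each @name token in a code string with the string value that
# `name` holds in the calling function, then parse and evaluate the result.
at_eval_sketch <- function(code, env = parent.frame()) {
  # locate every @name token in the code string
  m <- gregexpr("@[A-Za-z.][A-Za-z0-9._]*", code)
  tokens <- regmatches(code, m)[[1]]
  # look up the value of each name (dropping the leading @) in the caller
  values <- vapply(
    tokens,
    function(tok) as.character(get(substring(tok, 2), envir = env)),
    character(1)
  )
  # splice the values back into the string, match by match
  regmatches(code, m) <- list(unname(values))
  eval(parse(text = code), envir = env)
}

f_sketch <- function(col1, col2, new_col_name) {
  at_eval_sketch('transform(mtcars, @new_col_name = @col1 + @col2)')
}

d_sketch <- f_sketch('gear', 'carb', 'nonsense')
head(d_sketch)
```

Because the substitution happens on text, the sketch shares the disadvantages listed earlier: the code must be quoted, and nothing checks that a substituted value is a sensible column name.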

The real point is that a user wants a flexible language with macros and functions (R uses an intermediate form called "an Fexpr" for just about everything) that they can both use interactively and program over. This means they eventually want an execution environment where they can both pass in parametric values (what R calls standard evaluation) and have code elements treated directly as values (a convenience related to what R calls non-standard evaluation).
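As a small base-R illustration of the two modes (the variable name col is ours, chosen for the example):

```r
# Standard evaluation: the column name arrives as an ordinary string value.
col <- "gear"
se_result <- mtcars[[col]]

# Non-standard evaluation: the symbol `gear` is treated as code and is
# looked up inside the data frame rather than in the calling environment.
nse_result <- with(mtcars, gear)

# One way to bridge the two: convert the string to a symbol, then
# evaluate that symbol with the data frame as the evaluation environment.
bridged <- eval(as.name(col), mtcars)
```

All three results are the same column; the difference is only in whether the name was supplied as a value or as code.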

The classic Lisp solution organizes things a bit differently, and uses various "back-tick" notations to specify control of the interpretation of symbols. I think R has picked a different set of defaults as to how symbols, values, expressions, and execution interact, so any notation is going to be a bit different.

The development version of wrapr can be found here (ateval is not yet in the CRAN version, which supplies the let alternative). The example shown in this article can be found in markdown form here.

jmount

Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.

1 reply

  1. We can also try a (!!name) notation inspired by recent developments in dplyr and hadley/rlang:

    fb <- function(col1, col2) {
      beval(
        mtcars %>% mutate(res_col = (!!col1) + (!!col2))
      )
    }
    head(fb('gear', 'carb'))
    ##    mpg cyl disp  hp drat    wt  qsec vs am gear carb res_col
    ## 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4       8
    ## 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4       8
    ## 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1       5
    ## 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1       4
    ## 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2       5
    ## 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1       4

    Notice we no longer need to quote the source code. However, without some additional trick the (!!name) notation cannot be used on the left-hand side of an assignment. Also note: !! is not a no-op, but it is a sufficiently uncommon expression that I thought we could use it (some have mentioned using !!!, as it is equivalent to ! and therefore not strictly necessary).
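    A rough base-R sketch of how such an unquoted substitution could work is below. The names bang_sub() and bang_eval_sketch() are hypothetical and this is not wrapr's beval() implementation; it captures the unevaluated expression, replaces each !!name with the symbol named by the string that name holds, and evaluates the result. It uses base transform() instead of dplyr, and it does not handle calls with empty arguments such as x[, 1].

    ```r
    # Hypothetical helper: walk an expression and replace every !!name
    # (a call to `!` applied to a call to `!` applied to a symbol) with
    # the symbol whose name is the string stored in `name`.
    bang_sub <- function(e, env) {
      if (is.call(e)) {
        is_bang_bang <- identical(e[[1]], as.name("!")) &&
          is.call(e[[2]]) &&
          identical(e[[2]][[1]], as.name("!")) &&
          is.name(e[[2]][[2]])
        if (is_bang_bang) {
          return(as.name(get(as.character(e[[2]][[2]]), envir = env)))
        }
        # otherwise rebuild the call from its recursively processed parts
        return(as.call(lapply(as.list(e), bang_sub, env = env)))
      }
      e
    }

    # Hypothetical stand-in for beval(): capture the code unevaluated,
    # substitute, then evaluate in the caller's environment.
    bang_eval_sketch <- function(expr, env = parent.frame()) {
      eval(bang_sub(substitute(expr), env), envir = env)
    }

    fb_sketch <- function(col1, col2) {
      bang_eval_sketch(
        transform(mtcars, res_col = (!!col1) + (!!col2))
      )
    }

    r_sketch <- fb_sketch('gear', 'carb')
    head(r_sketch)
    ```

    The left-hand-side limitation shows up here too: res_col = ... is an argument name rather than an expression, so the tree walk never sees it.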

    It is kind of cool that R's C-style “not operator” obeys a law of intuitionistic logic (triple negation is equivalent to negation) and not the laws of classical logic (double negation being equivalent to a no-op).
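    A quick check of this at the console (the vector x is just an example value):

    ```r
    x <- c(0, 2, NA)

    single_neg <- !x    # logical vector: TRUE FALSE NA
    double_neg <- !!x   # logical vector: FALSE TRUE NA (not the original numbers)
    triple_neg <- !!!x  # logical vector again

    identical(triple_neg, single_neg)  # TRUE: triple negation equals negation
    identical(double_neg, x)           # FALSE: double negation is not a no-op
    ```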

    Note that for dplyr arguments more powerful substitutions are going to be supplied by hadley/rlang. So ateval() and beval() are definitely going to be made obsolete at some point. They let you experiment with notation until dplyr's next major release supplies the new SE/NSE adapters !! and UQ().

    In fact, Lionel Henry (and others) are “currently porting the tidyverse to the tidy evaluation framework” (which I take to be hadley/rlang). This will add a lot of notational power to dplyr. However, this is a major refactoring of a lot of code onto a fork of lazyeval that itself is still fairly new.

    We have been publicly distributing the let() method (in one form or another) since early December 2016 (and I don’t think dplyr deprecation of “underbar forms” was public on the dplyr master branch until mid February 2017), so I feel there is definitely a need for and room for more than one standard evaluation adapter solution.

    For critical notation you really want a number of specifications and implementations for the user community to test and choose from. In open source you also want significant cross-pollination so the community gets the sum of possible improvements. For example Jonathan Carroll has expanded his observation into a request for comments on R-dev.
