Consider the problem of “parametric programming” in R. That is: simply writing correct code before knowing some details, such as the names of the columns your procedure will have to be applied to in the future. Our latest version of `replyr::let` makes such programming easier.

Archie’s Mechanics #2 (1954) copyright Archie Publications

(edit: great news! CRAN just accepted our `replyr 0.2.0` fix release!)

Please read on for examples comparing standard notations and `replyr::let`.

Suppose, for example, your task was to build a new advisory column that tells you which values in a column of a `data.frame` are missing or `NA`. We will illustrate this in R using the example data given below:

```
d <- data.frame(x = c(1, NA))
print(d)
# x
# 1 1
# 2 NA
```

Performing an ad hoc analysis is trivial in `R`: we would just directly write:

`d$x_isNA <- is.na(d$x)`

We used the fact that we are looking at the data interactively to note the only column is “`x`”, and then picked “`x_isNA`” as our result name. If we want to use `dplyr` the notation remains straightforward:

```
library("dplyr")
#
# Attaching package: 'dplyr'
# The following objects are masked from 'package:stats':
#
# filter, lag
# The following objects are masked from 'package:base':
#
# intersect, setdiff, setequal, union
d %>% mutate(x_isNA = is.na(x))
# x x_isNA
# 1 1 FALSE
# 2 NA TRUE
```

Now suppose, as is common in actual data science and data wrangling work, we are not the ones picking the column names. Instead suppose we are trying to produce reusable code to perform this task again and again on many data sets. In that case we would then expect the column names to be given to us as values inside other variables (i.e., as parameters).

```
cname <- "x" # column we are examining
rname <- paste(cname, "isNA", sep= '_') # where to land results
print(rname)
# [1] "x_isNA"
```

And writing the matching code is again trivial:

`d[[rname]] <- is.na(d[[cname]])`
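As a minimal sketch of what “reusable” means here, the two steps above can be wrapped in a function that takes the column name as a string. The helper name `add_isNA` is hypothetical (it is not part of `replyr` or `dplyr`), and the sketch assumes only base `R`:

```r
# Hypothetical helper (not from replyr or dplyr): add an NA-indicator column
# for the column named by `cname`, using only base R "[[ ]]" indexing.
add_isNA <- function(d, cname) {
  rname <- paste(cname, "isNA", sep = "_")  # where to land results
  d[[rname]] <- is.na(d[[cname]])           # same assignment as above
  d
}

d <- data.frame(x = c(1, NA))
res <- add_isNA(d, "x")
print(res)
```

The caller supplies the column name as data, so nothing in the function body needs editing when the column changes.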

We are now programming at a slightly higher level, or automating tasks. We don’t need to type in new code each time a new data set with a different column name comes in. It is now easy to write a `for`-loop or `lapply` over a list of columns to analyze many columns in a single data set. It is an absolute travesty when something that is purely virtual (such as formulas and data) cannot be automated over. So the slightly clunkier “`[[]]`” notation (which can be automated) is a necessary complement to the more convenient “`$`” notation (which is too specific to be easily automated over).
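For instance, a loop over several columns might look like the following sketch. The two-column data frame `d2` and the vector `cols` are made up for illustration; only base `R` is assumed:

```r
# Illustrative data (not from the original example): two columns with NAs.
d2 <- data.frame(x = c(1, NA), y = c(NA, "b"), stringsAsFactors = FALSE)
cols <- c("x", "y")  # column names arrive as values, not as typed-in symbols

# Add one "<name>_isNA" indicator column per requested column.
for (cn in cols) {
  rn <- paste(cn, "isNA", sep = "_")
  d2[[rn]] <- is.na(d2[[cn]])
}
print(d2)
```

The same loop written with “`$`” would need the column name typed into the source, which is exactly what cannot be automated.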

Using `dplyr` directly (when you know all the names) is deliberately straightforward, but programming over `dplyr` can become a challenge.

## Standard practice

The standard parametric `dplyr` practice is to use `dplyr::mutate_` (the standard evaluation or parametric variation of `dplyr::mutate`). Unfortunately the notation in using such an “underbar form” is currently cumbersome.

You have the choice of building up your expression through variations of one of:

- A formula
- Using `quote()`
- A string

(source: dplyr Non-standard evaluation, for additional theory and upcoming official solutions please see here).

Let us try a few of these, to emphasize that we are proposing a new solution not because we do not know the current solutions, but because we are familiar with them.

### Formula interface

The formula interface is a nice option, as it is `R`’s common way of holding names unevaluated. The code looks like the following (edit: but does not work for `dplyr ‘0.5.0.9000’`):

```
d %>% mutate_(RCOL = lazyeval::interp(~ is.na(cname))) %>%
rename_(.dots = stats::setNames('RCOL', rname))
# x x_isNA
# 1 1 FALSE
# 2 NA FALSE
```

(edit: it looks like the following actually works: `d %>% mutate_(RCOL = lazyeval::interp(~ is.na(VAR), VAR=as.name(cname))) %>% rename_(.dots = stats::setNames('RCOL', rname))`)

Currently `mutate_` does not take “two-sided formulas” so we need to control names outside of the formula. In this case we used the explicit `dplyr::rename_` because attempting to name the assignment in-line does not seem to be supported (or if it is supported, it uses a different notation or convention than the one we have just seen; edit: also not working for `dplyr ‘0.5.0.9000’`):

```
# the following does not correctly name the result column
d %>% mutate_(.dots = stats::setNames(lazyeval::interp( ~ is.na(cname)),
rname))
# x is.na(cname)
# 1 1 FALSE
# 2 NA FALSE
```

### Trying `quote()`

`quote()` can delay evaluation, but isn’t the right tool for parameterizing (what the linked NSE reference called “mixing constants and variable”). We have a hard time getting control of incoming and outgoing variables.

```
# dplyr mutate_ quote non-solution (hard coded x, failed to name result)
d %>% mutate_(.dots =
stats::setNames(quote(is.na(x)),
rname))
# x is.na(x)
# 1 1 FALSE
# 2 NA TRUE
```

My point is: even if this is something that *you* know how to accomplish, this is evidence we are really trying to swim upstream with this notation.
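To see why `quote()` alone cannot pick up a parameter, it may help to compare it with base `R`’s `bquote()`, which substitutes anything wrapped in `.()` before capturing the expression. This is only a base-`R` sketch of the underlying issue, not a `dplyr` or `replyr` recipe:

```r
d <- data.frame(x = c(1, NA))
cname <- "x"                              # column we are examining
rname <- paste(cname, "isNA", sep = "_")  # where to land results

# quote() captures the expression exactly as typed: the symbol stays `cname`.
e_quote <- quote(is.na(cname))
# bquote() evaluates what is inside .() first, so the symbol becomes `x`.
e_bquote <- bquote(is.na(.(as.name(cname))))
print(e_quote)   # is.na(cname)
print(e_bquote)  # is.na(x)

# Evaluate the substituted expression with the data frame's columns in scope.
d[[rname]] <- eval(e_bquote, d)
print(d)
```

The first expression refers to a column literally named `cname`, which is the same mistake that produced the `FALSE FALSE` results in the formula examples.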

### String solutions

String-based solutions can involve using `paste` to get parameter values into the strings. Here is an example:

```
# dplyr mutate_ paste stats::setNames solution
d %>% mutate_(.dots =
stats::setNames(paste0('is.na(', cname, ')'),
rname))
# x x_isNA
# 1 1 FALSE
# 2 NA TRUE
```

Or just using strings as an interface to control `lazyeval::interp`:

```
# dplyr mutate_ lazyeval::interp solution
d %>% mutate_(RCOL =
lazyeval::interp("is.na(cname)",
cname = as.name(cname))) %>%
rename_(.dots = setNames('RCOL', rname))
# x x_isNA
# 1 1 FALSE
# 2 NA TRUE
```
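One caveat with pasting names into strings is that the result has to parse as `R` code, which fails for column names that are not syntactically valid. The following base-`R` sketch illustrates the issue. The column name `my col` is made up for illustration, and `parse()`/`eval()` stand in for whatever parsing the string interface performs internally:

```r
# Illustrative data with a non-syntactic column name (contains a space).
d <- data.frame(`my col` = c(1, NA), check.names = FALSE)
cname <- "my col"

# Naive pasting yields text that is not valid R, so parsing fails.
bad <- paste0("is.na(", cname, ")")
parsed_ok <- tryCatch({ parse(text = bad); TRUE }, error = function(e) FALSE)
print(parsed_ok)  # FALSE

# Backtick-quoting the name makes the pasted text parse again.
good <- paste0("is.na(`", cname, "`)")
res <- eval(parse(text = good)[[1]], d)
print(res)  # FALSE TRUE
```

Passing the name through `as.name()`, as in the `lazyeval::interp` example above, avoids this quoting problem because the name never has to survive a round trip through text.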

Thanks for posting, looks like some interesting ways to handle non standard evaluation in R, which can be a massive pain.

I realise that replyr is a much more diverse tool, and the example using NAs is just an example, but there are functions in the naniar package to help with this exact problem of adding NA columns – see example here: http://www.njtierney.com/naniar/reference/bind_shadow.html.

Thanks for all your work on open source!


Thanks! And yes NA locations is in fact an interesting question- one of the more common tasks that you want to have tools for. Thanks for the naniar reference, I’ll check it out.


This is great, thank you.


Thanks for pointing out what a monstrosity dplyr standard evaluation is


I’d say standard evaluation isn’t a priority in a lot of R packages.


“Currently mutate_ does not take “two-sided formulas” so we need to control names outside of the formula”

The solution is the following:

`d %>% mutate_(.dots = setNames(list(~ (is.na(cname))), rname))`


S.K.

Sorry to have misled you. But looking at it now none of my formula examples work (notice they are all returning FALSE FALSE). Your code is better about renaming but seems to pick up the same calculation bug.

I’ve edited the above article to mention the failing. Honestly I am not sure what variation of the formula code can work conveniently with dots.

John


I finally got substitute and other non-replyr examples to work and cleaned up: https://github.com/WinVector/replyr/blob/master/vignettes/ParametricExample.Rmd . Corrects issues in the above post.
