# R Tip: Think in Terms of Values

`R` tip: first organize your tasks in terms of data, values, and desired transformation of values, not initially in terms of concrete functions or code.

I know I write a lot about coding in `R`. But it is in the service of supporting statistics, analysis, predictive analytics, and data science.

`R` without data is like going to the theater to watch the curtain go up and down.

(Adapted from Ben Katchor’s Julius Knipl, Real Estate Photographer: Stories, Little, Brown, and Company, 1996, page 72, “Excursionist Drama 2”.)

Usually you come to `R` to work with data. If you think and plan in terms of data and values (including introducing more data to control processing) you will usually work in a much faster, more explainable, and more maintainable fashion.

## A simple example

Let’s start with a typical `dplyr` example. Suppose we wish to select two columns (in this case `c("name", "height")`) from a `data.frame` (in this case `dplyr::starwars`). This is accomplished easily as we show below.

```r
library("dplyr")

starwars %>%
  select(name, height)
# # A tibble: 87 x 2
#    name               height
#    <chr>               <int>
#  1 Luke Skywalker        172
#  2 C-3PO                 167
#  3 R2-D2                  96
#  4 Darth Vader           202
#  5 Leia Organa           150
#  6 Owen Lars             178
#  7 Beru Whitesun lars    165
#  8 R5-D4                  97
#  9 Biggs Darklighter     183
# 10 Obi-Wan Kenobi        182
# # ... with 77 more rows
```

In practice we recommend coding only after you have decided on what you are going to do, and what parameters specify your steps.

Once you get to coding, in our opinion intent is much clearer if you make things explicit. For example, if you are working with `magrittr` pipes: make the pipe input argument explicit with “.” (please see R Tip: Make Arguments Explicit in magrittr/dplyr Pipelines). And if you are working with `dplyr::select()`: make the argument roles explicit. We suggest collecting the column names into a separate group to show their role is different than the role of the incoming `data.frame`. At first this explicitness unfortunately reduces legibility, as our code then looks like the following.

```r
starwars %>%
  select(., one_of( c("name", "height") ))
```

Note this is not a criticism of `one_of()`; it is discomfort with needing something like `one_of()`. And I fully admit: the popular `dplyr` style of not including the first argument in pipelines does not have the legibility problem; I myself introduced that problem by insisting on an explicit data argument. However, I have found that explicit arguments make it much easier for students to learn how to use `dplyr` functions simultaneously inside and outside pipelines. I also feel the explicit documentation of arguments has a number of down-stream advantages.
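As a small illustration of the inside/outside pipeline point, the sketch below calls `select()` three ways: directly, in a pipeline with the data argument left implicit, and in a pipeline with the data argument written as “.”. The variable names `direct`, `implicit`, and `explicit` are ours, chosen only for this example.

```r
library("dplyr")

# Direct call: the data is the explicit first argument.
direct <- select(starwars, name, height)

# Pipeline with the data argument left implicit (the popular dplyr style).
implicit <- starwars %>% select(name, height)

# Pipeline with the data argument written out as ".".
explicit <- starwars %>% select(., name, height)

# All three forms should produce the same result.
stopifnot(identical(direct, implicit))
stopifnot(identical(direct, explicit))
```

The explicit form reads the same as the direct call with `starwars` swapped for “.”, which is what makes it easy to move a step into or out of a pipeline.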

Minimize your reliance on implicit convention. What is obvious to you when writing the code may not be obvious to others, and may be something you don’t remember later. Along these lines we have a mini-style guide for effectively using `dplyr` with and without pipelines here.

Our specific legibility issue is just a matter of the nested “`one_of(c("", ...))`” construct being a bit clumsy. If we use an adapted version of `select()` that expects the list of columns to come in as a vector (as is typical for values in `R`) and use a vector constructor that does not need the quotes (such as `qc()`, please see R Tip: Use qc() For Fast Legible Quoting) we get a pipeline that is both very explicit (so more self-documenting) and quite convenient and legible:

```r
library("wrapr")
library("seplyr")

starwars %>%
  select_se(., qc(name, height))
```

`select_se()` stands for “select standard evaluation”, meaning it is an adaptation of `select()` that expects to be supplied the set of columns as a vector value. This function has a two-argument interface (data and vector of columns) and is simple to describe and reason about. `qc()` itself is a non-standard (or name capturing) interface. This is all `qc()` does, so it documents the user’s intent to capture names. If one does not mind the quotes one can avoid `qc()` entirely and write code such as the following.

```r
columns <- c("name", "height")
select_se(starwars, columns)
```

The above is simple, as it should be. `select_se()` is a function that expects two values and we call it supplying two values. This may seem less magical than “`starwars %>% select(name, height)`” (which involves piping, hidden function arguments, and name capture), and if so that is a good thing. Selecting a few columns is a basic task, so it should not require a lot of cognitive load.
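The same two-value pattern can be sketched in base `R` alone. In the example below `select_cols()` is a hypothetical helper written only for this illustration (it is not part of `seplyr` or `dplyr`), and the built-in `mtcars` data stands in for `starwars`.

```r
# `select_cols()` is a hypothetical base-R stand-in for a two-argument,
# value-oriented selector: it takes a data.frame and a character vector.
select_cols <- function(data, columns) {
  data[, columns, drop = FALSE]
}

# The column selection is an ordinary value ...
columns <- c("mpg", "cyl")

# ... so the same value can drive more than one step.
d <- select_cols(mtcars, columns)
column_means <- colMeans(d)

print(colnames(d))
# [1] "mpg" "cyl"
print(names(column_means))
# [1] "mpg" "cyl"
```

Because `columns` is just a vector, it can be stored, printed, passed to other functions, or read from a configuration file without changing the selection code.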

Even better than more variations on tool interfaces are more tools to capture values that can be used and re-used in many ways later.

## Value capturing tools

Our group has been developing some simple tools for conveniently capturing values from the user. The idea is that with these tools you get most of the convenience of having non-standard interfaces in many places, without the additional complexity of depending on non-standard interfaces being everywhere.

The trouble with nonstandard evaluation is that it doesn’t follow standard evaluation rules …

—Peter Dalgaard (about nonstandard evaluation in the curve() function) R-help (June 2011)

Standard evaluation interfaces (or value oriented interfaces) are generally preferred because their primary property is referential transparency. Referential transparency is when expressions can be replaced by their evaluated values without changing outcomes. Sequentially replacing expressions with values is program evaluation.
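A minimal base-`R` sketch of the distinction follows. The first half shows a standard evaluation call giving the same result whether we pass a variable or the variable’s value. The second half uses `show_expr()`, a hypothetical helper written only for this illustration, to show a name-capturing function returning different results for the same value.

```r
# Standard evaluation: an expression can be replaced by its value.
columns <- c("mpg", "cyl")
by_name  <- mtcars[, columns]            # argument is the variable `columns`
by_value <- mtcars[, c("mpg", "cyl")]    # argument is that variable's value
stopifnot(identical(by_name, by_value))  # same result either way

# Non-standard evaluation: the function sees the expression, not the value.
# `show_expr()` is a hypothetical helper written for this illustration.
show_expr <- function(x) {
  deparse(substitute(x))
}

print(show_expr(columns))
# [1] "columns"
print(show_expr(c("mpg", "cyl")))
# [1] "c(\"mpg\", \"cyl\")"
```

In the second half, substituting the value for the variable changed the outcome, so `show_expr()` is not referentially transparent.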

But, away from theory in the large and back to programming in the small. Let’s conclude with a few tools that make constructing useful values easier.

`qc()` stands for “quoting concatenate”, and we have already demonstrated it above. It is used as follows.

```r
v <- qc(name, height)

print(v)
# [1] "name"   "height"

dput(v)
# c("name", "height")
```

`qc()` can also be used to construct named vectors, which are very useful as maps.

```r
map <- qc(a = A, b = B)

print(map)
#   a   b
# "A" "B"
```
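To suggest why such maps are useful, here is a short base-`R` sketch of two common uses: looking up several keys at once, and renaming columns. We build the map with `c()` so the example has no package dependencies. The names `keys`, `values`, and `d` are ours.

```r
# A named character vector acting as a map from keys to values.
# (Built with c() here; the quotes are the only difference from the qc() form.)
map <- c(a = "A", b = "B")

# Look up several keys at once by indexing with a character vector.
keys <- c("b", "a", "b")
values <- unname(map[keys])
print(values)
# [1] "B" "A" "B"

# Use the same map to rename the columns of a data.frame.
d <- data.frame(a = 1:2, b = 3:4)
colnames(d) <- unname(map[colnames(d)])
print(colnames(d))
# [1] "A" "B"
```

The renaming step contains no column names of its own; changing the map changes the behavior.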

We also have a “print as paste-able code” function `map_to_char()`, which is a bit more convenient (for simple structures) than `dput()`.

```r
dput(map)
# structure(c("A", "B"), .Names = c("a", "b"))

map_to_char(map)
# [1] "c('a' = 'A', 'b' = 'B')"
```

We also have `build_frame()`, which is a convenience for typing in simple small `data.frame`s directly in row-oriented form (similar in intent to `tibble::tribble()`):

```r
d <- build_frame(
   "name", "value" |
   "a"   , 1       |
   "b"   , 2       )

print(d)
#   name value
# 1    a     1
# 2    b     2
```

The end of the first row is indicated by most any infix operator (we used “`|`”). More details on working with `build_frame()` can be found here.

The `draw_frame()` function can render small simple `data.frame`s into paste-able form. This is a great way to capture and share examples (without dates or other complex or annotated types).

```r
cat(draw_frame(d))
# build_frame(
#    "name", "value" |
#    "a"   , 1       |
#    "b"   , 2       )

dput(d)
# structure(list(name = c("a", "b"), value = c(1, 2)), .Names = c("name",
# "value"), row.names = c(NA, -2L), class = "data.frame")
```

Strip off the comment `#`-marks and you can paste the `draw_frame()` presentation into other work as legible code.

For `data.frame`s that are purely string valued, we have `qchar_frame()`, which is essentially `qc()` for `data.frame`s.

```r
d <- qchar_frame(
   name, value |
   a   , x     |
   b   , y     )

print(d)
#   name value
# 1    a     x
# 2    b     y

cat(draw_frame(d))
# build_frame(
#    "name", "value" |
#    "a"   , "x"     |
#    "b"   , "y"     )
```

The `cdata` package uses pure-`character` `data.frame`s for pivot/un-pivot control structures, and thus can make good use of `qchar_frame()`.

## Conclusion

In conclusion: sometimes when you think you need more code, you actually just need to move more of your intent into data and values. In `R` it pays to treat as much as you can as values (data, selections, configuration, and even results).

### jmount

Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.