
R Tip: Make Arguments Explicit in magrittr/dplyr Pipelines

I think this is going to be the most controversial R Tip yet. Its potential pitfalls include: it is a style prescription (which makes it different from, and less immediately useful than, something like R Tip: Force Named Arguments), and it is heterodox (this is not how magrittr/dplyr is taught by the original authors, and not how it is commonly used). However, I have not been at all good at anticipating which tips get which sort of reception (and this valuable feedback, public and private, is part of what I get out of this series).

On to the tip (which only applies if you are a magrittr pipeline user).

R tip: when using magrittr pipelines, consider making them more explicit and more readable (especially to novices) by using explicit dot-arguments throughout.

The advice is: write pipelines that look like this:

suppressPackageStartupMessages(library("dplyr"))

starwars %>%
  filter(., height > 200) %>%
  select(., height, mass) %>%
  head(.)

And avoid overly concise pipelines such as this:

starwars %>%
  filter(height > 200) %>%
  select(height, mass) %>%
  head

The guidance is: each step in a simple magrittr pipeline is a function call that has at least one of its arguments written directly as “.”. For example, “atan2(3, .)” is a simple step, but neither “atan” nor “atan2(abs(.), 5)” is.
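A small sketch of the classification, using only magrittr and base R. The variable names are illustrative only. In a simple step the piped value lands exactly where the dot is written; a bare function name relies on magrittr's implicit first-argument insertion, and adding an explicit dot turns it into a simple step with the same result.

```r
library("magrittr")

# Simple step: the dot is written directly as an argument,
# so the piped value is substituted exactly where the dot appears.
simple <- 5 %>% atan2(3, .)      # same as atan2(3, 5)

# Not a simple step: a bare function name relies on magrittr
# implicitly inserting the piped value as the first argument.
bare <- 5 %>% atan               # same as atan(5)

# The same step written in the explicit style is a simple step.
explicit <- 5 %>% atan(.)        # same as atan(5)
```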

The intended point is: the first pipeline is more explicit and regular. This makes it easier to explain and easier for newcomers to read. For pipelines limited to this style, each step runs, approximately, as if the value of the previous step were stored in a variable named “.”.
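To make that mental model concrete, here is a sketch that runs the explicit pipeline from above, then repeats the computation by hand by repeatedly assigning an ordinary variable named “.”, and finally runs the concise pipeline. The names `piped`, `by_hand` and `concise` are illustrative; the point is only that, for this style of pipeline, all three give the same result.

```r
suppressPackageStartupMessages(library("dplyr"))

# The explicit-dot pipeline from the tip.
piped <- starwars %>%
  filter(., height > 200) %>%
  select(., height, mass) %>%
  head(.)

# The approximate mental model: run each step in sequence,
# keeping the previous step's value in a variable named ".".
. <- starwars
. <- filter(., height > 200)
. <- select(., height, mass)
. <- head(.)
by_hand <- .
rm(".")  # remove the temporary variable

# The concise style computes the same result here.
concise <- starwars %>%
  filter(height > 200) %>%
  select(height, mass) %>%
  head
```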

Note: the exact magrittr semantics are in fact more detailed than what I just said. The idea is to start newcomers in a sub-dialect of magrittr that has a simpler, correct mental model before (or if ever) moving them to the full details. The full details are perhaps more than a part-time R user should be expected to remember. It is a bit much to expect a non-specialist to always remember that “5 %>% atan2(3, .)” is completely different from “5 %>% atan2(3, abs(.))”, and that “5 %>% {. + 1}” is completely different from “5 %>% (. + 1)”.
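A short sketch of the first difference, plus the braced form, using magrittr and base R. When the dot appears only inside a nested call, magrittr still inserts the piped value as the first argument, so the second expression effectively becomes `atan2(5, 3, abs(5))` and fails because `atan2()` takes only two arguments. The error is captured with `tryCatch()` so the example runs to completion; the parenthesized `(. + 1)` form is left out here because its behavior depends on magrittr's evaluation details.

```r
library("magrittr")

# Dot written directly as an argument: the piped value goes only there.
direct <- 5 %>% atan2(3, .)          # evaluates atan2(3, 5)

# Dot appears only inside a nested call: magrittr also inserts the
# piped value as the first argument, i.e. atan2(5, 3, abs(5)),
# which is an error. Capture the error message instead of stopping.
nested <- tryCatch(5 %>% atan2(3, abs(.)),
                   error = function(e) conditionMessage(e))

# Braces evaluate the block with "." bound to the piped value;
# no implicit first-argument insertion happens.
braced <- 5 %>% {. + 1}              # evaluates 5 + 1
```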

Categories: Coding Tutorials


jmount

Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.

1 reply

  1. Another reason to use explicit arguments: without argument markers (or at least the parentheses), magrittr fails with package-qualified function names.

    library("dplyr")
    
    db <- DBI::dbConnect(RSQLite::SQLite(),
                         ":memory:")
    d <- dplyr::copy_to(db, data.frame(x = 1))
    
    d %>% tally(.)
    #> # Source:   lazy query [?? x 1]
    #> # Database: sqlite 3.19.3 [:memory:]
    #>       n
    #>   <int>
    #> 1     1
    
    d %>% tally
    #> # Source:   lazy query [?? x 1]
    #> # Database: sqlite 3.19.3 [:memory:]
    #>       n
    #>   <int>
    #> 1     1
    
    d %>% dplyr::tally(.)
    #> # Source:   lazy query [?? x 1]
    #> # Database: sqlite 3.19.3 [:memory:]
    #>       n
    #>   <int>
    #> 1     1
    
    d %>% dplyr::tally
    #> Error in .::dplyr: unused argument (tally)
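A minimal base-R sketch of the robust forms, so the point can be checked without a database connection. It pipes into the package-qualified `base::sqrt`, once with an explicit dot and once with empty parentheses; both make the call explicit. The bare form `x %>% base::sqrt` is deliberately omitted, since whether it fails as in the transcript above may depend on the installed magrittr version.

```r
library("magrittr")

x <- c(1, 4, 9)

# Explicit dot argument with a package-qualified function name.
with_dot <- x %>% base::sqrt(.)

# Empty parentheses also make the call explicit; magrittr inserts
# the piped value as the first argument.
with_parens <- x %>% base::sqrt()
```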
    