In praise of syntactic sugar

There has been some talk of adding native pipe notation to R (for example here, here, and here), and even of a tidyeval/rlang pipe here.

I think a critical aspect of such an extension would be to treat such a notation as syntactic sugar and not insist such a pipe match magrittr semantics, or worse yet give a platform for authors to insert their own preferred ad-hoc semantics.

A prominent place where pipe notation is used is the language F#, where it is literally defined in F# itself as:

    let (|>) x f = f x

From this simple definition, one versed in the semantics of F# has a chance at inferring the semantics of the pipe.
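
For comparison, the same one-line definition can be transliterated into R as a user-defined infix operator. This is only a sketch: the operator name `%|>%` is made up for this illustration and is not part of base R or of any package discussed here.

```r
# Hypothetical operator name, chosen only for this illustration.
# It mirrors the F# definition: apply the function on the right
# to the value on the left.
`%|>%` <- function(x, f) f(x)

4 %|>% sin
#> [1] -0.7568025

# User-defined %...% operators are left-associative,
# so the steps chain from left to right.
4 %|>% sin %|>% exp %|>% cos
```

Under this definition the right-hand side must be a function, so `4 %|>% sin()` is an error rather than a convenience. Everything the operator does can be read off its one-line body.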

In the R language, the magrittr package supplies a pipe written as “%>%”. This pipe’s implementation depends on very complicated unevaluated expression capture and direct manipulation of execution environments. Also the magrittr pipe picks its own semantics as it wants, as it does not inherit semantics from a simple definition. For example all of the following are valid magrittr:

library("magrittr")

4 %>% sin
#> [1] -0.7568025

4 %>% sin()
#> [1] -0.7568025

4 %>% sin(.)
#> [1] -0.7568025

These all seem to be convenient choices, but new users have to memorize them, as they cannot infer them from things they may already know about the R programming language.

Also the freedom of choice in semantics means many arbitrary choices get made (nothing is prior apodictic) and you get some debatable choices such as: “data.frame(x=1) %>% dplyr::bind_rows(list(.,.))” having 3 rows (in contrast to “data.frame(x=1) %>% { dplyr::bind_rows(list(.,.)) }” having 2 rows).

What I would like, if a native pipe were to be added to R, is for the pipe to be defined as formally equivalent to some larger R expression. Then we could teach it as such and we would not get any new corner cases or exceptional behaviors. Even if an implementation fails to reach this standard, this way of defining things lets us know how it should have worked (in a perfect world).

A good R pipe operator might have an aspirational definition along the lines of:

  "a %.>% b" is to be treated
  as if the user had written "{ . <- a; b };"
  with "%.>%" being treated as left-associative.
  (there are "." associated side-effects).

Notice we are not saying “b(a)” as that doesn’t deal with directed choice of argument placement, and other cases R-users have come to expect. This also allows piping into non-function expressions (a neat feature).
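
The aspirational definition can be approximated in a few lines of base R. The following is a rough sketch under stated assumptions, not the implementation of any package: it defines a toy `%.>%` that assigns the left-hand value to `.` in the caller's environment and then evaluates the right-hand expression there.

```r
# Toy sketch of "a %.>% b" meaning "{ . <- a; b }".
# Assumption: illustration only, not a package implementation.
`%.>%` <- function(a, b) {
  b_expr <- substitute(b)       # capture b unevaluated
  env <- parent.frame()         # the environment the pipe was written in
  assign(".", a, envir = env)   # ". <- a" (forces a; this is the "." side effect)
  eval(b_expr, envir = env)     # then evaluate b where "." is visible
}

4 %.>% sin(.)
#> [1] -0.7568025

# Piping into a non-function expression.
4 %.>% (1 + .)
#> [1] 5
```

Because user-defined `%...%` operators in R are left-associative, a longer chain evaluates its left-hand pipeline first, as the definition asks.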

Surprisingly enough the above actually works. It means a pipeline such as the following:

a <-
  4      %.>%
  sin(.) %.>%
  exp(.) %.>%
  cos(.)

Is unambiguously meant to be a short-hand for the following (ugly) nested code.

a <- { . <- { . <- { . <- 
  4       ; 
  sin(.) };; 
  exp(.) };; 
  cos(.) };

It does not matter that nobody would write the nested code, that is precisely what we are not asking anybody to do. The point is, a student can attempt to check this translation on small examples, and even run both versions. For more on association of R-operators please see here.
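
One way to run such a check without any pipe package is to compare the nested block form against ordinary nested function calls. A small sketch (the variable names `a_blocks` and `a_calls` are ours):

```r
# The nested-block translation of the pipeline, written out by hand.
a_blocks <- { . <- { . <- { . <- 4; sin(.) }; exp(.) }; cos(.) }

# The same computation as ordinary nested function calls.
a_calls <- cos(exp(sin(4)))

print(a_blocks)
#> [1] 0.8919465

# The two spellings should agree.
stopifnot(isTRUE(all.equal(a_blocks, a_calls)))
```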

Notice in the above ugly nested example that “};;” is starting to look a bit like a piping operator. This is calling out two things:

Roughly if the user is willing to write code such as the following, then they don’t need pipes.

     . <-  4 
     . <-  sin(.)
     . <-  exp(.)
     . <-  cos(.)
a <- .

Or, as we have observed before, some notations start to look like you already have piping capabilities in base-R (arguing one should give base-R a chance before insisting on extensions).

For example we can use “;.<-” as a pipe (the first one I noticed, and my attempt to not use left-arrow that often):

a <-{ 
  4      ->.;.<- 
  sin(.)    ;.<-
  exp(.)    ;.<-
  cos(.)    ;.}

Or we can use “->.;” as a pipe:

a <-{ 
  4        ->.;
  sin(.)   ->.; 
  exp(.)   ->.;
  cos(.)        }

I think this last one is actually pretty if we go all-in with right-assignment:

  4      ->.; 
  sin(.) ->.; 
  exp(.) ->.; 
  cos(.) ->   a

The remaining strong objections to this Bizarro Pipe notation (in my mind) are:

  • “->.;” is an ugly glyph. This is because its representation is its implementation.
  • This pipe isn’t very compatible with left-assignment (R’s preferred assignment) without adding additional blocks.

Roughly: introducing new notation need not be as disruptive as introducing new semantics. Also conventions can have great advantage, even if they do not have language assistance or enforcement (though such things are good).

Categories: Opinion Tutorials

jmount

Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.

8 replies

  1. The other day I had an error I wanted to debug by replacing %>% with bizarro pipe. When I did the replacement and ran the result, instead of R giving me the same error and a variable I could inspect, I got a different error earlier in the pipeline. It seems that bizarro pipe is not entirely equivalent to %>%, perhaps because, as you say, its implementation is so complicated.

    Sorry I don’t have time to be more specific and give an MRE, but it seemed worth sharing given the topic of this post!

    1. You are correct: Bizarro Pipe does not fully match magrittr semantics. The biggest differences are that it does not insert an argument, it evaluates eagerly rather than lazily, and it insists on an explicit “.”.

      Some of what you saw may be an observable effect of the lazy evaluation of some of your data items. I can imagine a syntax error late in a magrittr pipe masking a data error earlier in the pipe.

      But it isn’t just you or even me: magrittr has problems even with dplyr.

  2. Oh, what the heck. I am going to call this “dot arrow.”

    # devtools::install_github("WinVector/wrapr")
    library("wrapr")
    
    a <-
      4      %.>%
      sin(.) %.>%
      exp(.) %.>%
      cos(.)
    print(a)
    #> [1] 0.8919465
    
    
    4 %.>% (1 + .)
    #> [1] 5
    
    
    data.frame(x=1) %.>% dplyr::bind_rows(list(.,.))
    #>   x
    #> 1 1
    #> 2 1
    

    Code here.

    And another example:

    # devtools::install_github("WinVector/wrapr")
    suppressPackageStartupMessages(library("dplyr"))
    
    # Bizarro Pipe Example
    (function(x) { mtcars ->.; 
      select(., !!enquo(x)) })(disp) ->.; 
    head(.)
    #>                   disp
    #> Mazda RX4          160
    #> Mazda RX4 Wag      160
    #> Datsun 710         108
    #> Hornet 4 Drive     258
    #> Hornet Sportabout  360
    #> Valiant            225
    
    # dot block pipe example
    library("wrapr")
    (function(x) mtcars %.>% 
        select(., !!enquo(x)))(disp) %.>% 
      head(.)
    #>                   disp
    #> Mazda RX4          160
    #> Mazda RX4 Wag      160
    #> Datsun 710         108
    #> Hornet 4 Drive     258
    #> Hornet Sportabout  360
    #> Valiant            225
    
    
    # magrittr example
    (function(x) mtcars %>% 
        select(!!enquo(x)))(disp) %>%
      head()
    #> Error: `function (expr) 
    #> {
    #>     enexpr(expr)
    #> }` must resolve to integer column positions, not a function
    

    And timings.

  3. It looks like dplyr itself did use a nice analogy definition for its original pipe operator “%.%”. From chain.r:

    The functions work via simple substitution so that
    x %.% f(y) is translated into f(x, y).

    Whereas looking into magrittr’s code it looks like the magrittr strategy is to Curry expressions to single-argument functions taking “dot”. But this isn’t the complete analogy or specification as “help(`%>%`, package = 'magrittr')” indicates a lot of context is needed to work out how and when things are evaluated (look at the notes on () and {}). It looks like changing these changes what R language elements magrittr sees at different times, allowing magrittr to pick different behaviors.

    I’d say in hindsight that dplyr::`%.%` was a cleaner “syntactic sugar” style concept, while magrittr::`%>%` was trying to do a lot more (add new capabilities). The power of making argument control explicit through “.” in magrittr::`%>%` obviously was considered important.

  4. I am not a programmer, and I find your post fascinating, since it discusses issues with some of the programming. I am by no means an R expert, and I still struggle and find new ways of doing things. I use the tidyverse; however, from day one I did not like the %>% at the end, so perhaps I am not a true R programmer. I also like the Bizarro pipe (->.;). Now, here is where my skills fail me. Why does this work? Was this in base R, or is it a combination of -> goes to . then execute (;)?

    1. It is as you surmised: Bizarro Pipe is an emergent behavior of combining symbols that have always been in base-R. You can add spaces and read it as “-> . ;” which is read as “assign to dot and then end the current statement and start a new one.” When you think a lot about sequence you eventually internalize that this is what a pipe does (modulo “dot” being a real thing or just a first-argument convention).

  5. I almost forgot, one more pipe variation guaranteed to anger everyone: “;.=”.

    a <-{ 
      4 ->.;.=
        sin(.)  ;.=
        exp(.)  ;.=
        cos(.)  ;.}
    