
My Opinion on R’s Upcoming Pipe

R’s upcoming pipe currently appears to be proposed as a syntactic transform of the form:

  a |> f(...)   ->    f(a, ...)
  a |> f()      ->    f(a)

There is currently an active discussion of this prototype, and some interesting points have come up. Note that the current proposal appears to disallow a |> f -> f(a), a currently popular transform.
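
For readers who want to try the transform, here is a small sketch of how it behaves in R 4.1.0 and later, the release where the native pipe eventually shipped (an assumption relative to this post, which discusses the pre-release proposal). The function f and the names x and y are hypothetical, chosen only for illustration. Because the rewrite happens in the parser, quote() already returns the rewritten call.

```r
# Sketch assuming R >= 4.1.0 (the release that shipped the native |> pipe).
# f, x, and y are hypothetical names used only for illustration.
f <- function(a, b = 0) a + b

# The parser performs the rewrite, so the quoted pipe is already an ordinary call.
print(quote(x |> f(y)))   # f(x, y)
print(quote(x |> f()))    # f(x)

# Evaluating the pipe is just evaluating f(10, 5).
print(10 |> f(5))         # 15

# The bare-function form a |> f is rejected when the code is parsed.
res <- tryCatch(parse(text = "x |> f"), error = function(e) "parse error")
print(res)                # "parse error"
```

Since the rewrite is purely syntactic, there is no run-time pipe function to look up or override.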

  1. This is a language feature presented as a soon-to-be-user-visible prototype, not an RFC.
  2. Some are objecting to the term “pipe.”
  3. Some call this sort of pipe function composition.
  4. Some note that this sort of substitution is generally thought of as a “macro.”
  5. There is a claim that the proposed pipe seems to violate the beta-reduction rule of the lambda calculus: a variable should be replaceable by its value. The idea is that if the following code fragment is allowed:
      f <- function(x) { x + 1 }
      2 |> f()
    

    then replacing f with its value should also be valid. One might even want strong substitution of expressions, and to be able to write:

      2 |> function(x) { x + 1 }
    

    It appears the current R-language |> operator does not allow the second expression unless extra parentheses are introduced (either to group the function declaration or to add an argument evaluation slot). I haven’t tried this, so I may be wrong; I am attempting to excerpt this from the R-devel email chain.
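
Here is a sketch of the two fragments from point 5, again assuming R 4.1.0 or later. In the released pipe, substituting f’s value works only when the function literal is parenthesized and followed by an explicit argument slot; the bare function literal is a parse error.

```r
# Sketch assuming R >= 4.1.0.
f <- function(x) { x + 1 }
print(2 |> f())                           # 3

# Substituting f's value works once the function literal is parenthesized
# and followed by an explicit argument slot ().
print(2 |> (function(x) { x + 1 })())     # 3

# The bare function literal on the right-hand side is a parse error.
bare <- tryCatch(parse(text = "2 |> function(x) { x + 1 }"),
                 error = function(e) "parse error")
print(bare)                               # "parse error"
```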

My comments are as follows.

  1. Point 1 seems a bit too pragmatic for core language features.
  2. The word “pipe” is used in many languages to mean something other than the Unix pipe. Legend has it that “the Unix pipe is the only pipe” bullying is why R’s magrittr package, the supplier of the popular R-extension pipe, is named magrittr, a nod to René Magritte’s painting of a pipe captioned “Ceci n’est pas une pipe.”
  3. There is a strong analogy to function composition, but some details tie this more closely to function application, or even macro application.
  4. Macros or macro-like entities in R are likely a bigger problem than one might expect. What R calls “functions” are closures (they capture environments) with essentially Lisp FEXPR semantics. These forms were largely abandoned in later Lisps in favor of a split into functions that work in applicative order (arguments are evaluated before the function body runs) and macros (roughly, transformations on code). According to Wikipedia, Kent Pitman argued in 1980 that once you have macros, FEXPRs become hard to defend.
  5. Point 5 probably doesn’t matter to end users. The popular R pipe magrittr doesn’t allow this form either. Also, such an objection may be confusing substitution of expressions with substitution of values. One doesn’t expect to substitute the expression side of x <- 1 + 2 into x * 3 without adding parentheses: pasted in verbatim it reads 1 + 2 * 3, which is 7 rather than 9. However, I feel there are likely some important points of this form that have not yet been discussed in a large enough venue. We may all be missing something if we don’t listen to feedback such as this.
  6. Much of the issue is that R FEXPRs receive their arguments unevaluated. One can use this to implement a lot of language features (control structures, domain-specific mini-languages) at the user or package level, so R packages really do feel like R extensions. However, FEXPRs receive their arguments already parsed, not as raw text, so some things, such as new operator syntax, are not possible at this level.
  7. One of the objections to package-supplied pipes is the requirement of verbose user-specified infix operators that start and end with “%”. This is why magrittr’s pipe is written as “%>%” (there are also some issues of operator precedence, but they are minor and fixed by the occasional introduction of parentheses). data.table essentially uses “][” as a pipe operator without needing any syntactic hooks, instead relying on the convention that “[]” returns another data.table that can be indexed again.
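
To make points 6 and 7 concrete, here is a minimal sketch of what a package-level pipe can and cannot do. An R function can capture its argument as unevaluated code with substitute(), the FEXPR-like behavior described above, but a user-defined infix operator must still be spelled with surrounding % signs. The operator name %then% and the helper show_code() are hypothetical, made up for this sketch; this is not magrittr’s or any other package’s actual implementation.

```r
# Hypothetical helper: returns the code it was called with, as text.
# The argument is never evaluated, so undefined names are not an error.
show_code <- function(arg) deparse(substitute(arg))
print(show_code(undefined_var + 1))     # "undefined_var + 1"

# Hypothetical dot-placeholder pipe. User-defined infix operators must be
# written as %name%, which is the verbosity complained about in point 7.
`%then%` <- function(lhs, rhs) {
  rhs_expr <- substitute(rhs)                # right-hand side as unevaluated code
  env <- new.env(parent = parent.frame())    # child of the caller's environment
  assign(".", lhs, envir = env)              # bind the left-hand value to "."
  eval(rhs_expr, envir = env)                # evaluate the code with "." in scope
}

print(4 %then% sqrt(.))                             # 2
print(c(3, 1, 2) %then% sort(.) %then% rev(.))      # 3 2 1
```

The function sees the right-hand side only after R has parsed it, which is why a package can choose what a placeholder means but cannot invent a new operator spelling such as |>.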

My conclusions/opinions are the following.

  • A syntactic transform of the type being proposed can only be done in the core language. So some variation of the core-R proposal likely has value.
  • I strongly prefer placeholders. I think they are a much more powerful convention and avoid the need to introduce lambdas in many places. I believe Scala uses something like this (its _ placeholder syntax) to great success. There are also great advantages to being able to pipe into expressions instead of just functions. However, I understand the base-R pipe is not my pipe, so having expressed my preference I am willing to move on.
  • The community would have loved an RFC on this. The new pipe was presented at the useR! 2020 conference and announced, but comments really are not being solicited (I know that very much includes this note).
  • It would make sense to supply a second infix operator that is unbound, so packages that supply a pipe can use it as an alternate notation. If the base-R pipe’s only advantage over a package such as magrittr is going to be that its notation is “|>” instead of “%>%”, then give magrittr (and other packages) an additional symbol of similar quality. The unbound assignment operator “:=” is already used to great advantage in the data.table, dplyr, and wrapr packages. I ask: make some infix operator (with appropriate precedence) available to the packages. Possibilities might include “=>”, “*>”, “:>”, “:]”, “|]”, or some other currently syntactically invalid fragment. I know I would love such an operator for my own packages. Keep the better operator for base-R, but please give the packages a nice-ish one also.

    All the packages could use the same new pipe symbol, and users would choose which pipe they get by which package they attach. Obviously the popular choice would be magrittr, but as long as the symbol is equally available to all developers, that feels fair.
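
The “:=” precedent can be made concrete. R’s parser already accepts “:=” as an operator, but base R gives it no definition, so any package (or user) is free to bind it. The binding below, which turns name := value into a one-element named list, is purely hypothetical and made up for this sketch; data.table, dplyr, and wrapr each give “:=” their own, different meanings.

```r
# ":=" is a valid operator token in R: this parses into an ordinary call...
e <- quote(total := 5)
print(as.character(e[[1]]))    # ":="

# ...but base R does not bind it, so evaluating it fails until someone does.
before <- tryCatch(eval(e), error = function(err) "no binding")
print(before)                  # "no binding"

# Hypothetical binding: `name := value` becomes list(name = value).
`:=` <- function(lhs, rhs) {
  stats::setNames(list(rhs), deparse(substitute(lhs)))
}
print(total := 5)              # a list with one element named "total"
```

An equally unbound pipe-like symbol would let packages offer their pipes with comparable notation in the same way.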

And those are my very unsolicited thoughts on the new R pipe. I admit: I have a dog in the race, my own pipe that I use in my own work. I feel that, in addition to possibly coloring my opinions (which I am trying to be careful about), this also gives me some relevant experience. I’ve already inserted one message into the R-devel email chain, so I am trying to limit further comments to this blog, which is less of an imposition on R-devel subscribers.

Categories: Opinion


jmount

Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.


  1. I’ve never liked magrittr. It leaves too much of the lambda calculus behind, even if it does not disable that. What I mean is that using magrittr “pipes” or whatever takes a mind out of lambda world thinking. I always think of functions as first class objects, and am offended when I cannot. It’s tough to remember the rules for pushing values into severally-arg’d functions. And it’s tough to make sense of how to do currying. That’s the point of environments, even if the history has gone in another direction. You cannot do lambda calculus implementations properly without closures. That’s a battle long fought and settled.

    Still, I can understand the annoyances of composition. But, hey, R isn’t LISP. It’s neater than that.

    Syntactic devices in the functional world are old. But computing languages in my opinion need some commitments to a way of expressing computations, not simply pushing text around. Otherwise what’s gotten is a LaTeX front end on top of some calculating engine. Pure lambda calculus is one. LISP another. There’s Scheme and R. And there’s Smalltalk.

    We need to be careful, I think, about two things.

    First and foremost, we need to listen to the voice of the late Professor Marvin Minsky from his 1970 Turing Award Lecture. He proposed a postponement operator there, and illustrated another from Professor Dana Scott.

    Second, like any other established language of computation, it is treacherous to change basic definitions without risking breaking existing code. Doing that would be a travesty, and there are many examples from which to learn this lesson.

    * C to C++
    * Python, with its transition from Python 2 to Python 3
    * The completely incompatible series of languages having “Visual Basic” as the stem of their names
    * APL to APL2

    I may be wrong, for I am no longer a FORTRAN expert, but I believe FORTRAN is notable for still being able to compile FORTRAN II in even the latest compilers.

    Because of its simple rules, granted with the exception of fexprs, LISP was also a language with longevity.

    And Smalltalk 80 had an extensibility nearly unsurpassed by its successors.

    There’s a lot of experience that has come before. We don’t need to rethink these ab initio.

