There has been some talk of adding native pipe notation to R (for example here, here, and here), and even a tidyeval/rlang pipe here.
I think a critical aspect of such an extension would be to treat such a notation as syntactic sugar and not insist such a pipe match magrittr semantics, or worse yet give a platform for authors to insert their own preferred ad-hoc semantics.
A prominent place where pipe notation is used is the language F#, where it is literally defined in F# itself as:
let (|>) x f = f x
From this simple definition, one versed in the semantics of F# has a chance at inferring the semantics of the pipe.
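For comparison, here is a minimal sketch of what the same one-line definition could look like in R. The operator name `%|>%` is a hypothetical choice made for this illustration; it is not part of base R or of any package discussed here.

```r
# Hypothetical R analogue of the F# definition "let (|>) x f = f x".
`%|>%` <- function(x, f) f(x)

# User-defined %op% operators are left-associative, so calls chain left to right.
one_step  <- 4 %|>% sin           # same as sin(4)
two_steps <- 4 %|>% sin %|>% exp  # same as exp(sin(4))
```

Under this definition the right-hand side must be a function value, so `4 %|>% sin()` is an error: `sin()` is evaluated as a call with no arguments before the operator ever sees it. That is the kind of behavior one can infer from the definition alone.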
In the R language, the magrittr package supplies a pipe written as “%>%”. This pipe’s implementation depends on very complicated unevaluated expression capture and direct manipulation of execution environments. Also the magrittr pipe picks its own semantics as it wants, as it does not inherit semantics from a simple definition. For example all of the following are valid magrittr:
library("magrittr")
4 %>% sin
#> [1] -0.7568025
4 %>% sin()
#> [1] -0.7568025
4 %>% sin(.)
#> [1] -0.7568025
These all seem to be convenient choices, but new users have to memorize them, as they cannot infer them from things they may already know about the R programming language.
Also the freedom of choice in semantics means many arbitrary choices get made (nothing is a priori apodictic) and you get some debatable choices such as: “data.frame(x=1) %>% dplyr::bind_rows(list(.,.))” having 3 rows (in contrast to “data.frame(x=1) %>% { dplyr::bind_rows(list(.,.)) }” having 2 rows).
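The same effect can be sketched without dplyr, using only base R's `c()` and `list()` on the right-hand side. This assumes the magrittr package is installed and that it applies its documented rule: the left-hand side is inserted as the first argument unless "." appears as a direct argument of the call. The variable names are chosen for this illustration only.

```r
library("magrittr")

# "." is a direct argument of list(): no extra first argument is inserted.
direct_dot <- 1 %>% list(., .)

# "." appears only inside a nested call, so magrittr also inserts it as the
# first argument; this is evaluated like c(1, list(1, 1)).
nested_dot <- 1 %>% c(list(., .))

# Braces suppress the first-argument insertion.
braced <- 1 %>% { c(list(., .)) }

lengths_seen <- c(length(direct_dot), length(nested_dot), length(braced))
```

If magrittr behaves as described, the three results have lengths 2, 3, and 2: the extra element in the middle case is the inserted first argument.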
What I would like, if a native pipe were to be added to R, is for the pipe to be defined as formally equivalent to some larger R expression. Then we could teach it as such and we would not get any new corner cases or exceptional behaviors. Even if an implementation fails to reach this standard, this way of defining things lets us know how it should have worked (in a perfect world).
A good R pipe operator might have an aspirational definition along the lines of:
"a %.>% b" is to be treated as if the user had written "{ . <- a; b };" with "%.>%" being treated as left-associative. (there are "." associated side-effects).
Notice we are not saying “b(a)” as that doesn’t deal with directed choice of argument placement, and other cases R-users have come to expect. This also allows piping into non-function expressions (a neat feature).
Surprisingly enough the above actually works. It means a pipeline such as the following:
a <- 4 %.>% sin(.) %.>% exp(.) %.>% cos(.)
Is unambiguously meant to be a short-hand for the following (ugly) nested code.
a <- { . <- { . <- { . <- 4 ; sin(.) };; exp(.) };; cos(.) };
It does not matter that nobody would write the nested code, that is precisely what we are not asking anybody to do. The point is, a student can attempt to check this translation on small examples, and even run both versions. For more on association of R-operators please see here.
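One way to run such a check is sketched below. The operator body is an illustrative assumption written for this comparison, not necessarily how any released package implements `%.>%`: it assigns the left-hand value to "." in the caller's environment and then evaluates the right-hand expression there, mimicking "{ . <- a; b }".

```r
# Sketch of the aspirational definition: "a %.>% b" acts like "{ . <- a; b }".
`%.>%` <- function(a, b) {
  b_expr <- substitute(b)      # capture b without evaluating it
  env <- parent.frame()        # the caller's environment
  assign(".", a, envir = env)  # forces a, then performs ". <- a" for the caller
  eval(b_expr, envir = env)    # evaluate b where it can see "."
}

# The pipeline, its nested translation, and the plain nested call.
piped  <- 4 %.>% sin(.) %.>% exp(.) %.>% cos(.)
nested <- { . <- { . <- { . <- 4 ; sin(.) };; exp(.) };; cos(.) }
direct <- cos(exp(sin(4)))

# Piping into a non-function expression also works under this definition.
six <- 5 %.>% (1 + .)
```

All three of `piped`, `nested`, and `direct` should agree, and "." is left holding the last value assigned to it, which is the side effect the definition warns about.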
Notice in the above ugly nested example that “};;” is starting to look a bit like a piping operator. This is calling out two things:
- Piping syntax is largely a convention of using (possibly anonymous) intermediate values instead of nested calls.
- Piping semantics are largely about sequencing statements; this is the usual “monad is the expensive way to say programmable semicolon” observation.
Roughly if the user is willing to write code such as the following, then they don’t need pipes.
. <- 4
. <- sin(.)
. <- exp(.)
. <- cos(.)
a <- .
Or, as we have observed before, some notations start to look like you already have piping capabilities in base-R (arguing one should give base-R a chance before insisting on extensions).
For example we can use “;.<-” as a pipe (the first one I noticed, and my attempt to not use left-arrow that often):
a <-{ 4 ->.;.<- sin(.) ;.<- exp(.) ;.<- cos(.) ;.}
Or we can use “->.;” as a pipe:
a <-{ 4 ->.; sin(.) ->.; exp(.) ->.; cos(.) }
I think this last one is actually pretty if we go all-in with right-assignment:
4 ->.; sin(.) ->.; exp(.) ->.; cos(.) -> a
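Since all of these notations are plain base-R, it is easy to confirm they compute the same thing. The sketch below runs the explicit sequence and two of the glyph-style variants and stores each result under a name chosen for this illustration.

```r
# Explicit sequencing through the variable ".".
. <- 4
. <- sin(.)
. <- exp(.)
. <- cos(.)
explicit <- .

# ";.<-" read as a pipe.
semi_assign <- { 4 ->.;.<- sin(.) ;.<- exp(.) ;.<- cos(.) ;. }

# "->.;" read as a pipe, using right-assignment throughout.
4 ->.; sin(.) ->.; exp(.) ->.; cos(.) -> bizarro

# Reference value from ordinary nested calls.
reference <- cos(exp(sin(4)))
```

Each of `explicit`, `semi_assign`, and `bizarro` should equal `reference`; the only thing that differs between the variants is where the assignment to "." is written.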
The remaining strong objections to this Bizarro Pipe notation (in my mind) are:
- “->.;” is an ugly glyph. This is because its representation is its implementation.
- This pipe isn’t very compatible with left-assignment (R’s preferred assignment) without adding additional blocks.
Roughly: introducing new notation need not be as disruptive as introducing new semantics. Also conventions can have great advantage, even if they do not have language assistance or enforcement (though such things are good).
jmount
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.
The other day I had an error I wanted to debug by replacing %>% with bizarro pipe. When I did the replacement and ran the result, instead of R giving me the same error and a variable I could inspect, I got a different error earlier in the pipeline. It seems that bizarro pipe is not entirely equivalent to %>%, perhaps because, as you say, its implementation is so complicated.
Sorry I don’t have time to be more specific and give a MRE, but it seemed worth sharing given the topic of this post!
You are correct: Bizarro Pipe does not fully match magrittr semantics. The biggest differences are not inserting an argument, non-lazy evaluation, and insistence on “.”.
Some of what you saw may be an observable effect of the lazy evaluation of some of your data items. I can imagine a syntax error late in a magrittr pipe could mask a data error earlier in a pipe.
But it isn’t just you or even me; magrittr has problems even with dplyr.
Oh, what the heck. I am going to call this “dot arrow.”
Code here.
And another example:
And timings.
It looks like dplyr itself did use a nice analogy definition for its original pipe operator “%.%”. From chain.r:
Whereas looking into magrittr’s code it looks like the magrittr strategy is to Curry expressions to single-argument functions taking “dot”. But this isn’t the complete analogy or specification as “help(`%>%`, package = 'magrittr')” indicates a lot of context is needed to work out how and when things are evaluated (look at the notes on () and {}). It looks like changing these changes what R language elements magrittr sees at different times, allowing magrittr to pick different behaviors.
I’d say in hindsight that dplyr::`%.%` was a cleaner “syntactic sugar” style concept, while magrittr::`%>%` was trying to do a lot more (add new capabilities). The power of making argument control explicit through “.” in magrittr::`%>%`
obviously was considered important.
I am not a programmer, and find your post fascinating, since it discusses issues with some of the programming. I am by no means an R expert, and still struggle and find new ways of doing things. I use the tidyverse; however, from day one, I did not like the at the end, thus not a true R programmer then. I also like the Bizarro Pipe (->.;). Now, here is where my skills fail me. Why does this work? Was this in base R, or is it a combination of -> goes to . then execute (;)?
It is as you surmised: Bizarro Pipe is an emergent behavior of combining symbols that have always been in base-R. You can add spaces and read it as “-> . ;” which is read as “assign to dot and then end the current statement and start a new one.” When you think a lot about sequence you eventually internalize that is what a pipe does (modulo “dot” being a real thing or just a first-argument convention).
Thanks for clarifying.
I almost forgot, one more pipe variation guaranteed to anger everyone: “;.=”.