R has a number of assignment operators (at least “<-”, “=”, and “->”; plus “<<-” and “->>”, which have different semantics).
R style guides routinely insist on “<-” as the only preferred form. In this note we are going to try to make the case for “->” when using magrittr pipelines. [edit: After reading this article, please be sure to read Konrad Rudolph’s masterful argument for using only “=” for assignment. He also demonstrates a function to land values from pipelines (though that is not his preference). All joking aside, the value-landing part of the proposal does not violate current style guidelines.]
Don Quijote and Sancho Panza, by Honoré Daumier
R’s preferred assignment operator is “<-”. This is in the popular style guides. If you write using this style you can organize your code so that:
- “<-” always means assignment
- “=” always means function argument binding
- “==” always means comparison.
This has some advantages, and is the public style. Also “=” is much harder to use inside R’s base::quote function than “<-”, so there are still cases where the semantics of “=” and “<-” are different (though I think they all involve trying to specify argument binding versus assignment while inside a function call’s argument list).
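The convention and the quote() difference described above can be sketched in base R. This is only a minimal illustration; the variable names (x, s, y, z, e1, e2) are made up for the example.

```r
# Base R only; all names here are illustrative.
x <- 3                         # "<-": assignment
s <- seq(from = 1, to = x)     # "=": argument binding inside a call
is_three <- (x == 3)           # "==": comparison

# quote() captures "<-" directly as an assignment expression.
e1 <- quote(y <- 1)

# Inside quote(), "z = 2" would be read as supplying an argument
# named z, not as an assignment. Wrapping it in braces and pulling
# out the inner expression is one way to capture it as an assignment.
e2 <- quote({ z = 2 })[[2]]

eval(e1)   # assigns 1 to y
eval(e2)   # assigns 2 to z
```

Both captured expressions assign when evaluated; the difference is only in how much ceremony “=” needs to be read as assignment inside a call.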
I have previously written that given the choice I prefer “=” for assignment. It has the advantages that:
- “<-” has a different meaning to many readers. In R “x<-3” assigns the value 3 to a variable named x; in other popular programming languages (where new R users may be coming from) “x<-3” denotes comparing x to -3.
- “=” is a single character, so it can not be ruined by the insertion of a space. “x< -3” does not assign the value 3 to a variable named x, it compares x to -3. I would not mind so much if “x< -3” was a syntax error (as “x< =3” is), but it is valid code that quietly does something very different than “x<-3”. If you have taught R enough you have experience helping students undo this bug. Also “=” can not be broken up by line-splitting.
- “=” is on the keyboard (as “←” was when arrow-like assignments were themselves introduced).
- “=” is easier to paste into HTML as it does not require escape coding such as “&lt;-”.
- It is the symbol used in most every other popular current programming language for assignment.
- There is an asymmetric cost of mistakes. Typing “=” when you meant “<-” is usually harmless. Typing “<-” in a context where “=” was needed is not caught by R and is fairly bad (please see here for details). So if you get out of the habit of using “<-”, one type of bug becomes less likely.
- There is a cognitive benefit in reducing the number of low-value distinctions you need to maintain, especially for beginners. If we think of the mind as having “seven plus or minus two” slots for current information do we really want to waste 11 to 20 percent of our students’ attention on something like this when teaching? The beginner does not need to worry over the differences between value assignment and argument binding at all times. In fact it is a useful generalization to think of argument binding as a safe transient value assignment.
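Two of the points in the list above, the stray-space bug and the asymmetric cost of mistakes, can be seen in a few lines of base R. This is a sketch with made-up names (x, d_ok, d_bad); it assumes a fresh session in which y does not already exist.

```r
# Stray space: the same characters, two different meanings.
x <- 5
x<-3                        # no space: assigns 3 to x
after_assign <- x

x <- 5
is_less <- (x< -3)          # with a space: compares x to -3
after_compare <- x          # x is unchanged

# Asymmetric cost: "<-" typed where "=" (argument binding) was meant.
d_ok  <- data.frame(y = 1:3)    # a column named "y"
d_bad <- data.frame(y <- 1:3)   # no argument name is bound: the column
                                # name is derived from the expression,
                                # and y is created as a side effect

print(names(d_ok))
print(names(d_bad))
print(exists("y"))
```

R accepts every line here without warning, which is the point: the mistaken forms are valid code that quietly do something else.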
Now I said “given the choice”, which means that to work with others you have to use “<-” or at least admit that you are being stubborn. I teach “<- for assignment” as I do not wish to set up students for ridicule (and they, being less informed on the history of R, are less equipped to defend themselves on this issue).
That being said I still don’t actually like “<-”. And in fact I am not sure why the R community has so fetishized its use. “<-” comes from an era when it was actually a symbol on the keyboard, and two other S assignment operators from that era (“_” and “:=”) have not survived in the R language (please see here). I think the style is largely enforced as a kind of argot or “inside language” to express loyalty to R.
A deliberately provocative proposal
That being said I have really come to like using the “->” operator. I know I can’t always get away with it, but consider the advantage using “->” brings to western readers (meaning users of Greek derived alphabets): you can then simply read code from left to right. If I am not allowed to use “=” I want something back in exchange, and “->” actually has some interesting advantages. Let us set up a proposal that is admittedly incompatible with my previous argument.
Consider the following statement:
x = 3 + 4
This is read in R, and most common programming languages, as “assign the value of 3 + 4 to the variable x.” We know to read it this way because “assignment has lower operator precedence than plus.” Roughly this means there are implicit parenthesization rules under which “x=3+4” is actually shorthand for “x=(3+4)” (roughly, because in R explicit use of parentheses also controls the auto-printing behavior of values). But consider the same statement written with “->”:
3 + 4 -> x
The semantics still come from operator precedence rules, but now the syntax is emphasizing the same thing: the calculation happens before (to the left of) the assignment. This may not seem like much to experienced programmers, but that is because so many programming languages use the frankly unnatural “x=3+4” notation (so we are used to it).
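The precedence and parenthesization remarks above can be checked directly by asking R how it parses each form. The following is a small base R sketch; the names e_right, e_left, v_plain and v_paren are just for illustration.

```r
# R's parser stores "3 + 4 -> x" as the same call as "x <- 3 + 4".
e_right <- quote(3 + 4 -> x)
e_left  <- quote(x <- 3 + 4)
same_call <- identical(e_right, e_left)

# The right-hand side of the assignment is the whole sum,
# which is the implicit parenthesization at work.
rhs <- e_left[[3]]

# Evaluating the right-arrow form assigns 7 to x.
3 + 4 -> x

# Explicit parentheses do change one thing: auto-printing.
# withVisible() reports whether a value would be printed.
v_plain <- withVisible(x <- 3 + 4)$visible
v_paren <- withVisible((x <- 3 + 4))$visible

print(same_call)
print(rhs)
print(c(plain = v_plain, paren = v_paren))
```

So the arrow direction is purely a matter of notation; the parsed expression, and therefore the semantics, is the same.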
A substantial advantage comes when using magrittr pipes in R. Suppose I write the following:
# Count number of NA in columns x,y,
# and z using pure dplyr notation
# or back-end agnostic dplyr code.
# This involves avoiding use of $
# or things like multiple intermediate
# values in dplyr::summarize.
# This is a useful example as
# complete.cases isn't available on
# all dplyr data services.
# ifelse() is to ensure type
# conversions on remote SQL.
library("dplyr")
my_db <- dplyr::src_sqlite(":memory:", create = TRUE)
data.frame(
  x = c(1, 2, 2),
  y = c(3, 5, NA),
  z = c(NA, 'a', 'b'),
  rowNum = 1:3,
  stringsAsFactors = FALSE
) %>%
  copy_to(my_db, ., 'd') %>%
  mutate(nna = ifelse(is.na(x),1,0) +
           ifelse(is.na(y),1,0) +
           ifelse(is.na(z),1,0)) %>%
  arrange(rowNum) -> dres
In this notation we see that “->” is itself a pipe-compatible operator that moves values to variables. The pipeline itself is already moving left to right and top to bottom. Placing the assignment first would give us an ugly two-directional flow.
Non-semantic changes in the pipeline are now syntactically cheap and localized (as they should be). For example: want to land intermediate results for reasons of efficiency or necessary side-effects? Solution: insert “-> varName LINEBREAK varName %>%” at will, as you already do with
The syntax is now working for us instead of against us. I feel once you start using magrittr pipelines (which are written left to right, as we did here) the next logical step is to use “->” for consistency.
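As a smaller illustration of landing values from a pipeline, here is a sketch that needs only the magrittr package (assumed to be installed) and no database. The names total, roots and total2 are hypothetical.

```r
library("magrittr")

# One continuous left-to-right pipeline; the value lands at the end.
c(4, 9, 16) %>%
  sqrt() %>%
  sum() -> total

# The same computation with an intermediate result landed part way:
# "-> roots", a line break, then "roots %>%" to continue.
c(4, 9, 16) %>%
  sqrt() -> roots
roots %>%
  sum() -> total2

print(roots)
print(c(total, total2))
```

This works because “%>%” binds more tightly than “->”, so the whole pipeline is evaluated before the assignment. Splitting the pipeline changed only the two lines around the landing point.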
The following code has essentially the same semantics as the previous magrittr pipeline, without needing a piping operator.
data.frame(
  x = c(1, 2, 2),
  y = c(3, 5, NA),
  z = c(NA, 'a', 'b'),
  rowNum = 1:3,
  stringsAsFactors = FALSE
) -> .
copy_to(my_db, ., 'd2') -> .
mutate(., nna = ifelse(is.na(x),1,0) +
         ifelse(is.na(y),1,0) +
         ifelse(is.na(z),1,0)) -> .
arrange(., rowNum) -> dres
The above code has the advantage that it is easier to debug, in that you can stop at any stage and the intermediate results are convenient to inspect. However, there was no great call for code in this style (or the matching beginning of line “. <-” version) prior to the introduction of magrittr. It just isn’t as enjoyable to use a mere coding convention as it is to use magrittr pipe syntax. We have a bit more to say on the above coding style here.
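The dot-assignment convention and its “. <-” mirror image can be tried in base R with no packages at all. This is a sketch only; stage_2, total and total_left are hypothetical names, and the numbers are arbitrary.

```r
# Right-arrow form: each step lands its result in "."
c(4, 9, 16) -> .
sqrt(.) -> .
stage_2 <- .          # stop here to inspect the intermediate value
sum(.) -> total

# Matching beginning-of-line ". <-" form of the same steps.
. <- c(4, 9, 16)
. <- sqrt(.)
total_left <- sum(.)

print(stage_2)
print(c(total, total_left))
```

Because “.” is an ordinary variable, a debugger or a plain print() at any line shows the current intermediate result.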
- I honestly think that in a magrittr pipeline setting “->” is a natural assignment operator and could make teaching R easier. It reads more fluidly once you get used to it and come to expect assignment to be written late (i.e. once you know where to look).
- I can not currently recommend actually using “->” in other people’s projects as it is not currently allowed under the most popular R style guides. Both Advanced R by Hadley Wickham and Google’s R Style Guide say: “Use <-, not =, for assignment.”
- I would like to propose that “->” be considered an allowed assignment operator, with the stricture that code should not reverse directions too often (as that is, in fact, confusing). If you control one of the named style guides, please do consider my suggestion.
Obviously it is hard to change styles, so why write an article like this?
My main reason is I have found in statistics and statistical programming that if you do something diverging from common practice it is assumed you don’t know the common practice. I find this hectoring attitude non-productive. Often somebody who differs from common practice is familiar with common practice and may be diverging for a well considered reason. Obviously if you are diverging from standard practice you should state why you are doing so or at least that you are doing so. An example would be a note such as “using right arrow for improved flow with long pipes” or “using maximum likelihood estimate instead of unbiased estimate.”
“<-” isn’t mere common practice, it is a prescribed style. But the point still applies.
Also when teaching it is important to give the students the ability to reason about what they are starting to work with. Allowing them to maintain considered opinions (that is, grounded in informed experience, not just fancies) about “=”, “->”, and “<-” makes it in fact easier to teach “use ->”, as it makes it more obvious that it is a mere convention, and not some deep truth that they have not yet understood and internalized.
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.