R users who also use the dplyr package will be able to quickly understand the following code, which adds an estimated area column to a data.frame.
suppressPackageStartupMessages(library("dplyr"))
iris %>%
  mutate(
    .,
    Petal.Area = (pi/4)*Petal.Width*Petal.Length) %>%
  head(.)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Area
## 1          5.1         3.5          1.4         0.2  setosa  0.2199115
## 2          4.9         3.0          1.4         0.2  setosa  0.2199115
## 3          4.7         3.2          1.3         0.2  setosa  0.2042035
## 4          4.6         3.1          1.5         0.2  setosa  0.2356194
## 5          5.0         3.6          1.4         0.2  setosa  0.2199115
## 6          5.4         3.9          1.7         0.4  setosa  0.5340708
The notation we used above is the "explicit argument" variation we recommend for readability. What a lot of dplyr users do not seem to know is: base-R already has this functionality. The function is called transform().
To demonstrate this, let’s first detach dplyr to show that we are not using functions from dplyr.
detach("package:dplyr", unload = TRUE)
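One way to confirm that the detach took effect is to look at the search path and at where transform() lives. This is a small sketch rather than part of the original pipeline; the variable names dplyr_attached and transform_home are ours, for illustration only.

```r
# TRUE only while dplyr is attached; FALSE after the detach() above
# (and also FALSE if dplyr was never attached in this session)
dplyr_attached <- "package:dplyr" %in% search()
print(dplyr_attached)

# transform() is defined in the base package, which is always available
transform_home <- environmentName(environment(transform))
print(transform_home)
```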
Now let’s write the equivalent pipeline using exclusively base-R.
iris ->.
  transform(
    .,
    Petal.Area = (pi/4)*Petal.Width*Petal.Length) ->.
  head(.)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Area
## 1          5.1         3.5          1.4         0.2  setosa  0.2199115
## 2          4.9         3.0          1.4         0.2  setosa  0.2199115
## 3          4.7         3.2          1.3         0.2  setosa  0.2042035
## 4          4.6         3.1          1.5         0.2  setosa  0.2356194
## 5          5.0         3.6          1.4         0.2  setosa  0.2199115
## 6          5.4         3.9          1.7         0.4  setosa  0.5340708
The "->." notation is the end-of-line variation of the Bizarro Pipe. The transform() function has been part of R since 1998. dplyr::mutate() was introduced in 2014.
git log --all -p --reverse --source -S 'transform <-'
commit 41c2f7338c45dbf9eac99c210206bc3657bca98a refs/remotes/origin/tags/R-0-62-4
Author: pd <pd@00db46b3-68df-0310-9c12-caf00c1e9a41>
Date: Wed Feb 11 18:31:12 1998 +0000
Added the frametools functions subset() and transform()
git-svn-id: https://svn.r-project.org/R/trunk@709 00db46b3-68df-0310-9c12-caf00c1e9a41
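For readers wondering why the "->." notation works at all: "->" is R's ordinary right-assignment operator and "." is a legal variable name, so each step simply stores its result in a variable called "." that the next step reads. The sketch below spells the same pipeline out step by step; the variable name result is ours, for illustration only.

```r
# Step 1: right-assign iris to a variable literally named "."
iris ->.

# Step 2: transform() reads "." and its result is assigned back to "."
transform(., Petal.Area = (pi/4)*Petal.Width*Petal.Length) ->.

# Step 3: "." is an ordinary variable, so it can be copied or inspected
result <- head(.)
print(result)

# The same steps written with conventional left-assignment
. <- iris
. <- transform(., Petal.Area = (pi/4)*Petal.Width*Petal.Length)
stopifnot(identical(result, head(.)))
```

Because its name starts with a dot, the "." variable is not listed by a plain ls() call; ls(all.names = TRUE) shows it.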
jmount
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.
help(transform) makes it clear that “pd” is none other than Peter Dalgaard! Like so many R users I first came to R through his excellent book “Introductory Statistics with R”, 1st Edition, Springer 2002. All R users owe a great debt to Peter Dalgaard and the ideas he brings to R.
And the within() variation.
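A minimal sketch of what the within() variation might look like for the same Petal.Area example. within() is also part of base R; the name iris_area is ours, for illustration only.

```r
# within() evaluates the braced block with the data frame's columns in scope
# and returns a modified copy; iris itself is left unchanged
iris_area <- within(iris, {
  Petal.Area <- (pi/4)*Petal.Width*Petal.Length
})
head(iris_area)
```

Unlike transform(), which takes name = value arguments, within() takes a block of ordinary assignments, which can be convenient when several statements are needed.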