R users who also use the dplyr package will be able to quickly understand the following code that adds an estimated area column to a data.frame.
suppressPackageStartupMessages(library("dplyr"))
iris %>%
mutate(
.,
Petal.Area = (pi/4)*Petal.Width*Petal.Length) %>%
head(.)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Area
## 1 5.1 3.5 1.4 0.2 setosa 0.2199115
## 2 4.9 3.0 1.4 0.2 setosa 0.2199115
## 3 4.7 3.2 1.3 0.2 setosa 0.2042035
## 4 4.6 3.1 1.5 0.2 setosa 0.2356194
## 5 5.0 3.6 1.4 0.2 setosa 0.2199115
## 6 5.4 3.9 1.7 0.4 setosa 0.5340708
The notation we used above is the "explicit argument" variation we recommend for readability. What a lot of dplyr users do not seem to know is: base-R already has this functionality. The function is called transform().
To demonstrate this, let’s first detach dplyr to show that we are not using functions from dplyr.
detach("package:dplyr", unload = TRUE)
Now let’s write the equivalent pipeline using exclusively base-R.
iris ->.
transform(
.,
Petal.Area = (pi/4)*Petal.Width*Petal.Length) ->.
head(.)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Area
## 1 5.1 3.5 1.4 0.2 setosa 0.2199115
## 2 4.9 3.0 1.4 0.2 setosa 0.2199115
## 3 4.7 3.2 1.3 0.2 setosa 0.2042035
## 4 4.6 3.1 1.5 0.2 setosa 0.2356194
## 5 5.0 3.6 1.4 0.2 setosa 0.2199115
## 6 5.4 3.9 1.7 0.4 setosa 0.5340708
The "->." notation is the end-of-line variation of the Bizarro Pipe. The transform() function has been part of R since 1998. dplyr::mutate() was introduced in 2014.
git log --all -p --reverse --source -S 'transform <-'
commit 41c2f7338c45dbf9eac99c210206bc3657bca98a refs/remotes/origin/tags/R-0-62-4
Author: pd <pd@00db46b3-68df-0310-9c12-caf00c1e9a41>
Date: Wed Feb 11 18:31:12 1998 +0000
Added the frametools functions subset() and transform()
git-svn-id: https://svn.r-project.org/R/trunk@709 00db46b3-68df-0310-9c12-caf00c1e9a41
Categories: Programming Tutorials
jmount
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.
help(transform)makes it clear that “pd” is none other than Peter Dalgaard! Like so manyRusers I first came toRthrough his excellent book “Introductory Statistics with R”, 1st Edition, Springer 2002. AllRusers owe a great debt to Peter Dalgaard and the ideas he brings toR.And the
within()variation.git log --all -p --reverse --source -S 'within <-' commit 4e3eae932367d54e0181d2ab192cd31d90a3d49c refs/remotes/origin/djm-parseRd Author: pd Date: Sat Sep 1 08:56:32 2007 +0000 new function within() git-svn-id: https://svn.r-project.org/R/trunk@42714 00db46b3-68df-0310-9c12-caf00c1e9a41iris ->. within(., { Petal.Area <- (pi/4)*Petal.Width*Petal.Length }) ->. head(.) # Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Area # 1 5.1 3.5 1.4 0.2 setosa 0.2199115 # 2 4.9 3.0 1.4 0.2 setosa 0.2199115 # 3 4.7 3.2 1.3 0.2 setosa 0.2042035 # 4 4.6 3.1 1.5 0.2 setosa 0.2356194 # 5 5.0 3.6 1.4 0.2 setosa 0.2199115 # 6 5.4 3.9 1.7 0.4 setosa 0.5340708