# A Quick Appreciation of the R transform Function

R users who also use the dplyr package will be able to quickly understand the following code that adds an estimated area column to a data.frame.

suppressPackageStartupMessages(library("dplyr"))

iris %>%
  mutate(
    .,
    Petal.Area = (pi/4)*Petal.Width*Petal.Length) %>%
  head(.)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Area
## 1          5.1         3.5          1.4         0.2  setosa  0.2199115
## 2          4.9         3.0          1.4         0.2  setosa  0.2199115
## 3          4.7         3.2          1.3         0.2  setosa  0.2042035
## 4          4.6         3.1          1.5         0.2  setosa  0.2356194
## 5          5.0         3.6          1.4         0.2  setosa  0.2199115
## 6          5.4         3.9          1.7         0.4  setosa  0.5340708

The notation we used above is the "explicit argument" variation we recommend for readability. What many dplyr users do not seem to know is that base-R already has this functionality: the function is called transform().

To demonstrate this, let’s first detach dplyr to show that we are not using functions from dplyr.
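
A minimal sketch of that detach step is below. The `search()` guard is our addition (an assumption, so the snippet also runs in a session where dplyr was never attached); the call itself is base-R's `detach()`.

```r
# Detach dplyr from the search path if it is currently attached.
# The guard is an assumption added so this runs even when dplyr is not attached.
if ("package:dplyr" %in% search()) {
  detach("package:dplyr", unload = TRUE)
}

# Confirm dplyr is no longer on the search path.
stopifnot(!("package:dplyr" %in% search()))
```

After this, an unqualified call to mutate() would fail with a "could not find function" error, so anything that still works is coming from base-R.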

Now let’s write the equivalent pipeline using exclusively base-R.

iris ->.
  transform(
    .,
    Petal.Area = (pi/4)*Petal.Width*Petal.Length) ->.
  head(.)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Area
## 1          5.1         3.5          1.4         0.2  setosa  0.2199115
## 2          4.9         3.0          1.4         0.2  setosa  0.2199115
## 3          4.7         3.2          1.3         0.2  setosa  0.2042035
## 4          4.6         3.1          1.5         0.2  setosa  0.2356194
## 5          5.0         3.6          1.4         0.2  setosa  0.2199115
## 6          5.4         3.9          1.7         0.4  setosa  0.5340708

The "->." notation is the end-of-line variation of the Bizarro Pipe. The transform() function has been part of R since 1998. dplyr::mutate() was introduced in 2014.

git log --all -p --reverse --source -S 'transform <-'

commit 41c2f7338c45dbf9eac99c210206bc3657bca98a refs/remotes/origin/tags/R-0-62-4
Author: pd <pd@00db46b3-68df-0310-9c12-caf00c1e9a41>
Date:   Wed Feb 11 18:31:12 1998 +0000

Added the frametools functions subset() and transform()

git-svn-id: https://svn.r-project.org/R/trunk@709 00db46b3-68df-0310-9c12-caf00c1e9a41
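
For readers who have not met the Bizarro Pipe: the `->.` used in the base-R pipeline above is not a special operator. It is R's ordinary right-assignment `->` writing into a variable named `.`, which the next line then reads. The following sketch spells that out; the name `direct` is ours, introduced only for the comparison.

```r
# `->.` parses as the right-assignment operator `->` followed by the name `.`
iris ->.
stopifnot(identical(., iris))

# Each later step reads the previous result from `.` and overwrites it.
transform(
  .,
  Petal.Area = (pi/4)*Petal.Width*Petal.Length) ->.

# The same computation written as one ordinary call
# (`direct` is a hypothetical name used only for this comparison).
direct <- transform(iris, Petal.Area = (pi/4)*Petal.Width*Petal.Length)
stopifnot(identical(., direct))

head(.)
```

One consequence of this design is that `.` is left behind in the workspace holding the final result, which is handy for inspecting intermediate steps while debugging.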

### jmount

Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.

### 2 replies

1. help(transform) makes it clear that “pd” is none other than Peter Dalgaard! Like so many R users I first came to R through his excellent book “Introductory Statistics with R”, 1st Edition, Springer 2002. All R users owe a great debt to Peter Dalgaard and the ideas he brings to R.

2. And the within() variation.

git log --all -p --reverse --source -S 'within <-'
commit 4e3eae932367d54e0181d2ab192cd31d90a3d49c refs/remotes/origin/djm-parseRd
Author: pd
Date:   Sat Sep 1 08:56:32 2007 +0000

new function within()

git-svn-id: https://svn.r-project.org/R/trunk@42714 00db46b3-68df-0310-9c12-caf00c1e9a41

iris ->.
within(., {
Petal.Area <- (pi/4)*Petal.Width*Petal.Length
}) ->.