Menu Home

A Quick Appreciation of the R transform Function

R users who also use the dplyr package will be able to quickly understand the following code that adds an estimated area column to a data.frame.

suppressPackageStartupMessages(library("dplyr"))

iris %>%
  mutate(
    ., 
    Petal.Area = (pi/4)*Petal.Width*Petal.Length) %>%
  head(.)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Area
## 1          5.1         3.5          1.4         0.2  setosa  0.2199115
## 2          4.9         3.0          1.4         0.2  setosa  0.2199115
## 3          4.7         3.2          1.3         0.2  setosa  0.2042035
## 4          4.6         3.1          1.5         0.2  setosa  0.2356194
## 5          5.0         3.6          1.4         0.2  setosa  0.2199115
## 6          5.4         3.9          1.7         0.4  setosa  0.5340708

The notation we used above is the "explicit argument" variation we recommend for readability. What a lot of dplyr users do not seem to know is: base-R already has this functionality. The function is called transform().

To demonstrate this, let’s first detach dplyr to show that we are not using functions from dplyr.

detach("package:dplyr", unload = TRUE)

Now let’s write the equivalent pipeline using exclusively base-R.

iris ->.
   transform(
     ., 
     Petal.Area = (pi/4)*Petal.Width*Petal.Length) ->.
   head(.)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Area
## 1          5.1         3.5          1.4         0.2  setosa  0.2199115
## 2          4.9         3.0          1.4         0.2  setosa  0.2199115
## 3          4.7         3.2          1.3         0.2  setosa  0.2042035
## 4          4.6         3.1          1.5         0.2  setosa  0.2356194
## 5          5.0         3.6          1.4         0.2  setosa  0.2199115
## 6          5.4         3.9          1.7         0.4  setosa  0.5340708

The "->." notation is the end-of-line variation of the Bizarro Pipe. The transform() function has been part of R since 1998. dplyr::mutate() was introduced in 2014.

git log --all -p --reverse --source -S 'transform <-'

commit 41c2f7338c45dbf9eac99c210206bc3657bca98a refs/remotes/origin/tags/R-0-62-4
Author: pd <pd@00db46b3-68df-0310-9c12-caf00c1e9a41>
Date:   Wed Feb 11 18:31:12 1998 +0000

    Added the frametools functions subset() and transform()
    
    git-svn-id: https://svn.r-project.org/R/trunk@709 00db46b3-68df-0310-9c12-caf00c1e9a41

Categories: Tutorials

Tagged as:

jmount

Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.

2 replies

  1. help(transform) makes it clear that “pd” is none other than Peter Dalgaard! Like so many R users I first came to R through his excellent book “Introductory Statistics with R”, 1st Edition, Springer 2002. All R users owe a great debt to Peter Dalgaard and the ideas he brings to R.

  2. And the within() variation.

    git log --all -p --reverse --source -S 'within <-'
    commit 4e3eae932367d54e0181d2ab192cd31d90a3d49c refs/remotes/origin/djm-parseRd
    Author: pd 
    Date:   Sat Sep 1 08:56:32 2007 +0000
    
        new function within()
        
        git-svn-id: https://svn.r-project.org/R/trunk@42714 00db46b3-68df-0310-9c12-caf00c1e9a41
    
    iris ->.
       within(., {
          Petal.Area <- (pi/4)*Petal.Width*Petal.Length
          }) ->.
       head(.)
       
    #   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Area
    # 1          5.1         3.5          1.4         0.2  setosa  0.2199115
    # 2          4.9         3.0          1.4         0.2  setosa  0.2199115
    # 3          4.7         3.2          1.3         0.2  setosa  0.2042035
    # 4          4.6         3.1          1.5         0.2  setosa  0.2356194
    # 5          5.0         3.6          1.4         0.2  setosa  0.2199115
    # 6          5.4         3.9          1.7         0.4  setosa  0.5340708
    
%d bloggers like this: