In between client work, John and I have been busy working on our book, Practical Data Science with R, 2nd Edition. To demonstrate a toy example for the section I’m working on, I needed scatter plots of the petal and sepal dimensions of the
iris data, like so:
I wanted a plot for petal dimensions and sepal dimensions, but I also felt that two plots took up too much space. So, I thought, why not make a faceted graph that shows both:
Except — which columns do I plot and what do I facet on?
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species ## 1 5.1 3.5 1.4 0.2 setosa ## 2 4.9 3.0 1.4 0.2 setosa ## 3 4.7 3.2 1.3 0.2 setosa ## 4 4.6 3.1 1.5 0.2 setosa ## 5 5.0 3.6 1.4 0.2 setosa ## 6 5.4 3.9 1.7 0.4 setosa
First, load the packages and data:
library("ggplot2") library("cdata") iris <- data.frame(iris)
Now define the data-shaping transform, or control table. The control table is basically a picture that sketches out the final data shape that I want. I want to specify the
y columns of the plot (call these the value columns of the data frame) and the column that I am faceting by (call this the key column of the data frame). And I also need to specify how the key and value columns relate to the existing columns of the original data frame.
Here’s what the control table looks like:
The control table specifies that the new data frame will have the columns
Width. Every row of
iris will produce two rows in the new data frame: one with a
flower_part value of
Petal, and another with a
flower_part value of
Petal row will take the
Petal.Width values in the
Width columns respectively. Similarly for the
Here I create the control table in R, using the convenience function
wrapr::build_frame() to create the
controlTable data frame in a legible way.
(controlTable <- wrapr::build_frame( "flower_part", "Length" , "Width" | "Petal" , "Petal.Length", "Petal.Width" | "Sepal" , "Sepal.Length", "Sepal.Width" ))
## flower_part Length Width ## 1 Petal Petal.Length Petal.Width ## 2 Sepal Sepal.Length Sepal.Width
Now I apply the transform to
iris using the function
rowrecs_to_blocks(). I also want to carry along the
Species column so I can color the scatterplot points by species.
iris_aug <- rowrecs_to_blocks( iris, controlTable, columnsToCopy = c("Species")) head(iris_aug)
## Species flower_part Length Width ## 1 setosa Petal 1.4 0.2 ## 2 setosa Sepal 5.1 3.5 ## 3 setosa Petal 1.4 0.2 ## 4 setosa Sepal 4.9 3.0 ## 5 setosa Petal 1.3 0.2 ## 6 setosa Sepal 4.7 3.2
And now I can create the plot!
ggplot(iris_aug, aes(x=Length, y=Width)) + geom_point(aes(color=Species, shape=Species)) + facet_wrap(~flower_part, labeller = label_both, scale = "free") + ggtitle("Iris dimensions") + scale_color_brewer(palette = "Dark2")
In the next post in this series, I will show how to use
ggplot2 to create a scatterplot matrix.
Data scientist with Win Vector LLC. I also dance, read ghost stories and folklore, and sometimes blog about it all.