From https://twitter.com/sharon000/status/1107771331012108288:
From https://tidyr.tidyverse.org/dev/articles/pivot.html (text by Hadley Wickham):
For some time, it’s been obvious that there is something fundamentally wrong with the design of spread() and gather(). Many people don’t find the names intuitive and find it hard to remember which direction corresponds to spreading and which to gathering. It also seems surprisingly hard to remember the arguments to these functions, meaning that many people (including me!) have to consult the documentation every time.
There are two important new features inspired by other R packages that have been advancing reshaping in R:
- The reshaping operation can be specified with a data frame that describes precisely how metadata stored in column names becomes data variables (and vice versa). This is inspired by the cdata package by John Mount and Nina Zumel. For simple uses of pivot_long() and pivot_wide(), this specification is implicit, but for more complex cases it is useful to make it explicit, and operate on the specification data frame using dplyr and tidyr.
- pivot_long() can work with multiple value variables that may have different types. This is inspired by the enhanced melt() and dcast() functions provided by the data.table package by Matt Dowle and Arun Srinivasan.
If you want to work in the above way we suggest giving our cdata package a try. We named the functions pivot_to_rowrecs and unpivot_to_blocks. The idea was: by emphasizing the record structure one might eventually internalize what the transforms are doing. On the way to that we have a lot of documentation and tutorials:
- Block Records and Row Records
- Designing Transforms for Data Reshaping with cdata
- Coordinatized Data: A Fluid Data Specification
- Fluid Data
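As a minimal sketch of the two functions named above, assuming the cdata package is installed: the student/score table and its column names are made up for illustration and do not come from the cdata documentation.

```r
library(cdata)  # assumes cdata is installed from CRAN

# Hypothetical row-record data: one row per student, one column per test.
row_recs <- data.frame(
  student = c("a", "b"),
  test1 = c(90, 70),
  test2 = c(85, 75),
  stringsAsFactors = FALSE
)

# Row records -> block records: each student now spans one row per test.
blocks <- unpivot_to_blocks(
  row_recs,
  nameForNewKeyColumn = "test",
  nameForNewValueColumn = "score",
  columnsToTakeFrom = c("test1", "test2")
)

# Block records -> row records: back to a single row per student.
back <- pivot_to_rowrecs(
  blocks,
  columnToTakeKeysFrom = "test",
  columnToTakeValuesFrom = "score",
  rowKeyColumns = "student"
)

print(blocks)
print(back)
```

The function names describe the shape of the result rather than a direction: unpivot_to_blocks yields records that occupy a block of rows, and pivot_to_rowrecs yields records that each fit in one row.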
jmount
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.
Whether all of the above is adoption or appropriation is going to depend on whether credit to the cdata authors is ever mentioned in talks, how long even the reference lives on in documentation, and whether mis-attribution is corrected. We hope it is adoption.
Also I want to emphasize the theory was joint work with Dr. Nina Zumel. I do a lot of the coding and blogging, but she does the more serious writing and more of the concept development. So this is her idea as much as it is mine (despite me being noisier and much harder to like).
Thought I would share a clarification (though I personally consider “won’t fix bugs” as step one of “on the way out”).
This is a great case for why we should stick to base R wherever possible.
As with all things it is a trade off. I myself have never been that handy with the base-methods stack()/unstack().
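For readers who have not used that base R pair, here is a minimal sketch with made-up data. Note that stack() carries no row keys, so the round trip relies on the rows staying in their original order.

```r
# Hypothetical wide data: one column per test, rows are students in order.
wide <- data.frame(test1 = c(90, 70), test2 = c(85, 75))

# stack() moves all columns into two: `values` and `ind` (the source column name).
long <- stack(wide)
print(long)

# unstack() reverses this, using the default formula values ~ ind.
wide_again <- unstack(long)
print(wide_again)
```

Part of the trade off shows up here: no packages are needed, but there is no equivalent of a row key column, so any identifier such as a student name has to be carried and re-attached by hand.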