If you work with R and data, now is the time to check out the cdata package.
- All coordinatized data or fluid data operations are now in the cdata package (no longer split across multiple packages).
- The transforms are now centered on the more general table-driven moveValuesToColumnsN() operators (though pivot and un-pivot are now made available as convenient special cases).
- All the transforms are now implemented in DBI (no longer using dplyr, though we do include examples of using cdata with dplyr).
- This is (unfortunately) a user-visible API change; however, adapting to the changed API is deliberately straightforward.
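To make the "table driven" idea concrete, here is a minimal base R sketch of an un-pivot steered by a small control table. It is an illustration only and does not use or reproduce the cdata API: the helper `unpivot_sketch()`, the two-column control table layout, and the example column names are all assumptions made for this sketch.

```r
# Wide data: one row per id, one column per measurement.
wide <- data.frame(
  id = c(1, 2),
  AUC = c(0.6, 0.8),
  R2 = c(0.2, 0.3),
  stringsAsFactors = FALSE
)

# Hypothetical control table: each row pairs a key to emit ("measure")
# with the name of the wide column that supplies the value ("value").
control <- data.frame(
  measure = c("AUC", "R2"),
  value = c("AUC", "R2"),
  stringsAsFactors = FALSE
)

# Illustrative helper (not part of cdata): build one block of rows per
# control-table row, then stack the blocks into a tall table.
unpivot_sketch <- function(d, control, id_cols) {
  blocks <- lapply(seq_len(nrow(control)), function(i) {
    out <- d[, id_cols, drop = FALSE]
    out$measure <- control$measure[[i]]
    out$value <- d[[control$value[[i]]]]
    out
  })
  res <- do.call(rbind, blocks)
  rownames(res) <- NULL
  res
}

tall <- unpivot_sketch(wide, control, id_cols = "id")
print(tall)
```

Changing the rows of the control table changes which columns are moved and what they are called, without touching the helper. That is the sense in which a plain un-pivot is one special case of a table-driven transform.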
cdata now supplies very general data transforms on both in-memory data.frames and remote or large data systems (Spark/Hive, and so on). These transforms include operators such as pivot/un-pivot that were previously not conveniently available for these data sources (for example, tidyr does not operate on such data, despite dplyr doing so).
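As a rough illustration of why a DBI-based implementation reaches remote data, the sketch below phrases a pivot as ordinary SQL and runs it through DBI against an in-memory SQLite database. It assumes the DBI and RSQLite packages are installed. The table name, column names, and the hand-written query are hypothetical and are not the SQL that cdata itself generates.

```r
library(DBI)

# An in-memory SQLite database stands in for a remote data system.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Tall data: one row per (id, measure) pair.
tall <- data.frame(
  id = c(1, 1, 2, 2),
  measure = c("AUC", "R2", "AUC", "R2"),
  value = c(0.6, 0.2, 0.8, 0.3),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "tall", tall)

# Pivot in SQL: one output row per id, one output column per measure.
# Each CASE picks out the value for a single measure; MAX collapses
# the group to that one non-NULL value.
wide <- dbGetQuery(con, "
  SELECT
    id,
    MAX(CASE WHEN measure = 'AUC' THEN value END) AS AUC,
    MAX(CASE WHEN measure = 'R2' THEN value END) AS R2
  FROM tall
  GROUP BY id
  ORDER BY id
")
print(wide)

dbDisconnect(con)
```

Because the work is expressed as a query, the data never has to be pulled into R before it is reshaped. The same pattern should carry over to other DBI back ends, subject to their SQL dialects.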
To help transition we have updated the existing documentation:
- “Coordinatized Data” Theory (polished article).
- The “fluid data” methodology (still a work in progress).
The fluid data document is a bit long, as it covers a lot of concepts quickly. We hope to develop more targeted training material going forward.
The cdata theory and package now allow very concise and powerful transformations of big data using R.
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.