The R package cdata now has version 0.7.0 available from CRAN.
cdata is a data manipulation package that subsumes many higher-order data manipulation operations, including pivot/un-pivot, spread/gather, and cast/melt. The record-to-record transforms are specified by drawing a table that expresses the record structure. This table is called the “control table”, and it also serves as the link between the key concepts of row-records and block-records.
What can be quickly specified and achieved using these concepts and notations is amazing and quite teachable. These transforms can be run in-memory or in remote database or big-data systems (such as Spark).
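As a rough illustration of the control table idea, here is a minimal in-memory sketch. The example data, the column names (`id`, `measure`, `value`), and the choice of the `rowrecs_to_blocks()` / `blocks_to_rowrecs()` functions are assumptions made for this illustration, based on the cdata API as I understand it; check the package documentation for the exact signatures in your installed version.

```r
library(cdata)

# Row-records: one row per id, with each measurement in its own column.
# (Hypothetical example data.)
wide <- data.frame(
  id  = c(1, 2),
  AUC = c(0.6, 0.7),
  R2  = c(0.2, 0.3),
  stringsAsFactors = FALSE
)

# Control table: a "drawing" of what one block-record looks like.
# The first column holds the keys that label rows within a block;
# the cells of the remaining columns name the row-record columns
# whose values land in that position.
control <- data.frame(
  measure = c("AUC", "R2"),
  value   = c("AUC", "R2"),
  stringsAsFactors = FALSE
)

# Row-records to block-records (an un-pivot / gather style transform).
tall <- rowrecs_to_blocks(wide, control, columnsToCopy = "id")

# Block-records back to row-records (a pivot / spread style transform),
# using the same control table.
back <- blocks_to_rowrecs(tall, keyColumns = "id", controlTable = control)

print(tall)
print(back)
```

The same control table drives both directions, which is what makes the notation compact: the table describes the block-record shape once, and the two functions move data into or out of that shape.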
The concepts are taught in Nina Zumel’s excellent tutorial and in John Mount’s quick screencast/lecture.
The 0.7.0 update adds local versions of the operators in addition to the Spark and database implementations. These methods should now be a bit safer for complex or annotated in-memory types such as dates and times.
jmount
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.