Introduction In many data science projects we have the data, but it “is in the wrong format.” Fortunately re-formatting or reshaping data is a solved problem, with many different available tools. For this note, I would like to show how to reshape data using the data algebra‘s cdata data reshaping […]
Estimated reading time: 20 minutes
This note is a simple data wrangling example worked using both the Python data_algebra package and the R cdata package. Both of these packages make data wrangling easy through he use of coordinatized data concepts (relying heavily on Codd’s “rule of access”). The advantages of data_algebra and cdata are: The […]
Estimated reading time: 17 minutes
I’ve just shared a short webcast on data reshaping in R using the cdata package. (link) We also have two really nifty articles on the theory and methods: Fluid data reshaping with cdata Coordinatized Data: A Fluid Data Specification Please give it a try! This is the material I recently […]
Estimated reading time: 34 seconds
I have some big news about our R package cdata. We have greatly improved the calling interface and Nina Zumel has just written the definitive introduction to cdata. cdata is our general coordinatized data tool. It is what powers the deep learning performance graph (here demonstrated with R and Keras) […]
Estimated reading time: 2 minutes
We have just released a major update of the cdata R package to CRAN. If you work with R and data, now is the time to check out the cdata package.
Estimated reading time: 2 minutes
Authors: John Mount and Nina Zumel Introduction In teaching thinking in terms of coordinatized data we find the hardest operations to teach are joins and pivot. One thing we commented on is that moving data values into columns, or into a “thin” or entity/attribute/value form (often called “un-pivoting”, “stacking”, “melting” […]
Estimated reading time: 11 minutes
Authors: John Mount and Nina Zumel. Introduction It has been our experience when teaching the data wrangling part of data science that students often have difficulty understanding the conversion to and from row-oriented and column-oriented data formats (what is commonly called pivoting and un-pivoting). Boris Artzybasheff illustration Real trust and […]
Estimated reading time: 30 minutes