Nina Zumel and I have two new tutorials on fluid data wrangling/shaping. They are written in a parallel structure, with the R version of the tutorial being almost identical to the Python version. This reflects our opinion on the “which is better for data science R […]

Estimated reading time: 1 minute

I’ve been writing a lot about category theory interpretations of data-processing pipelines and some of the improvements we feel they are driving in both data_algebra and rquery/rqdatatable. I think I’ve found an even better category theory re-formulation of the package, which I will describe here.

Estimated reading time: 12 minutes

Introduction: I would like to talk about some of the design principles underlying the data_algebra package (and its sibling rquery package). The data_algebra package is a query generator that can act on either Pandas data frames or on SQL tables. This is discussed on the project site and […]

Estimated reading time: 31 minutes

Our goal has been to make rquery the best query generation system for R (and to make data_algebra the best query generator for Python). Let’s see what rquery is good at, and what new features are making rquery better.

Estimated reading time: 10 minutes

Nina Zumel had a really great article on how to prepare a nice Keras performance plot using R. I will use this example to show some of the advantages of cdata record transform specifications.

Estimated reading time: 9 minutes

This note is a simple data wrangling example worked using both the Python data_algebra package and the R cdata package. Both of these packages make data wrangling easy through the use of coordinatized data concepts (relying heavily on Codd’s “rule of access”). The advantages of data_algebra and cdata are: The […]

Estimated reading time: 17 minutes