Data Algebra 0.9.0 Release

I am pleased to announce the 0.9.0 release of the data algebra.

The data algebra is a realization of the Codd relational algebra for data, written in terms of Python method chaining. It allows the concise, clear specification of useful data transforms. Some examples can be found here. Benefits include being able to specify a single data transformation that can then be translated and executed on many back ends, currently including Pandas, Google BigQuery, PostgreSQL, Spark, and SQLite. It allows you to rehearse and debug your big data work in memory.

Some notable features of the 0.9.0 PyPI release include:

  • Improvements to the SQL generation pipeline. The conversion is now in stages: data algebra (the data manipulation grammar), to near SQL (objects representing SQL steps), to lines, to a single text string. This allows a lot of re-use and sharing between the different database dialects.
  • More use of SQL’s WITH clause for better machine-generated SQL.
  • Simulation of RIGHT and FULL joins for SQLite. SQLite doesn’t include RIGHT and FULL joins. The data algebra SQLite adapter now converts RIGHT joins to LEFT joins and FULL joins to larger pipelines. The usage and methodology are described here. This allows more data pipelines to be rehearsed in SQLite before moving to another database.
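
To give a feel for the join simulation idea, here is a minimal sketch using only Python's standard sqlite3 module. It is not the SQL the data algebra generates; it is a hand-written illustration of one common rewrite, and the tables a and b, their columns, and the CTE names left_part and right_only are all made up for the example. A RIGHT join is written as a LEFT join with the tables swapped, and a FULL join is written as a LEFT join plus the unmatched rows from the other table, staged with WITH.

```python
import sqlite3

# In-memory database with two small, hypothetical example tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (k INTEGER, x TEXT);
    CREATE TABLE b (k INTEGER, y TEXT);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO b VALUES (2, 'b2'), (3, 'b3');
""")

# "a RIGHT JOIN b" rewritten as "b LEFT JOIN a" (tables swapped).
right_join_sql = """
SELECT b.k AS k, a.x AS x, b.y AS y
FROM b LEFT JOIN a ON a.k = b.k
ORDER BY k
"""

# "a FULL JOIN b" rewritten as: every row of a LEFT JOIN b,
# plus the rows of b that found no match in a.
full_join_sql = """
WITH
  left_part AS (
    SELECT a.k AS k, a.x AS x, b.y AS y
    FROM a LEFT JOIN b ON a.k = b.k
  ),
  right_only AS (
    SELECT b.k AS k, NULL AS x, b.y AS y
    FROM b LEFT JOIN a ON a.k = b.k
    WHERE a.k IS NULL
  )
SELECT k, x, y FROM left_part
UNION ALL
SELECT k, x, y FROM right_only
ORDER BY k
"""

right_rows = conn.execute(right_join_sql).fetchall()
full_rows = conn.execute(full_join_sql).fetchall()
print(right_rows)
print(full_rows)
```

Because the sketch uses only LEFT joins, UNION ALL, and WITH, it should behave the same on SQLite builds that lack RIGHT and FULL joins.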

We’ve been using the data algebra to speed up development on both client and internal Python data science projects. I invite you to give it a try.

Categories: Administrativia, Exciting Techniques, Tutorials

John Mount