The data algebra is a system for composing data manipulation tasks in Python. In the data algebra, operator pipelines (or even directed acyclic graphs) are the primary objects. Applying operations composes small data pipelines into larger ones. This allows the fluid specification, inspection, and sharing of data processing and data preparation tasks in Python.
The data algebra itself can then be applied to data stores or tabular data representations such as Pandas, Polars, or SQL databases (BigQuery, PostgreSQL, SQLite, SparkSQL, and so on). This means the same data transformation can be realized or implemented either in memory or at massive scale.
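To make the "pipelines are the primary objects" idea concrete, here is a toy sketch using only the Python standard library. It is not the data algebra's API: the `Pipeline` class and its `extend`, `select_rows`, `transform`, and `to_sql` methods are hypothetical names invented for this illustration. It only shows the general pattern of composing small pipelines into a larger one, then realizing the same pipeline either in memory or as SQL.

```python
import sqlite3
from dataclasses import dataclass


# Toy illustration only: Pipeline and its methods are hypothetical,
# and are NOT the data algebra's actual classes or method signatures.
@dataclass(frozen=True)
class Pipeline:
    """A source table name plus an ordered tuple of transformation steps."""
    table: str
    steps: tuple = ()

    def extend(self, name, a, b):
        """Return a new pipeline that adds column `name` = `a` + `b`."""
        return Pipeline(self.table, self.steps + (("extend", name, a, b),))

    def select_rows(self, column, minimum):
        """Return a new pipeline that keeps rows with `column` >= `minimum`."""
        return Pipeline(self.table, self.steps + (("select_rows", column, minimum),))

    def __rshift__(self, other):
        """Compose two pipelines: this one's steps, then the other's."""
        return Pipeline(self.table, self.steps + other.steps)

    def transform(self, rows):
        """Realize the pipeline in memory over a list of dicts."""
        out = [dict(r) for r in rows]
        for step in self.steps:
            if step[0] == "extend":
                _, name, a, b = step
                for r in out:
                    r[name] = r[a] + r[b]
            else:
                _, column, minimum = step
                out = [r for r in out if r[column] >= minimum]
        return out

    def to_sql(self):
        """Realize the same pipeline as a nested SQL query (numeric limits only)."""
        sql = f'SELECT * FROM "{self.table}"'
        for step in self.steps:
            if step[0] == "extend":
                _, name, a, b = step
                sql = f'SELECT *, "{a}" + "{b}" AS "{name}" FROM ({sql})'
            else:
                _, column, minimum = step
                sql = f'SELECT * FROM ({sql}) WHERE "{column}" >= {minimum!r}'
        return sql


# Two small pipelines composed into a larger one.
ops = Pipeline("d").extend("z", "x", "y") >> Pipeline("d").select_rows("z", 20)

rows = [{"x": 1, "y": 10}, {"x": 2, "y": 20}, {"x": 3, "y": 30}]

# Realization 1: in memory.
in_memory = ops.transform(rows)

# Realization 2: in a database (an in-memory SQLite instance here).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE d (x INTEGER, y INTEGER)")
conn.executemany("INSERT INTO d VALUES (:x, :y)", rows)
in_database = [dict(r) for r in conn.execute(ops.to_sql())]
conn.close()
```

Because the pipeline is plain data rather than executed code, it can be inspected, shared, and handed to whichever backend is appropriate. The real data algebra covers far more operators and backends than this sketch.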
My announcement here is: I feel the data algebra Polars adapter is ready for production use! I’ve recently completed enough acceptance tests that I am satisfied that one can safely use this combination in production on client data. This is new, so there will be a few bumps.
For me this is very exciting. Using the same data algebra notation we can now fluidly work over multiple data systems in Python.
I am still working on tuning and documentation, but I will be offering custom training for teams interested in incorporating the data algebra into their workflows.