I’d like to share a great new feature in the wvpy package (available at PyPI). This package is useful in converting Jupyter notebooks to/from Python, and also in rendering many parameterized notebooks. The idea is to make Jupyter notebooks easier to use in production. The latest feature is an extension […]
Estimated reading time: 2 minutes
A central data science engineering problem is how to organize general data into columns for analysis. I often refer to this as denormalization, or the deliberate arranging of data so all entries of a record are in a single row in a single table. In this note I will write […]
Estimated reading time: 15 minutes
One of the great conveniences of performing a data science style analysis using Jupyter is that Jupyter notebooks are literate containers that combine code, text, results, and graphs. This is also one of the pain points in working with Jupyter notebooks with partners or with source control. That is: Jupyter […]
Estimated reading time: 5 minutes
Introduction The data algebra is a Python system for designing data transformations that can be used in Pandas or SQL. The new 1.3.0 version introduces a lot of early checking and warnings to make designing data transforms more convenient and safer. An Example I’d like to demonstrate some of these […]
Estimated reading time: 12 minutes
The data algebra is a system for specifying data transformations in Pandas or SQL databases. To use it, we advise checking out the README and introduction. These document which data operators form the basis for constructing and composing data algebra transformations. I have now added a catalog of what expression […]
Estimated reading time: 54 seconds
When working with multiple data tables we often need to know, for a given set of keys, how many rows each table has. I would like to use such an example in Python as yet another introduction to the data algebra (an alternative to direct Pandas or […]
Estimated reading time: 8 minutes
I’ve been tinkering a lot recently with the data_algebra, and just released version 0.7.0 to PyPI. In this note I’ll touch on what the data algebra is, what the new features are, and my plans going forward.
Estimated reading time: 10 minutes
It looks like R is getting an official pipe operator (ref). R doesn’t work under an RFC process, so we hear about these things and they are discussed on the R-devel mailing list. I’ve written on this topic before (ref), and I have taped some new comments. This sort of […]
Estimated reading time: 1 minute
Nina and I are cleaning up websites, links, and projects. I would like to take the opportunity to re-share my old genetic art project through a short demonstration video. Read more about the Genetic Art Project here.
Estimated reading time: 24 seconds