Menu Home

Binning Data in a Database

Roz King just wrote an interesting article on binning data (a common data analytics step) in a database. They compare a case-based approach (where the bin divisions are stuffed into code) with a join based approach. They share code and timings.

Best of all: rquery gets some attention and turns out to be the dominant solution at all scales measured.

Here is an example timing (lower times better):

NewImage

So please check the article out.

Categories: Administrativia Programming Statistics

Tagged as:

jmount

Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.

%d bloggers like this: