Roz King just wrote an interesting article on binning data (a common data analytics step) in a database. They compare a case-based approach (where the bin boundaries are written directly into the query code) with a join-based approach (where the boundaries are kept in a separate table). They share code and timings.
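To make the two approaches concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the code from King's article and it does not use rquery; the table names (`obs`, `bins`), the data values, and the bin boundaries are all hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical data and bin boundaries, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (id INTEGER PRIMARY KEY, x REAL)")
conn.executemany(
    "INSERT INTO obs (id, x) VALUES (?, ?)",
    [(1, 0.5), (2, 3.2), (3, 7.9), (4, 5.0)],
)

# Case-based: the bin boundaries are written directly into the query text.
case_sql = """
SELECT id,
       CASE WHEN x < 2.5 THEN 'low'
            WHEN x < 5.0 THEN 'mid'
            ELSE 'high' END AS bin
FROM obs
ORDER BY id
"""

# Join-based: the bin boundaries live in a lookup table of half-open
# intervals [lo, hi), and each observation is joined to its interval.
conn.execute("CREATE TABLE bins (bin TEXT, lo REAL, hi REAL)")
conn.executemany(
    "INSERT INTO bins (bin, lo, hi) VALUES (?, ?, ?)",
    [("low", -1e9, 2.5), ("mid", 2.5, 5.0), ("high", 5.0, 1e9)],
)
join_sql = """
SELECT o.id, b.bin
FROM obs AS o
JOIN bins AS b ON o.x >= b.lo AND o.x < b.hi
ORDER BY o.id
"""

case_result = conn.execute(case_sql).fetchall()
join_result = conn.execute(join_sql).fetchall()
print(case_result)
print(join_result)
```

Both queries assign the same bin to every row. The difference is where the boundaries live: in the case-based form they are part of the query text, while in the join-based form they are data that can be changed without rewriting the query.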
Best of all: rquery gets some attention and turns out to be the dominant solution at all scales measured.
Here is an example timing (lower times are better):
So please check the article out.
jmount
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.