I would like to share another quick tutorial on some aspects of the data algebra, this time using the example of comparing two tables. Please check it out here.
I have a new intermediate introduction on the data algebra up here: Using the data algebra for Statistics and Data Science. The data algebra is a tool for data processing in Python which is implemented on top of any of Pandas, Google BigQuery, PostgreSQL, MySQL, Spark, and SQLite. It allows […]
I’ve thought of Pandas as in-memory column oriented data structure with reasonable performance. If I need high performance or scale, I can move to a database. I like Pandas, and thank the authors and maintainers for their efforts. Now I kind of wonder what Pandas is, or what it wants […]
Back to teaching. For a few years we’ve been running a data science intensive at for a really neat FAAMG company. The idea is to give engineers some hands on live workbook time using methods varying from linear regression, xgboost, to deep neural networks. Learning how participants progress and internalize […]
Every programmer should have an opinion on what the outcomes of the expressions like “5” == 5 should be, and perhaps even a guess as to what the answer is in their most familiar programming language. In my opinion SQL gets it right. For example, we get the following in […]
I’d like to work an example of using SQL WITH Common Table Expressions to produce more legible SQL.
I’ve been tinkering a lot recently with the data_algebra, and just released version 0.7.0 to PyPi. In this note I’ll touch on what the data algebra is, what the new features are, and my plans going forward.
Statistics is the science of relating summaries of observable samples to the unobserved summaries of the populations they are drawn from. I try to explain that with an example in this video. (link)
I’d like to share my latest “data science bite“: A/B Testing.
Nina Zumel and John Mount will be speaking at the online University of San Francisco Seminar Series in Data Science! How and why to use probability models to outperform decision rules Friday April 30, 2021 12:30pm – 2pm Pacific Time See here for full details and to RSVP In this […]