I’ve recently released a couple of articles on time series forecasting that I want to re-share: “A Time Series Apologia” and “Forecasting in Aggregate Versus in Detail”. Roughly, I am trying to point out alternatives to rushing to ARIMA without first trying other methods. ARIMA is great at handling the issues of […]

Estimated reading time: 1 minute

I would like to share a new article on some of the methods and pitfalls of time series forecasting: “A Time Series Apologia”. In it I work through the seemingly simple problem of forecasting a noisy copy of sin(t). The purpose of the article is to demonstrate using ARIMA methods, and […]

Estimated reading time: 42 seconds
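The article works with ARIMA; the following is not its code, but a stdlib-only sketch of why a noisy sin(t) makes a useful test case. A noiseless sin(t) sampled every h time units satisfies the exact recurrence x[t] = 2 cos(h) x[t-1] - x[t-2], so an AR(2) model fit by least squares is a natural simple baseline. The function names and the parameters step, noise_sd and seed are hypothetical choices for illustration.

```python
import math
import random


def make_noisy_sine(n, step=0.1, noise_sd=0.1, seed=2024):
    """Sample sin(t) at t = 0, step, 2*step, ... and add Gaussian noise."""
    rng = random.Random(seed)
    return [math.sin(i * step) + rng.gauss(0.0, noise_sd) for i in range(n)]


def fit_ar2(x):
    """Least-squares fit of x[t] ~ a1 * x[t-1] + a2 * x[t-2] (no intercept).

    Accumulates the 2x2 normal equations and solves them by Cramer's rule.
    """
    s11 = s12 = s22 = b1 = b2 = 0.0
    for t in range(2, len(x)):
        u, v, y = x[t - 1], x[t - 2], x[t]
        s11 += u * u
        s12 += u * v
        s22 += v * v
        b1 += u * y
        b2 += v * y
    det = s11 * s22 - s12 * s12
    a1 = (b1 * s22 - b2 * s12) / det
    a2 = (s11 * b2 - s12 * b1) / det
    return a1, a2


def forecast_ar2(x, a1, a2, horizon):
    """Iterate the fitted recurrence forward from the last two observations."""
    prev2, prev1 = x[-2], x[-1]
    out = []
    for _ in range(horizon):
        nxt = a1 * prev1 + a2 * prev2
        out.append(nxt)
        prev2, prev1 = prev1, nxt
    return out
```

With noise_sd=0 the fit recovers a1 close to 2 cos(0.1) and a2 close to -1. With observation noise added, least-squares AR coefficients are biased away from those values, so forecasts from such a fit should be checked against held-out data.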

Artificial intelligence, like machine learning before it, is making big money off what I call the “sell ∀ ∃ as ∃ ∀ scam.” The scam works as follows. Build a system that solves problems, but with an important user-facing control. For AI systems like GPT-X this is “prompt engineering.” For […]

Estimated reading time: 3 minutes
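One way to read the quantifier swap, offered as an interpretation rather than a quote from the post: let P be a set of problems, S(c) the system run with user-facing control setting c (for example, a particular prompt), and solves(s, p) mean that s solves problem p. What is delivered is the first statement; what is sold is the second:

$$
\underbrace{\forall p \in P\ \exists c:\ \mathrm{solves}(S(c),\,p)}_{\text{for each problem, some setting works}}
\qquad\text{versus}\qquad
\underbrace{\exists c\ \forall p \in P:\ \mathrm{solves}(S(c),\,p)}_{\text{one setting works for every problem}}
$$

The second statement implies the first, but not conversely: under the first, finding the working c for a new problem is still left to the user.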

I am sharing a new short data science video: Parameterized Jupyter Notebooks. It is an example from the wvpy package showing how to programmatically re-run the same notebook with many different inputs. If you are doing data science in Python, this may help you with your projects.

Estimated reading time: 24 seconds
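The core idea of re-running one notebook with different inputs can be sketched with nothing but the standard library. This is not the wvpy API: it is a hypothetical illustration that copies an nbformat-v4 notebook (a JSON document) and prepends a code cell assigning the parameters as variables. The function names and the "injected-parameters" tag are assumptions made for this sketch.

```python
import copy
import json


def inject_parameters(notebook, params):
    """Return a copy of an nbformat-v4 notebook dict with a new first code
    cell that assigns each entry of params as a Python variable."""
    lines = [f"{name} = {value!r}\n" for name, value in params.items()]
    param_cell = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": ["injected-parameters"]},  # hypothetical tag
        "outputs": [],
        "source": lines,
    }
    result = copy.deepcopy(notebook)  # leave the template untouched
    result["cells"] = [param_cell] + result.get("cells", [])
    return result


def write_parameterized_copies(template_path, param_sets, out_pattern):
    """Write one parameterized .ipynb per parameter set and return the paths.

    out_pattern is a format string such as "report_{i}.ipynb". Executing
    the written notebooks is left to the caller."""
    with open(template_path, encoding="utf-8") as f:
        template = json.load(f)
    paths = []
    for i, params in enumerate(param_sets):
        path = out_pattern.format(i=i)
        with open(path, "w", encoding="utf-8") as f:
            json.dump(inject_parameters(template, params), f, indent=1)
        paths.append(path)
    return paths
```

Each written copy can then be executed with a standard tool such as `jupyter nbconvert --to notebook --execute`. Injecting a cell, rather than editing the template, keeps one source notebook as the single place where the analysis lives.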

I am sharing yet another data transform tutorial here! It is about coordinatized data, the larger theory encompassing pivot and un-pivot. The example is in Python, but we also supply a similar package for R users.

Estimated reading time: 18 seconds
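Pivot and un-pivot are the two simplest record transforms that coordinatized data generalizes. As a plain-Python sketch (not the API of the Python or R packages mentioned), with rows as dicts and hypothetical default column names "measure" and "value", un-pivot moves each value column into its own row, and pivot gathers those rows back into one record per id:

```python
def unpivot(rows, id_cols, value_cols, key_name="measure", value_name="value"):
    """Wide to long: one output row per (input row, value column) pair."""
    out = []
    for row in rows:
        for col in value_cols:
            rec = {k: row[k] for k in id_cols}
            rec[key_name] = col          # which column the value came from
            rec[value_name] = row[col]   # the value itself
            out.append(rec)
    return out


def pivot(rows, id_cols, key_name="measure", value_name="value"):
    """Long to wide: collect rows that share id values into one record."""
    grouped = {}  # insertion-ordered, so output follows first appearance
    for row in rows:
        ident = tuple(row[k] for k in id_cols)
        rec = grouped.setdefault(ident, {k: row[k] for k in id_cols})
        rec[row[key_name]] = row[value_name]
    return list(grouped.values())
```

The two functions are inverses on data where every id has each measure exactly once, which is the round trip the general theory describes for arbitrary record shapes.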

The data algebra is a system for composing data manipulation tasks in Python. In the data algebra, operator pipelines (or even directed acyclic graphs) are the primary objects. Applying operations composes small data pipelines into larger ones. This allows the fluid specification, inspection, and sharing of data processing and data […]

Estimated reading time: 1 minute
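To make "pipelines are the primary objects" concrete, here is a hypothetical stdlib sketch, not the data algebra's own API: a pipeline is an immutable value holding named steps, adding a step or appending another pipeline returns a new, larger pipeline, and the pipeline can be inspected before it is ever applied to data. The class name, method names, and list-of-dicts table format are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Table = List[Row]


@dataclass(frozen=True)
class Pipeline:
    """An inspectable, composable sequence of named table -> table steps."""
    steps: Tuple[Tuple[str, Callable[[Table], Table]], ...] = ()

    def then(self, name, fn):
        # Returns a new pipeline; the original is left unchanged.
        return Pipeline(self.steps + ((name, fn),))

    def compose(self, other):
        # Appending one pipeline to another yields another pipeline.
        return Pipeline(self.steps + other.steps)

    def describe(self):
        # Inspection without touching any data.
        return " -> ".join(name for name, _ in self.steps)

    def transform(self, table):
        # Apply every step in order to a concrete table.
        for _, fn in self.steps:
            table = fn(table)
        return table


add_total = Pipeline().then(
    "add_total", lambda t: [{**r, "total": r["x"] + r["y"]} for r in t])
keep_big = Pipeline().then(
    "keep_big", lambda t: [r for r in t if r["total"] > 3])
full = add_total.compose(keep_big)
```

Because composition only builds a new description, `full.describe()` can be printed or shared before any data is processed, which is the kind of specification and inspection the paragraph above refers to.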

I’ve been seeing a lot of hot takes on whether one should do data science in R or in Python. I’ll comment generally on the topic, and then add my own myopic gear-head micro benchmark. I’ll jump in: if learning the language is the big step, then you are a […]

Estimated reading time: 5 minutes

I’ve just started experimenting with the Polars data frame library in Python. I really like the programmable API it exposes. In fact, I am starting an experimental adapter from the data algebra to Polars. When this is complete, one can use the data algebra to run the same data transform […]

Estimated reading time: 46 seconds
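The adapter idea, one transform description run on more than one engine, can be sketched without Polars itself. In this hypothetical stdlib example the transform is plain data (a list of operations in a made-up format), and two small backends execute it: one row-oriented (a list of dicts) and one column-oriented (a dict of lists, standing in for a columnar engine such as Polars). A real adapter would translate each operation into the target library's own expressions instead; nothing here is the data algebra's or Polars' API.

```python
# Transform described as data (hypothetical format):
#   ("extend", new_column, function, input_columns)
#   ("select_rows", predicate, input_columns)
OPS = [
    ("extend", "z", lambda x, y: x + y, ("x", "y")),
    ("select_rows", lambda z: z > 3, ("z",)),
]


def run_rows(ops, rows):
    """Row-oriented backend: data is a list of dicts."""
    for kind, *args in ops:
        if kind == "extend":
            name, fn, cols = args
            rows = [{**r, name: fn(*(r[c] for c in cols))} for r in rows]
        elif kind == "select_rows":
            pred, cols = args
            rows = [r for r in rows if pred(*(r[c] for c in cols))]
        else:
            raise ValueError(f"unknown op {kind!r}")
    return rows


def run_columns(ops, cols_data):
    """Column-oriented backend: data is a dict of equal-length lists."""
    data = {k: list(v) for k, v in cols_data.items()}  # do not mutate input
    for kind, *args in ops:
        if kind == "extend":
            name, fn, cols = args
            data[name] = [fn(*vals) for vals in zip(*(data[c] for c in cols))]
        elif kind == "select_rows":
            pred, cols = args
            keep = [pred(*vals) for vals in zip(*(data[c] for c in cols))]
            data = {k: [v for v, m in zip(vs, keep) if m] for k, vs in data.items()}
        else:
            raise ValueError(f"unknown op {kind!r}")
    return data
```

Both backends give the same answer for the same `OPS`, which is the property such an adapter is meant to preserve while letting the faster engine do the work.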

I am excited to share my guest lecture for the University of Illinois Department of Statistics course STAT 447: Data Science Programming Methods. Thank you to Dirk Eddelbuettel for inviting me! The talk, titled “Data Science: Street Fighting Statistics”, demonstrates two simple supervised modeling tasks in R. […]

Estimated reading time: 35 seconds