## Vector Packing Vacation

Just coming back from a vacation where I got some side-time to work some recreational math problems. One stood out, packing vector sums by re-ordering. I feel you don’t deeply understand a proof until you try to work examples and re-write it, so here (for me) it is: Picking Vectors […]

## Some of the Perils of Time Series Forecasting

I’ve recently released a couple of articles on time series forecasting that I want to re-share: “A Time Series Apologia” and “Forecasting in Aggregate Versus in Detail”. Roughly, I am trying to point out alternatives to rushing to ARIMA without trying additional methods. ARIMA is great at handling the issues of […]

## A Time Series Apologia

I would like to share a new article on some of the methods and pitfalls of time series forecasting: “A Time Series Apologia”. In it I work the seemingly simple problem of forecasting a noisy copy of sin(t). The purpose of the article is to demonstrate using ARIMA methods, and […]
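As a rough illustration of the problem setup only (not the article's own code), a noisy copy of sin(t) could be generated as below. The function name `noisy_sine`, the step size, the Gaussian noise model, and the noise level are all assumptions made for this sketch.

```python
import math
import random


def noisy_sine(n_points, step=0.1, noise_sd=0.3, seed=0):
    """Return (ts, ys) where ys[i] = sin(ts[i]) + Gaussian noise.

    All parameter names and defaults are hypothetical choices for this sketch.
    """
    rng = random.Random(seed)  # seeded so the series is reproducible
    ts = [i * step for i in range(n_points)]
    ys = [math.sin(t) + rng.gauss(0.0, noise_sd) for t in ts]
    return ts, ys


ts, ys = noisy_sine(200)
```

A forecasting method would then be fit on an early portion of `ys` and judged on the remainder.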

## It Isn’t Just the AIs Hallucinating

GPT-* and the like are indeed amazing game-changing tools. However, they are not currently quite as magic as advertised. As a minor example, consider the popcount() code example from https://www.pcmag.com/news/samsung-software-engineers-busted-for-pasting-proprietary-code-into-chatgpt. When asked to correct the following code, ChatGPT claims the fix is cleaning up some non-ASCII characters and claims the […]

## The Sell ∀ ∃ as ∃ ∀ Scam

Artificial intelligence, like machine learning before it, is making big money off what I call the “sell ∀ ∃ as ∃ ∀ scam.” The scam works as follows. Build a system that solves problems, but with an important user-facing control. For AI systems like GPT-X this is “prompt engineering.” For […]

## A Pandas/Polars Rosetta Stone

Dr. Nina Zumel just shared a nice Pandas/Polars Rosetta Stone. She has a list of commonly needed data wrangling operations, and how they are realized in Pandas and Polars. This can help with the data wrangling in your projects. Please check it out!

## Doing Better than the Average

The standard way to estimate the expected value of a population from a sample of values $v_1, \ldots, v_n$ is to compute the average $\frac{1}{n} \sum_{i=1}^{n} v_i$. It is well known in statistics that for grouped data, there are other estimators that can have smaller expected square error. […]
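For reference, here is a minimal sketch of the plain average and a simulation of its expected square error. It does not implement the alternative grouped-data estimators the post goes on to discuss; the Gaussian population and the names `sample_mean` and `simulated_square_error` are assumptions made for illustration.

```python
import random


def sample_mean(values):
    """The standard estimate: (1/n) times the sum of the sample values."""
    return sum(values) / len(values)


def simulated_square_error(n, trials=2000, mu=1.0, sigma=2.0, seed=0):
    """Average of (estimate - mu)^2 over many simulated samples of size n.

    Assumes a Gaussian population with mean mu and standard deviation sigma.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        total += (sample_mean(sample) - mu) ** 2
    return total / trials


error_n10 = simulated_square_error(10)
error_n40 = simulated_square_error(40)
```

Under these assumptions the expected square error of the average is sigma²/n, so `error_n10` should land near 0.4 and `error_n40` near 0.1. This is the baseline any alternative estimator has to beat.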

## How Much Data Do You Need?

Introduction: A common question in analytics, statistics, and data science projects is: how much data do you need? This question actually has very specific and clear answers! A first good answer is “it is good to have a lot.” Let’s dig deeper and get some more detailed quantitative answers. […]

## Bounding Excess Generalization Error for Linear Regression Models

Introduction: The goal of this note is to try to characterize excess generalization error: how much worse your model works in production versus how well it appeared to work during training. The clarifying point is that excess generalization error (also called overfit) isn’t so much the model performing unexpectedly poorly on […]
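To make the quantity concrete, here is a small simulation sketch that measures the gap between held-out and training mean squared error for a one-variable least-squares fit. The data-generating line, noise level, sample sizes, and all function names are assumptions made for illustration; this is not the note's own derivation or bound.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(model, xs, ys):
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def excess_error(rng, n_train=5, n_test=200, noise_sd=1.0):
    """Held-out error minus training error for one simulated fit."""
    def draw(n):
        # Hypothetical true relationship: y = 2x + 1 plus Gaussian noise.
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        ys = [2.0 * x + 1.0 + rng.gauss(0.0, noise_sd) for x in xs]
        return xs, ys

    train_x, train_y = draw(n_train)
    test_x, test_y = draw(n_test)
    model = fit_line(train_x, train_y)
    return (mean_squared_error(model, test_x, test_y)
            - mean_squared_error(model, train_x, train_y))


rng = random.Random(0)
gaps = [excess_error(rng) for _ in range(500)]
average_gap = sum(gaps) / len(gaps)
```

With only five training points, the fit partly chases the noise, so `average_gap` comes out positive: the model looks better on its training data than on fresh data from the same distribution.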