## The Sell ∀ ∃ as ∃ ∀ Scam

Artificial intelligence, like machine learning before it, is making big money off what I call the “sell ∀ ∃ as ∃ ∀ scam.” The scam works as follows. Build a system that solves problems, but with an important user-facing control. For AI systems like GPT-X this is “prompt engineering.” For […]

## What does machine learning fitting actually do?

This is a short note on what machine learning fitting actually does. We usually teach: A correct statistical or machine learning fitting procedure will, with high probability, correctly identify or infer a system that is close to the one actually producing our training examples. For this to actually happen we […]

## What is a Good Test Set Size?

Introduction: Teaching basic data science, machine learning, and statistics is great due to the questions. Students ask brilliant questions, as they see what holes are present in your presentation and scaffolding. The students are not yet conditioned to ask only what you feel is easy to answer or present. They […]

## Bounding Excess Generalization Error

I am sharing a new free video where I work through a great common argument that bounds expected excess generalization error as a ratio of model complexity (in rows) over training set size (again in rows), independent of problem dimension. For more of my notes on support vector machines […]

## Classification as Censored Regression

I have a new short video lecture to share: “Classification as Censored Regression.”

## “Statistics to English Translation”

The core of our “statistics to English translation” series is Nina Zumel’s sequence of articles:

- “I don’t think that means what you think it means;” Statistics to English Translation, Part 1: Accuracy Measures
- Statistics to English Translation, Part 2a: ‘Significant’ Doesn’t Always Mean ‘Important’
- Statistics to English Translation, Part 2b: […]

## 0.83 is a Special AUC

0.83 (or more precisely 5/6) is a special Area Under the Curve (AUC), as we will show in this note.
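The excerpt does not say what makes 5/6 special. One setting where that value arises, offered here as an assumption rather than as the note's own argument: if each example's true positive probability p is uniform on [0, 1] and the model scores each example by exactly p, then the positive scores have density 2p, the negative scores have cumulative distribution 2p − p², and the AUC is the integral of 2p(2p − p²) over [0, 1], which is 4/3 − 1/2 = 5/6. A minimal simulation sketch under those assumptions (the names `auc` and `simulate_auc` are hypothetical, and only the Python standard library is used):

```python
import random


def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC estimate; assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank_sum_pos = 0
    n_pos = 0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            rank_sum_pos += rank
            n_pos += 1
    n_neg = len(scores) - n_pos
    # Subtract the smallest possible rank sum for the positives, then normalize.
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def simulate_auc(n=100_000, seed=0):
    """Assumed setup: p ~ Uniform(0, 1), y ~ Bernoulli(p), model score = p."""
    rng = random.Random(seed)
    p = [rng.random() for _ in range(n)]
    y = [rng.random() < p_i for p_i in p]
    return auc(p, y)


estimate = simulate_auc()
print("simulated AUC:", estimate)
print("5/6          :", 5 / 6)
```

Under this assumed setup the simulated value should land close to 5/6, with the gap shrinking as `n` grows.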

## Free vtreat Tutorial Videos

I would like to re-share links to our free vtreat data preparation system introduction videos, which show you what sort of machine learning problems vtreat can help you with.

- Python vtreat introduction video (PyData LA 2019), slides here.
- R vtreat introduction video (Why R? Foundation).

The idea is: instead of […]

## Y-Conditionally Regularized Neural Nets

Win Vector LLC’s Dr. Nina Zumel has had great success applying y-aware methods to machine learning problems, and working out the detailed cross-validation methods needed to make y-aware procedures safe. I thought I would try my hand at y-aware neural net or deep learning methods here.

## Cross-Methods are a Leak/Variance Trade-Off

We have a new Win Vector data science article to share: “Cross-Methods are a Leak/Variance Trade-Off,” by John Mount (Win Vector LLC) and Nina Zumel (Win Vector LLC), March 10, 2020.

We work some exciting examples of when cross-methods (cross validation, and also cross-frames) work, and when they do not work.

Abstract […]