Artificial intelligence, like machine learning before it, is making big money off what I call the “sell ∀ ∃ as ∃ ∀ scam.” The scam works as follows. Build a system that solves problems, but with an important user-facing control. For AI systems like GPT-X this is “prompt engineering.” For […]

Estimated reading time: 3 minutes

This is a short note on what machine learning fitting actually does. We usually teach: A correct statistical or machine learning fitting procedure will, with high probability, correctly identify or infer a system that is close to the one actually producing our training examples. For this to actually happen we […]

Estimated reading time: 2 minutes

Introduction: Teaching basic data science, machine learning, and statistics is great due to the questions. Students ask brilliant questions, as they see what holes are present in your presentation and scaffolding. The students are not yet conditioned to ask only what you feel is easy to answer or present. They […]

Estimated reading time: 23 minutes

I am sharing a new free video where I work through a great common argument that bounds expected excess generalization error as a ratio of model complexity (in rows) over training set size (again in rows), independent of problem dimension. (link) For more of my notes on support vector machines […]

Estimated reading time: 34 seconds

The core of our “statistics to English translation” series is Nina Zumel’s sequence of articles:
“I don’t think that means what you think it means;” Statistics to English Translation, Part 1: Accuracy Measures
Statistics to English Translation, Part 2a: ’Significant’ Doesn’t Always Mean ’Important’
Statistics to English Translation, Part 2b: […]

Estimated reading time: 55 seconds

I would like to re-share links to our free vtreat data preparation system introduction videos, which show you what sort of machine learning problems vtreat can help you with. Python vtreat introduction video (PyData LA 2019), slides here. R vtreat introduction video (Why R? Foundation). The idea is: instead of […]

Estimated reading time: 58 seconds

Win Vector LLC’s Dr. Nina Zumel has had great success applying y-aware methods to machine learning problems, and working out the detailed cross-validation methods needed to make y-aware procedures safe. I thought I would try my hand at y-aware neural net or deep learning methods here.

Estimated reading time: 10 minutes

We have a new Win Vector data science article to share: “Cross-Methods are a Leak/Variance Trade-Off,” by John Mount (Win Vector LLC) and Nina Zumel (Win Vector LLC), March 10, 2020. We work some exciting examples of when cross-methods (cross validation, and also cross-frames) work, and when they do not work. Abstract […]

Estimated reading time: 1 minute