In our previous note, we discussed some problems that can arise when using standard principal components analysis (specifically, principal components regression) to model the relationship between independent (x) and dependent (y) variables. In this note, we present some dimensionality reduction techniques that alleviate some of those problems, in particular what […]

Estimated reading time: 26 minutes

In this note, we discuss principal components regression and some of the issues with it: The need for scaling. The need for pruning. The lack of “y-awareness” of the standard dimensionality reduction step.

Estimated reading time: 34 minutes

We here at Win-Vector LLC been working through an ad-hoc series about A/B testing combining elements of both operations research and statistical points of view. A dynamic programming solution to A/B test design Why does designing a simple A/B test seem so complicated? A clear picture of power and significance […]

Estimated reading time: 1 minute

We’ve just finished off a series of articles on some recent research results applying differential privacy to improve machine learning. Some of these results are pretty technical, so we thought it was worth working through concrete examples. And some of the original results are locked behind academic journal paywalls, so […]

Estimated reading time: 1 minute

Authors: John Mount and Nina Zumel Nina and I were noodling with some variations of differentially private machine learning, and think we have found a variation of a standard practice that is actually fairly efficient in establishing differential privacy a privacy condition (but, as commenters pointed out- not differential privacy). […]

Estimated reading time: 9 minutes

There remains a bit of a two-way snobbery that Frequentist statistics is what we teach (as so-called objective statistics remain the same no matter who works with them) and Bayesian statistics is what we do (as it tends to directly estimate posterior probabilities we are actually interested in). Nina Zumel […]

Estimated reading time: 24 minutes

Win-Vector LLC‘s Nina Zumel wrote a great article explaining differential privacy and demonstrating how to use it to enhance forward step-wise logistic regression (essentially reusing test data). This allowed her to reproduce results similar to the recent Science paper “The reusable holdout: Preserving validity in adaptive data analysis”. The technique […]

Estimated reading time: 15 minutes

Differential privacy was originally developed to facilitate secure analysis over sensitive data, with mixed success. It’s back in the news again now, with exciting results from Cynthia Dwork, et. al. (see references at the end of the article) that apply results from differential privacy to machine learning. In this article […]

Estimated reading time: 21 minutes

In some of my recent public talks (for example: here and here) I have mentioned a desire for “a deeper theory of fitting and testing.” I thought I would expand on what I meant by this. In this note I am going to cover a lot of different topics to […]

Estimated reading time: 26 minutes