
Cross-Methods are a Leak/Variance Trade-Off

We have a new Win Vector data science article to share: “Cross-Methods are a Leak/Variance Trade-Off,” by John Mount (Win Vector LLC) and Nina Zumel (Win Vector LLC), March 10, 2020. We work through some exciting examples of when cross-methods (cross validation, and also cross-frames) work, and when they do not. Abstract […]

vtreat Cross Validation

Nina Zumel has finished new documentation on how vtreat’s cross validation works, which I want to share here. vtreat is a system that makes data preparation for machine learning a “one-liner” (available in R and in Python). We have a set of starting-off points here. These documents describe what […]

What is vtreat?

vtreat is a DataFrame processor/conditioner that prepares real-world data for supervised machine learning or predictive modeling in a statistically sound manner. vtreat takes an input DataFrame that has a specified column called “the outcome variable” (or “y”) that is the quantity to be predicted (and must not have missing values). […]

Speaking at BARUG

We will be speaking at BARUG on Tuesday, September 3, 2019. If you are in the Bay Area, please come see us. Nina Zumel & John Mount: “Practical Data Science with R.” Practical Data Science with R (Zumel and Mount) was one of the first and most widely read books on […]