I (Nina Zumel) will be speaking at the Women Who Code Silicon Valley meetup on Thursday, October 27.
The talk is called Improving Prediction using Nested Models and Simulated Out-of-Sample Data.
In this talk I will discuss nested predictive models. These are models that predict an outcome or dependent variable (called y) using additional submodels that have also been built with knowledge of y. Practical applications of nested models include “the wisdom of crowds”, prediction markets, variable re-encoding, ensemble learning, stacked learning, and superlearners.
Nested models can improve prediction performance relative to single models, but they introduce a number of undesirable biases and operational issues, and they are statistically unsound when used improperly. However, modern practitioners have made effective, correct use of these techniques. In my talk I will give concrete examples of nested models, how they can fail, and how to fix the failures. The solutions we will discuss include advanced data partitioning, simulated out-of-sample data, and ideas from differential privacy. The theme of the talk is that with proper techniques, these powerful methods can be used safely.
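As a rough illustration of what simulated out-of-sample data means (a minimal sketch, not the code from the talk), the Python below re-encodes a categorical variable as the mean of y for each level. It does this once naively, using every row, and once by cross-fitting, so that each row's code is computed only from rows in the other folds. The function names, the interleaved fold assignment, and the toy data are all assumptions made for this example; a real analysis would typically assign folds at random.

```python
from collections import defaultdict


def naive_impact_code(levels, y):
    """Encode each level as the mean of y over ALL rows with that level.

    Each row's own outcome contributes to its own code, so y leaks into
    the encoded variable.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for lev, yi in zip(levels, y):
        sums[lev] += yi
        counts[lev] += 1
    return [sums[lev] / counts[lev] for lev in levels]


def cross_fit_impact_code(levels, y, n_folds=3):
    """Encode each row using only rows from the other folds.

    Rows are assigned to folds by index modulo n_folds (a simplification).
    Levels unseen in the other folds fall back to those folds' grand mean.
    Requires n_folds >= 2 so that every fold has some training rows.
    """
    n = len(y)
    encoded = [0.0] * n
    for fold in range(n_folds):
        sums, counts = defaultdict(float), defaultdict(int)
        total, m = 0.0, 0
        # Accumulate statistics from rows outside the current fold.
        for i in range(n):
            if i % n_folds != fold:
                sums[levels[i]] += y[i]
                counts[levels[i]] += 1
                total += y[i]
                m += 1
        grand_mean = total / m
        # Encode the held-out rows using only those statistics.
        for i in range(n):
            if i % n_folds == fold:
                lev = levels[i]
                encoded[i] = sums[lev] / counts[lev] if counts[lev] else grand_mean
    return encoded


# A pure-noise variable: every row has its own unique level.
levels = ["id%d" % i for i in range(6)]
y = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]

naive = naive_impact_code(levels, y)
cross = cross_fit_impact_code(levels, y, n_folds=3)
```

On this toy data the naive code simply reproduces y, so a downstream model would see a variable that looks perfectly predictive even though it carries no real signal. The cross-fitted code is constant across rows, which correctly shows that the variable is uninformative.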
John Mount and I will also be giving a workshop called A Unified View of Model Evaluation at ODSC West 2016 on November 4 (the premium workshop sessions) and November 5 (the general workshop sessions).
We will present a unified framework for predictive model construction and evaluation. Using this perspective we will work through crucial issues from classical statistical methodology, large data treatment, variable selection, ensemble methods, and all the way through stacking/super-learning. We will present R code demonstrating principled techniques for preparing data, scoring models, estimating model reliability, and producing decisive visualizations. In this workshop we will share example data, methods, graphics, and code.
I’m looking forward to these talks, and I hope some of you will be able to attend.
Data scientist with Win Vector LLC. I also dance, read ghost stories and folklore, and sometimes blog about it all.
Will these presentations be available online at some point? We in the vast non-Silicon-Valley areas are interested as well.
Typically ODSC releases full online video for free after the event, and Nina Zumel and I usually release all code and slides in a GitHub repository. So yes, with a small wait everything should be online.
The slides for all of these talks are now available at the URL below. There is more to the talks than just the slides, but we thought distributing them would be helpful. https://github.com/WinVector/NestedModelsTalk