Nina Zumel recently announced upcoming speaking appearances. I want to promote the upcoming sessions at ODSC West 2016 (11:15am-1:00pm on Friday November 4th, or 3:00pm-4:30pm on Saturday November 5th) and invite executives, managers, and other data science consumers to attend. We assume most of the Win-Vector blog audience is made up of practitioners (who we hope are already planning to attend), so we are asking you, our technical readers, to help promote this talk to a broader audience of executives and managers.
Our message is: if you have to manage data science projects, you need to know how to evaluate results.
In these talks we will lay out how data science results should be examined and evaluated. If you can’t make ODSC (or do attend and like what you see), please reach out to us and we can arrange to present a summarized version targeted to your executive team.
Nina Zumel is presenting on modeling issues from a purely technical viewpoint at the Women who Code Silicon Valley Meetup on Thursday, October 27 and at some already scheduled private invitation sessions. Nina takes well-known statistical criticisms that certain common habits of combining models are not well-founded, and describes a straightforward improved work pattern that holds up to these criticisms. She centers the problem on a concrete issue called “nested or sequential model bias.” Her talk shows how the common data science habit of “wishing away” the nested modeling issue actually degrades results. She then finishes by demonstrating specific procedures (advanced data partitioning, simulated out of sample data, and applications of differential privacy) that can be added to the workflow to conveniently achieve statistically rigorous outcomes. Examples and solutions will be shared in the form of “R” markdown worksheets.
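For readers who want a feel for the nested model issue before the talk, here is a minimal Python sketch. It is not taken from the talk materials (those are R markdown worksheets); it assumes the first-stage "model" is a simple per-level mean encoding of a high-cardinality categorical variable, and all names (`impact_code`, `pearson`, `run_demo`) are hypothetical. The outcome is pure noise, so any apparent signal in the encoded variable is an artifact of fitting the encoding on the same rows it is later scored on. The second measurement uses the simplest form of data partitioning: fit the encoding on one half of the data and apply it to the other half.

```python
import math
import random
from statistics import mean


def impact_code(levels, ys):
    """First-stage 'model': map each level to mean(y | level) - mean(y)."""
    grand = mean(ys)
    by_level = {}
    for lv, y in zip(levels, ys):
        by_level.setdefault(lv, []).append(y)
    return {lv: mean(vals) - grand for lv, vals in by_level.items()}


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def run_demo(seed=2016, n_rows=2000, n_levels=500):
    """Return (naive, split) correlations of the encoded variable with y."""
    rng = random.Random(seed)
    # A categorical variable with many levels and an outcome that is pure noise:
    # by construction there is no real relationship to find.
    levels = [rng.randrange(n_levels) for _ in range(n_rows)]
    ys = [rng.gauss(0.0, 1.0) for _ in range(n_rows)]

    # Naive pattern: fit the encoding and score it on the same rows.
    code_all = impact_code(levels, ys)
    naive = pearson([code_all[lv] for lv in levels], ys)

    # Partitioned pattern: fit the encoding on a calibration half,
    # apply it to the held-out half (levels not seen in calibration get 0).
    half = n_rows // 2
    code_cal = impact_code(levels[:half], ys[:half])
    coded_rest = [code_cal.get(lv, 0.0) for lv in levels[half:]]
    split = pearson(coded_rest, ys[half:])
    return naive, split


naive_r, split_r = run_demo()
print(f"same-data encoding:   r = {naive_r:.3f}")
print(f"partitioned encoding: r = {split_r:.3f}")
```

In this sketch the same-data correlation should come out large even though the variable is uninformative, while the partitioned correlation should stay near zero. A downstream model trained on the naively encoded column would inherit that false confidence.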
For the ODSC talks we add a lively introduction to model failure (a possibility data science teams must guard against), and proper model evaluation (one of the precautions against model failure). This additional material is “why you should be worried.” This is the material that will be valuable for managers and executives. We supply a reasoning framework so executives have enough tools to proactively explore model failure as a manageable and avoidable abstract possibility, instead of only being able to retroactively analyze specific past disasters.
Please help us promote these talks, and we hope to see you there.
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.
The slides for all of these talks are now available at the URL below. There is more to the talks than just the slides, but we thought distributing would be helpful. https://github.com/WinVector/NestedModelsTalk