
Upcoming speaking engagements

I have a couple of public appearances coming up soon.


Preparing Datasets – The Ugly Truth & Some Solutions is a great idea of Jim Porzak’s. Jim will speak on the problems one is likely to encounter when trying to use real-world data for predictive modeling, and then I will speak on how the vtreat package helps address these issues. vtreat systematizes a number of routine, domain-independent data repairs and preparations, leaving you more time to work on important domain-specific issues (plus it has citable documentation, helping make your methodology section smaller).

vtreat is the best way to prepare messy real-world data for predictive modeling.


rquery: a Query Generator for Working With SQL Data

is an introduction to the rquery query generator system. rquery is a new R package that builds “pipe-able SQL” and includes a number of powerful data operators and analyses, along with some very neat features such as query pipeline diagrams.


We think rquery (plus cdata) is going to be the best way (easiest to learn, most expressive, easiest to maintain, and most performant) to use R to manipulate data at scale (SQL databases and Spark).

Categories: Administrativia Exciting Techniques Opinion Pragmatic Data Science Tutorials


Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.
