I have a couple of public appearances coming up soon.
- The East Bay R Language Beginners Group: Preparing Datasets – The Ugly Truth & Some Solutions, Tuesday, May 1, 2018 at Robert Half Technologies, 1999 Harrison Street, Oakland, CA, 94612.
- Official May 2018 BARUG Meeting: rquery: a Query Generator for Working With SQL Data, Tuesday, May 8, 2018 at Intuit, Building 20, 2600 Marine Way, Mountain View, CA.
Preparing Datasets – The Ugly Truth & Some Solutions is a great idea of Jim Porzak’s. Jim will speak on problems one is likely to encounter in trying to use real world data for predictive modeling and then I will speak on how the vtreat package helps address these issues.
vtreat systematizes a number of routine, domain-independent data repairs and preparations, leaving you more time to work on important domain-specific issues (plus it has citable documentation, helping make your methodology section smaller).
vtreat is the best way to prepare messy real world data for predictive modeling.
rquery: a Query Generator for Working With SQL Data is an introduction to the rquery query generator system. rquery is a new R package that builds “pipe-able SQL” and includes a number of very powerful data operators and analyses, along with some very neat features such as query pipeline diagrams.
rquery (with cdata) is going to be the best way (easiest to learn, most expressive, easiest to maintain, and most performant) to use R to manipulate data at scale (SQL databases and Spark).
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.