What does an announcement like “vtreat now up on CRAN!” mean?


The story is: Nina Zumel and I run a data science consultancy (Win-Vector LLC). Our company helps other analysts and data scientists turn industrial scale data into useful business tools. Our company helps companies without analysts or data scientists do the same. Our company trains and coaches analysts in mastering these tasks. We serve on various academic and industrial advisory boards.

Example projects include using historic financial data to suggest new account products for a bank, building useful alerts from home sensor data, pricing used automobiles, or predicting which mortgage holders are more likely to refinance soon. Data science is the art of taking the large amounts of data produced by companies (sometimes in the form of under-curated “data exhaust”) and refining it into effective automated models and decision procedures (for example: picking who to send a specific discount offer to).

We get most of our business from other analysts and data scientists. Our goal is to be the data scientists that other data scientists turn to: “The data scientists’ data scientists.” But that is also why we need more business introductions. Please put us in front of your company’s officers and decision makers. They don’t have to be data science experts for us to help.

To this end we try to shape and improve the manner in which analytics and data science are actually performed in practice. We produce a lot of technical training material (our book Practical Data Science with R, our video course Introduction to Data Science, our technical blog, and many public talks). We release a lot of our methods as open source software, freely usable by our partners and peers.

vtreat is one such piece of software. It prepares data for analysis. It performs a number of useful transformations, and takes a number of precautions that we feel are sorely neglected in common practice. Sharing vtreat freely in the public CRAN repository gives it the widest possible distribution.

vtreat distills a number of things we have written about and taught (in particular chapters 4 and 6 of Practical Data Science with R). We concede: automation leads away from understanding, but we also feel precautions that require effort tend not to be taken. They become “More honor’d in the breach than the observance.” So: read our chapters and use the package! We release such software both to advance common practice and to promote our company.

We ask you, our friends and readers, the following. Please help. Or more accurately: please continue to help. Please share this article. Please search your business network for people who need help with R, data science or analytics. We deliver deep consulting and corporate training. Please bring us in for a meeting, or even a free tech talk. Our company lives by these generous introductions, and we could always use more of them.

