Chapter 8 “Advanced Data Preparation” of Practical Data Science with R is a study in: Using the R vtreat package for advanced data preparation. Cross-validated data preparation. It is the professionally edited, ready to cite version of an important data preparation methodology. An advantage being: a number of well documented […]
If you are working with predictive modeling or machine learning in R this is the R tip that is going to save you the most time and deliver the biggest improvement in your results. R Tip: Use the vtreat package for data preparation in predictive analytics and machine learning projects. […]
[Reader’s Note. Some of our articles are applied and some of our articles are more theoretical. The following article is more theoretical, and requires fairly formal notation to even work through. However, it should be of interest as it touches on some of the fine points of cross-validation that are […]
Win-Vector LLC, Nina Zumel and I are pleased to announce that ‘vtreat’ version 0.5.26 has been released on CRAN. ‘vtreat’ is a data.frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. (from the package documentation) ‘vtreat’ is an R package that incorporates a number of […]
This article is a demonstration the use of the R vtreat variable preparation package followed by caret controlled training. In previous writings we have gone to great lengths to document, explain and motivate vtreat. That necessarily gets long and unnecessarily feels complicated. In this example we are going to show […]
vtreat cross frames John Mount, Nina Zumel 2016-05-05 As a follow on to “On Nested Models” we work R examples demonstrating “cross validated training frames” (or “cross frames”) in vtreat.
Nina Zumel has donated some time to greatly improve the vtreat R package documentation (now available as pre-rendered HTML here). vtreat is an R data.frame processor/conditioner package that helps prepare real-world data for predictive modeling in a statistically sound manner.
The Microsoft Data Science User Group just sponsored Dr. Nina Zumel‘s presentation “Preparing Data for Analysis Using R”. Microsoft saw Win-Vector LLC‘s ODSC West 2015 presentation “Prepping Data for Analysis using R” and generously offered to sponsor improving it and disseminating it to a wider audience. We feel Nina really […]
Nina and I are proud to share our lecture: “Prepping Data for Analysis using R” from ODSC West 2015. Nina Zumel and John Mount ODSC WEST 2015 It is about 90 minutes, and covers a lot of the theory behind the vtreat data preparation library. We also have a Github […]
Nina Zumel and I are proud to announce our R vtreat variable treatment library has just been accepted by CRAN!