I would like to share a video where we show how to use the vtreat data transformer in the KNIME data science platform.
Chapter 8, “Advanced Data Preparation,” of Practical Data Science with R is a study in two topics: using the R vtreat package for advanced data preparation, and cross-validated data preparation. It is the professionally edited, ready-to-cite version of an important data preparation methodology. An advantage being: a number of well documented […]
We have some really nifty upcoming enhancements to wrapr unpack/to.
One reason we are developing the wrapr to/unpack methods is the following: we wanted to spruce up the R vtreat interface a bit.
I would like to introduce an exciting feature in the upcoming 1.9.6 version of the wrapr R package: value unpacking.
We’ve been experimenting with this for a while, and the next R vtreat package will have a back-port of the Python vtreat package sklearn pipe step interface (in addition to the standard R interface).
Video of our PyData Los Angeles 2019 talk Preparing Messy Real World Data for Supervised Machine Learning is now available. In this talk we describe how to use vtreat, a package available in R and in Python, to correctly re-code real world data for supervised machine learning tasks. Please check it […]
Our goal has been to make rquery the best query generation system for R (and to make data_algebra the best query generator for Python). Let's see what rquery is good at, and what new features are making rquery better.
Introduction: rquery is a data wrangling system designed to express complex data manipulation as a series of simple data transforms. This is in the spirit of R’s base::transform() or dplyr’s dplyr::mutate(), and uses a pipe in the style popularized in R with magrittr. The operators themselves follow the selections in […]
We are excited to share a free extract of Zumel, Mount, Practical Data Science with R, 2nd Edition, Manning 2019: Evaluating a Classification Model with a Spam Filter. This section reflects an important design decision in the book: teach model evaluation first, and as a step separate from model construction. […]