There’s a common, yet easy to fix, mistake that I often see in machine learning and data science projects and teaching: using classification rules for classification problems. This statement is a bit of word-play which I will need to unroll a bit. However, the concrete advice is that you often […]
I would like to share a video where we show how to use the vtreat data transformer in the KNIME data science platform. (And we see there is an R/vtreat KNIME example here!)
Chapter 8 “Advanced Data Preparation” of Practical Data Science with R is a study in using the R vtreat package for advanced data preparation and in cross-validated data preparation. It is the professionally edited, ready to cite version of an important data preparation methodology. An advantage being: a number of well documented […]
We have some really nifty upcoming enhancements to wrapr unpack/to.
One reason we are developing the wrapr to/unpack methods is the following: we wanted to spruce up the R vtreat interface a bit.
I would like to introduce an exciting feature in the upcoming 1.9.6 version of the wrapr R package: value unpacking.
We’ve been experimenting with this for a while, and the next R vtreat package will have a back-port of the Python vtreat package sklearn pipe step interface (in addition to the standard R interface).
Video of our PyData Los Angeles 2019 talk Preparing Messy Real World Data for Supervised Machine Learning is now available. In this talk we describe how to use vtreat, a package available in R and in Python, to correctly re-code real world data for supervised machine learning tasks. Please check it […]
Our goal has been to make rquery the best query generation system for R (and to make data_algebra the best query generator for Python). Let’s see what rquery is good at, and what new features are making rquery better.
Introduction: rquery is a data wrangling system designed to express complex data manipulation as a series of simple data transforms. This is in the spirit of R’s base::transform() or dplyr’s dplyr::mutate(), and uses a pipe in the style popularized in R with magrittr. The operators themselves follow the selections in […]