R is a powerful data science language because, like Matlab, numpy, and Pandas, it exposes vectorized operations. That is, a user can perform operations on hundreds (or even billions) of cells by merely specifying the operation on the column or vector of values. Of course, sometimes it takes a while […]

Estimated reading time: 3 minutes

I would like to re-share the vtreat (R version, Python version) data preparation documentation for machine learning tasks. vtreat is a system for preparing messy real-world data for predictive modeling tasks (classification, regression, and so on). In particular, it is very good at re-coding high-cardinality string-valued (or categorical) variables […]

Estimated reading time: 2 minutes

vtreat version 1.5.2 just became available from CRAN. We have logged a few improvements in the NEWS. The changes are small and incremental, as the package is already in a great, stable state for production use.

Estimated reading time: 36 seconds

We have a new, improved version of the “how to design a cdata/data_algebra data transform” video up! The original article, the Python example, and the R example have all been updated to use the new video. Please check it out!

Estimated reading time: 25 seconds

Nina Zumel and I have two new tutorials on fluid data wrangling/shaping. They are written in a parallel structure, with the R version of the tutorial being almost identical to the Python version of the tutorial. This reflects our opinion on the “which is better for data science R […]

Estimated reading time: 1 minute

In our last note we stated that unpack is a good tool for loading R RDS files into your working environment. Here is the idea expanded into a worked example.

Estimated reading time: 2 minutes

We had such a positive reception to our last Introduction to Data Science promotion that we are going to try to make the course available to more people by lowering the base price to $29.99. We are also creating a one-month promotional price of $20.99. To get a permanent subscription […]

Estimated reading time: 56 seconds