We have a discount on Manning Books, including our own Practical Data Science with R 2nd Edition!
Win Vector LLC’s Dr. Nina Zumel has had great success applying y-aware methods to machine learning problems, and working out the detailed cross-validation methods needed to make y-aware procedures safe. I thought I would try our hand at y-aware neural net or deep learning methods here.
R is a powerful data science language because, like Matlab, numpy, and Pandas, it exposes vectorized operations. That is, a user can perform operations on hundreds (or even billions) of cells by merely specifying the operation on the column or vector of values. Of course, sometimes it takes a while […]
wrapr 2.0.0 is now up on CRAN. This means the := variant of unpack is now easy to install. Please give it a try!
I would like to re-share vtreat (R version, Python version) a data preparation documentation for machine learning tasks. vtreat is a system for preparing messy real world data for predictive modeling tasks (classification, regression, and so on). In particular it is very good at re-coding high-cardinality string-valued (or categorical) variables […]
For data science projects I recommend using source control or version control, and committing changes at a very fine level of granularity. This means checking in possibly broken code, and the possibly weak commit messages (so when working in a shared project, you may want a private branch or second […]
For all our remote learners, we are sharing a free coupon code for our R video course Introduction to Data Science. The code is ITDS2020, and can be used at this URL https://www.udemy.com/course/introduction-to-data-science/?couponCode=ITDS2020 . Please check it out and share it!
Here is a small quote from Practical Data Science with R Chapter 1. It is often too much to ask for the data scientist to become a domain expert. However, in all cases the data scientist must develop strong domain empathy to help define and solve the right problems. Interested? […]
Students have asked me if it is better to use the same cross-validation plan in each step of an analysis or to use different ones. Our answer is: unless you are coordinating the many plans in some way (such as 2-way independence or some sort of combinatorial design) it is […]
A big thank you to Dmytro Perepolkin for sharing a “Keep Calm and Use vtreat” poster! Also, we have translated the Python vtreat steps from our recent “Cross-Methods are a Leak/Variance Trade-Off” article into R vtreat steps here. This R-port demonstrates the new to R fit/prepare notation! We want vtreat […]