Nina Zumel has updated our training page to describe the Python data science intensive for software engineers that we have been conducting for a couple of years. This private group training is in addition to our usual R training for scientists and our consulting offerings. Please check it out.
I would like to share a video where we show how to use the vtreat data transformer in the KNIME data science platform.
Chapter 8 “Advanced Data Preparation” of Practical Data Science with R is a study in using the R vtreat package for advanced data preparation, and in cross-validated data preparation. It is the professionally edited, ready-to-cite version of an important data preparation methodology. An advantage being: a number of well documented […]
We have an exciting new article to share: Don’t Feel Guilty About Selecting Variables. If you are at all interested in the probabilistic justification of important data science techniques, such as variable selection or pruning, this should be an informative and fun read. “Data Science” is often criticized with the […]
A kind reader recently shared the following comment on the Practical Data Science with R 2nd Edition live-site. Thanks for the chapter on data frames and data.tables. It has helped me overcome an obstacle freeing me from a lot of warnings telling me my data table was not a real […]
I would like to re-share the vtreat (R version, Python version) data preparation documentation for machine learning tasks. vtreat is a system for preparing messy real-world data for predictive modeling tasks (classification, regression, and so on). In particular it is very good at re-coding high-cardinality string-valued (or categorical) variables […]
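To make the idea of re-coding a high-cardinality categorical variable concrete, here is a minimal sketch in plain Python. It is not vtreat's implementation or API: the function name `impact_code`, the fold assignment by row index, and the choice to give unseen levels a code of 0 are all assumptions made for illustration. Each row's level is replaced by the out-of-fold difference between the mean outcome for that level and the overall mean outcome.

```python
from collections import defaultdict


def impact_code(levels, y, n_folds=3):
    """Hypothetical helper: out-of-fold estimate of mean(y | level) - mean(y).

    Each row is coded using only rows from the other folds, so a row's own
    outcome never leaks into its own code.
    """
    n = len(levels)
    if n < 2 or len(y) != n:
        raise ValueError("need at least two rows and matching lengths")
    coded = [0.0] * n
    for fold in range(n_folds):
        # Assumption: rows are assigned to folds by index modulo n_folds.
        train = [i for i in range(n) if i % n_folds != fold]
        grand_mean = sum(y[i] for i in train) / len(train)
        sums = defaultdict(float)
        counts = defaultdict(int)
        for i in train:
            sums[levels[i]] += y[i]
            counts[levels[i]] += 1
        for i in range(fold, n, n_folds):
            lvl = levels[i]
            if counts[lvl] > 0:
                coded[i] = sums[lvl] / counts[lvl] - grand_mean
            else:
                # Level not seen in the estimation folds: use a neutral code.
                coded[i] = 0.0
    return coded


levels = ["a", "b", "a", "c", "b", "a"]
y = [1, 0, 1, 0, 0, 1]
print(impact_code(levels, y))  # [0.5, 0.0, 0.75, 0.0, 0.0, 0.75]
```

The single numeric column produced this way can stand in for a string column with many distinct levels. A production system would also handle smoothing, significance filtering, and missing values, which this sketch leaves out.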
For all our remote learners, we are sharing a free coupon code for our R video course Introduction to Data Science. The code is ITDS2020, and can be used at this URL https://www.udemy.com/course/introduction-to-data-science/?couponCode=ITDS2020 . Please check it out and share it!
Here is a small quote from Practical Data Science with R Chapter 1: “It is often too much to ask for the data scientist to become a domain expert. However, in all cases the data scientist must develop strong domain empathy to help define and solve the right problems.” Interested? […]
Students have asked me if it is better to use the same cross-validation plan in each step of an analysis or to use different ones. Our answer is: unless you are coordinating the many plans in some way (such as 2-way independence or some sort of combinatorial design) it is […]
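For readers unfamiliar with the term, a cross-validation plan can be thought of as a fixed list of train/test index splits. The sketch below, in plain Python, shows one way to represent such a plan and what "same plan" versus "different plan" means in practice. The function name `make_cv_plan` and its parameters are hypothetical, and the sketch takes no position on which choice is preferable.

```python
import random


def make_cv_plan(n_rows, n_folds, seed):
    """Hypothetical helper: list of (train_indices, test_indices) pairs.

    Every row appears in exactly one test set; the seed fixes the plan.
    """
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled rows into n_folds groups.
    folds = [sorted(idx[k::n_folds]) for k in range(n_folds)]
    plan = []
    for k, test in enumerate(folds):
        train = sorted(i for j, f in enumerate(folds) if j != k for i in f)
        plan.append((train, test))
    return plan


# The same plan object can be passed to every step of an analysis...
shared_plan = make_cv_plan(n_rows=10, n_folds=3, seed=1)
# ...or a later step can draw its own independent plan.
fresh_plan = make_cv_plan(n_rows=10, n_folds=3, seed=2)
```

Reusing `shared_plan` means each step sees identical splits, while `fresh_plan` gives a later step splits drawn independently of the first.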
A big thank you to Dmytro Perepolkin for sharing a “Keep Calm and Use vtreat” poster! Also, we have translated the Python vtreat steps from our recent “Cross-Methods are a Leak/Variance Trade-Off” article into R vtreat steps here. This R port demonstrates the new-to-R fit/prepare notation! We want vtreat […]