Chapter 8 “Advanced Data Preparation” of Practical Data Science with R is a study in: Using the R vtreat package for advanced data preparation. Cross-validated data preparation. It is the professionally edited, ready to cite version of an important data preparation methodology. An advantage being: a number of well documented […]
One of the chapters that we are especially proud of in Practical Data Science with R is Chapter 7, “Linear and Logistic Regression.” We worked really hard to explain the fundamental principles behind both methods in a clear and easy-to-understand form, and to document diagnostics returned by the R implementations […]
We have an exciting new article to share: Don’t Feel Guilty About Selecting Variables. If you are at all interested in the probabilistic justification of important data science techniques, such as variable selection or pruning, this should be an informative and fun read. “Data Science” is often criticized with the […]
A kind reader recently shared the following comment on the Practical Data Science with R 2nd Edition live-site. Thanks for the chapter on data frames and data.tables. It has helped me overcome an obstacle freeing me from a lot of warnings telling me my data table was not a real […]
Data science is often a case of brining the tools to the problems and data, instead of insisting on bringing the problems and data to the tools. To support cross-language data science we have been working on cross-language tools, documentation, and training.
Deal of the Day May 10: Half off Practical Data Science with R, Second Edition. Use code dotd051020au at https://bit.ly/2xLRPCk
Thank you very much Why R? for being awesome hosts. We are really pleased with how your virtual MeetUp went. For those who missed it here is a video link.
Nina Zumel and John Mount will be speaking on advanced data preparation for supervised machine learning at the Why R? Webinar Thursday, May 7, 2020. This is a 8pm in a GMT+2 timezone, which for us is 11AM Pacific Time. Hope to see you there!
We have a discount on Manning Books, including our own Practical Data Science with R 2nd Edition!
R is a powerful data science language because, like Matlab, numpy, and Pandas, it exposes vectorized operations. That is, a user can perform operations on hundreds (or even billions) of cells by merely specifying the operation on the column or vector of values. Of course, sometimes it takes a while […]