Nina Zumel has updated our training page to describe the Python data science intensive for software engineers we have been conducting for a couple of years. This is private group training, offered in addition to our usual R training for scientists and our consulting offerings. Please check it out.
Allison Horst, Alison Hill, and Kristen Gorman are working to make a neat new example data set available to R users: the Palmer penguins. It is a nice alternative to the over-used Iris data set as it has more rows, some missing values, nicer examples of Simpson’s Paradox, and more […]
Nina and I are cleaning up websites, links, and projects. I would like to take the opportunity to re-share my old genetic art project through a short demonstration video. Read more about the Genetic Art Project here.
Chapter 8 “Advanced Data Preparation” of Practical Data Science with R is a study in using the R vtreat package for advanced data preparation, and in cross-validated data preparation. It is the professionally edited, ready to cite version of an important data preparation methodology. An advantage being: a number of well documented […]
Just a heads-up, Nina and I are working on re-structuring and updating the website. In particular we are finally moving to https. Please don’t be alarmed if things are in flux, and some links break. We are managing all of https://www.winvector.com, http://www.win-vector.com/, and https://winvector.wordpress.com. The new no-dash URL is not […]
One of the chapters that we are especially proud of in Practical Data Science with R is Chapter 7, “Linear and Logistic Regression.” We worked really hard to explain the fundamental principles behind both methods in a clear and easy-to-understand form, and to document diagnostics returned by the R implementations […]
A kind reader recently shared the following comment on the Practical Data Science with R 2nd Edition live-site. Thanks for the chapter on data frames and data.tables. It has helped me overcome an obstacle freeing me from a lot of warnings telling me my data table was not a real […]
Data science is often a case of bringing the tools to the problems and data, instead of insisting on bringing the problems and data to the tools. To support cross-language data science we have been working on cross-language tools, documentation, and training.
Deal of the Day May 10: Half off Practical Data Science with R, Second Edition. Use code dotd051020au at https://bit.ly/2xLRPCk
Nina Zumel and John Mount will be speaking on advanced data preparation for supervised machine learning at the Why R? Webinar on Thursday, May 7, 2020. This is at 8pm in the GMT+2 timezone, which for us is 11AM Pacific Time. Hope to see you there!