I am working on a promising new series of notes: common data science fallacies and pitfalls. (Probably still looking for a good name for the series!) I thought I would share a few thoughts on it, and hopefully not jinx it too badly.
We have a new R WVPlots plot: ROCPlotPairList. It is useful for comparing the ROC/AUC of multiple models on the same data set. library(WVPlots) set.seed(34903490) x1 <- rnorm(50) x2 <- rnorm(length(x1)) x3 <- rnorm(length(x1)) y <- 0.2*x2^2 + 0.5*x2 + x1 + rnorm(length(x1)) frm <- data.frame( x1 = x1, x2 […]
I would like to re-share links to our free vtreat data preparation system introduction videos, which show you what sort of machine learning problems vtreat can help you with. Python vtreat introduction video (PyData LA 2019), slides here. R vtreat introduction video (Why R? Foundation). The idea is: instead of […]
I’d like some feedback on a possible article or series. I am thinking about writing and/or recording videos on the measure theoretic foundations of probability. The idea is: empirical probability (probabilities of coin flips, dice rolls, and finite sequences) is fairly well taught and approachable. However, theoretical probability (the type […]
Nina Zumel has updated our training page to describe the Python data science intensive for software engineers we have been conducting for a couple of years. This is private group training in addition to our usual R training for scientists, and consulting offerings. Please check it out.
Allison Horst, Alison Hill, and Kristen Gorman are working to make a neat new example data set available to R users: the palmer penguins. It is a nice alternative to the over-used Iris data set as it has more rows, some missing values, nicer examples of Simpson’s Paradox, and more […]
Nina and I are cleaning up websites, links, and projects. I would like to take the opportunity re-share my old genetic art project through a short demonstration video. Read more about the Genetic Art Project here.
Chapter 8 “Advanced Data Preparation” of Practical Data Science with R is a study in: Using the R vtreat package for advanced data preparation. Cross-validated data preparation. It is the professionally edited, ready to cite version of an important data preparation methodology. An advantage being: a number of well documented […]
Just a heads-up, Nina and I are working on re-structuring and updating the website. In particular we are finally moving to https. Please don’t be alarmed if things are in flux, and some links break. We are managing all of https://www.winvector.com, http://www.win-vector.com/, and https://winvector.wordpress.com. The new no-dash URL is not […]
One of the chapters that we are especially proud of in Practical Data Science with R is Chapter 7, “Linear and Logistic Regression.” We worked really hard to explain the fundamental principles behind both methods in a clear and easy-to-understand form, and to document diagnostics returned by the R implementations […]