An update on site maintenance: we have moved the Win Vector LLC site to https hosting. Some links were damaged by the transition, but we are fixing them as we find them. The main changes are as follows. File content from https://win-vector.com/dfiles/ has moved to https://github.com/WinVector/Examples/tree/main/dfiles . The blog is here: https://win-vector.com/blog-2/. […]
The core of our “statistics to English translation” series is Nina Zumel’s sequence of articles:
“I don’t think that means what you think it means;” Statistics to English Translation, Part 1: Accuracy Measures
Statistics to English Translation, Part 2a: ’Significant’ Doesn’t Always Mean ’Important’
Statistics to English Translation, Part 2b: […]
I am working on a promising new series of notes: common data science fallacies and pitfalls. (Probably still looking for a good name for the series!) I thought I would share a few thoughts on it, and hopefully not jinx it too badly.
We have a new R WVPlots plot: ROCPlotPairList. It is useful for comparing the ROC/AUC of multiple models on the same data set.
    library(WVPlots)
    set.seed(34903490)
    x1 <- rnorm(50)
    x2 <- rnorm(length(x1))
    x3 <- rnorm(length(x1))
    y <- 0.2*x2^2 + 0.5*x2 + x1 + rnorm(length(x1))
    frm <- data.frame(
      x1 = x1,
      x2 […]
I would like to re-share links to our free vtreat data preparation system introduction videos, which show you what sort of machine learning problems vtreat can help you with. Python vtreat introduction video (PyData LA 2019), slides here. R vtreat introduction video (Why R? Foundation). The idea is: instead of […]
I’d like some feedback on a possible article or series. I am thinking about writing and/or recording videos on the measure-theoretic foundations of probability. The idea is: empirical probability (probabilities of coin flips, dice rolls, and finite sequences) is fairly well taught and approachable. However, theoretical probability (the type […]
Nina Zumel has updated our training page to describe the Python data science intensive for software engineers we have been conducting for a couple of years. This private group training is in addition to our usual R training for scientists and our consulting offerings. Please check it out.
Allison Horst, Alison Hill, and Kristen Gorman are working to make a neat new example data set available to R users: the Palmer penguins. It is a nice alternative to the over-used Iris data set, as it has more rows, some missing values, nicer examples of Simpson’s Paradox, and more […]
Nina and I are cleaning up websites, links, and projects. I would like to take the opportunity to re-share my old genetic art project through a short demonstration video. Read more about the Genetic Art Project here.
Chapter 8 “Advanced Data Preparation” of Practical Data Science with R is a study in two topics: using the R vtreat package for advanced data preparation, and cross-validated data preparation. It is the professionally edited, ready-to-cite version of an important data preparation methodology. One advantage: a number of well documented […]