An update on site maintenance. We have moved the Win Vector LLC site to https hosting. Some links were damaged by the transition, but we are fixing them as we find them. Overall, the following changes are present. File content from https://win-vector.com/dfiles/ has moved to https://github.com/WinVector/Examples/tree/main/dfiles and the blog is here: https://win-vector.com/blog-2/. […]
The core of our “statistics to English translation” series is Nina Zumel’s sequence of articles: “I don’t think that means what you think it means;” Statistics to English Translation, Part 1: Accuracy Measures; Statistics to English Translation, Part 2a: ’Significant’ Doesn’t Always Mean ’Important’; Statistics to English Translation, Part 2b: […]
I am conducting another machine learning / AI bootcamp this week. Starting one of these always makes me want to get more statistical commentaries down, just in case I need one. These classes have to move fast, and also move correctly. In this case I want to write about decomposition […]
Introduction: I’d like to talk about the Kolmogorov Axioms of Probability as another example of revisionist history in mathematics (another example here). What is commonly quoted as the Kolmogorov Axioms of Probability is, in my opinion, a less insightful formulation than what is found in the 1956 English translation of […]
What we’ve got here is failure to communicate. Suppose I were to say: “any natural number can be written uniquely, up to order, as a, possibly empty, finite product of prime number(s).” This seems possibly correct, and possibly even careful. Though, one may have to look up the terms (such […]
I am finishing up a work-note that has some really neat implications as to why working with AUC is more powerful than one might think. I think I am far enough along to share the consequences here. This started as some, now reappraised, thoughts on the fallacy of thinking knowing […]
Here is an incredibly clear, but unfortunately gruesome, example of a variation of Bayes’ Law. A good teachable point. Consider the recent CDC article “Community and Close Contact Exposures Associated with COVID-19 Among Symptomatic Adults ≥18 Years in 11 Outpatient Health Care Facilities.” It states: Adults with positive SARS-CoV-2 test […]
I am working on a promising new series of notes: common data science fallacies and pitfalls. (Probably still looking for a good name for the series!) I thought I would share a few thoughts on it, and hopefully not jinx it too badly.
San Francisco Wednesday September 9th, 2020 at 9:40AM. Almost as dark as night. San Francisco Wednesday September 9th, 2020 at 2:10PM. San Francisco Wednesday September 9th, 2020 at 5:10PM. This is a full color photo, no effects.
A common misunderstanding of linear regression and logistic regression is that the intercept encodes the unconditional mean or the training data prevalence. This is easily seen not to be the case. Consider the following example in R. library(wrapr) We set up our example data. # build our […]
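The intercept claim above can be checked with a very small sketch. This is not the post's R example (which is elided here); it is a separate illustration in plain Python, and the toy data and the helper name `fit_line` are assumptions made up for this purpose. It fits a one-variable least squares line and compares the fitted intercept to the unconditional mean of the outcome.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y ~ intercept + slope * x (hypothetical helper)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The intercept is the prediction at x = 0, not the mean of y.
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical toy data (not the data from the post): y = 2 * x exactly.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x for x in xs]

intercept, slope = fit_line(xs, ys)
y_mean = mean(ys)

# Centering x moves the point x = 0 to the mean of x, so only then
# does the intercept coincide with the mean of y.
xs_centered = [x - mean(xs) for x in xs]
intercept_centered, _ = fit_line(xs_centered, ys)

print("intercept:", intercept)
print("mean of y:", y_mean)
print("intercept with centered x:", intercept_centered)
```

In this toy case the intercept and the mean of y differ, and they agree only after x is centered. For a single explanatory variable the intercept equals the mean of y minus the slope times the mean of x, so the two match only when the slope or the mean of x is zero.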