# Author Archives

### jmount

Data scientist and trainer at Win-Vector LLC. One of the authors of Practical Data Science with R.

I am conducting another machine learning / AI bootcamp this week. Starting one of these always makes me want to get more statistical commentaries down, just in case I need one. These classes have to move fast, and also move correctly. In this case I want to write about decomposition […]

Estimated reading time: 5 minutes

Introduction: I’d like to talk about the Kolmogorov Axioms of Probability as another example of revisionist history in mathematics (another example here). What is commonly quoted as the Kolmogorov Axioms of Probability is, in my opinion, a less insightful formulation than what is found in the 1956 English translation of […]

Estimated reading time: 24 minutes

What we’ve got here is failure to communicate: Suppose I were to say: “any natural number can be written uniquely, up to order, as a, possibly empty, finite product of prime number(s).” This seems possibly correct, and possibly even careful. Though, one may have to look up the terms (such […]

Estimated reading time: 13 minutes

I am finishing up a work-note that has some really neat implications as to why working with AUC is more powerful than one might think. I think I am far enough along to share the consequences here. This started as some, now reappraised, thoughts on the fallacy of thinking knowing […]

Estimated reading time: 3 minutes

Here is an incredibly clear, but unfortunately gruesome, example of a variation of Bayes’ Law. A good teachable point. Consider the recent CDC article “Community and Close Contact Exposures Associated with COVID-19 Among Symptomatic Adults ≥18 Years in 11 Outpatient Health Care Facilities.” It states: Adults with positive SARS-CoV-2 test […]

Estimated reading time: 10 minutes

I am working on a promising new series of notes: common data science fallacies and pitfalls. (Probably still looking for a good name for the series!) I thought I would share a few thoughts on it, and hopefully not jinx it too badly.

Estimated reading time: 4 minutes

San Francisco Wednesday September 9th, 2020 at 9:40AM. Almost as dark as night. San Francisco Wednesday September 9th, 2020 at 2:10PM. San Francisco Wednesday September 9th, 2020 at 5:10PM. This is a full color photo, no effects.

Estimated reading time: 17 seconds

A common misunderstanding of linear regression and logistic regression is that the intercept encodes the unconditional mean or the training data prevalence. It is easy to see that this is not the case. Consider the following example in R. library(wrapr) We set up our example data. # build our […]

Estimated reading time: 1 minute
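The intercept claim in the excerpt above can be checked with a small sketch. This is not the example from the original post (which is truncated here); it uses base R only, and the toy data, the seed, and the names `d`, `fit`, and `fit_c` are assumptions made for illustration.

```r
# Hypothetical toy data, not the data from the original post.
set.seed(2020)
d <- data.frame(x = rnorm(100, mean = 5))
d$y <- 3 * d$x + rnorm(100)

# Ordinary linear regression of y on x.
fit <- lm(y ~ x, data = d)
intercept <- unname(coef(fit)["(Intercept)"])
mean_y <- mean(d$y)

# The intercept is the model's prediction at x = 0,
# which here is far from the unconditional mean of y.
print(c(intercept = intercept, mean_y = mean_y))

# Only after centering x does the intercept coincide with mean(y).
fit_c <- lm(y ~ I(x - mean(x)), data = d)
intercept_centered <- unname(coef(fit_c)[1])
print(c(intercept_centered = intercept_centered, mean_y = mean_y))
```

In this sketch the intercept matches the mean of `y` only in the centered fit, which suggests the intercept is better read as "the prediction when all explanatory variables are zero" than as a summary of the outcome. The sketch covers the linear case only; the logistic case mentioned in the excerpt is not demonstrated here.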

We have a new R WVPlots plot: ROCPlotPairList. It is useful for comparing the ROC/AUC of multiple models on the same data set. library(WVPlots) set.seed(34903490) x1 <- rnorm(50) x2 <- rnorm(length(x1)) x3 <- rnorm(length(x1)) y <- 0.2*x2^2 + 0.5*x2 + x1 + rnorm(length(x1)) frm <- data.frame( x1 = x1, x2 […]

Estimated reading time: 47 seconds