Menu Home

The Intercept Fallacy

A common mis-understanding of linear regression and logistic regression is that the intercept is thought to encode the unconditional mean or the training data prevalence. This is easily seen to not be the case. Consider the following example in R. library(wrapr) We set up our example data. # build our […]

New WVPlot: ROCPlotPairList

We have a new R WVPlots plot: ROCPlotPairList. It is useful for comparing the ROC/AUC of multiple models on the same data set. library(WVPlots) set.seed(34903490) x1 <- rnorm(50) x2 <- rnorm(length(x1)) x3 <- rnorm(length(x1)) y <- 0.2*x2^2 + 0.5*x2 + x1 + rnorm(length(x1)) frm <- data.frame( x1 = x1, x2 […]

Unrolling the ROC

In our data science teaching, we present the ROC plot (and the area under the curve of the plot, or AUC) as a useful tool for evaluating score-based classifier models, as well as for comparing multiple such models. The ROC is informative and useful, but it’s also perhaps overly concise […]

Free vtreat Tutorial Videos

I would like to re-share links to our free vtreat data preparation system introduction videos, which show you what sort of machine learning problems vtreat can help you with. Python vtreat introduction video (PyData LA 2019), slides here. R vtreat introduction video (Why R? Foundation). The idea is: instead of […]

Don’t Feel Guilty About Selecting Variables

We have an exciting new article to share: Don’t Feel Guilty About Selecting Variables. If you are at all interested in the probabilistic justification of important data science techniques, such as variable selection or pruning, this should be an informative and fun read. “Data Science” is often criticized with the […]