R is designed to make working with statistical models fast, succinct, and reliable. For instance building a model is a one-liner: model <- lm(Petal.Length ~ Sepal.Length, data = iris) And producing a detailed diagnostic summary of the model is also a one-liner: summary(model) # Call: # lm(formula = Petal.Length ~ […]

Estimated reading time: 1 minute

As we’ve mentioned on previous occasions, one of the defining characteristics of data science is the emphasis on the availability of “large” data sets, which we define as “enough data that statistical efficiency is not a concern” (note that a “large” data set need not be “big data,” however you […]

Estimated reading time: 15 minutes

Using correlation to track model performance is “a mistake that nobody would ever make” combined with a vague “what would be wrong if I did do that” feeling. I hope after reading this feel a least a small urge to double check your work and presentations to make sure you […]

Estimated reading time: 5 minutes

What is R2? In the context of predictive models (usually linear regression), where y is the true outcome, and f is the model’s prediction, the definition that I see most often is: In words, R2 is a measure of how much of the variance in y is explained by the […]

Estimated reading time: 4 minutes