## Why not Square Error for Classification?

Win Vector LLC has been developing and delivering a lot of “statistics, machine learning, and data science for engineers” intensives in the past few years. These are bootcamps, or workshops, designed to help software engineers become more comfortable with machine learning and artificial intelligence tools. The current thinking is: not […]

What is the Gauss-Markov theorem? From “The Cambridge Dictionary of Statistics” B. S. Everitt, 2nd Edition: A theorem that proves that if the error terms in a multiple regression have the same variance and are uncorrelated, then the estimators of the parameters in the model produced by least squares estimation […]

## Automatic bias correction doesn’t fix omitted variable bias

Page 94 of Gelman, Carlin, Stern, Dunson, Vehtari, Rubin “Bayesian Data Analysis” 3rd Edition (which we will call BDA3) provides a great example of what happens when common broad frequentist bias criticisms are over-applied to predictions from ordinary linear regression: the predictions appear to fall apart. BDA3 goes on to […]

## Skimming statistics papers for the ideas (instead of the complete procedures)

Been reading a lot of Gelman, Carlin, Stern, Dunson, Vehtari, Rubin “Bayesian Data Analysis” 3rd edition lately. Overall in the Bayesian framework some ideas (such as regularization, and imputation) are way easier to justify (though calculating some seemingly basic quantities becomes tedious). A big advantage (and weakness) of this formulation […]

## Trimming the Fat from glm() Models in R

One of the attractive aspects of logistic regression models (and linear models in general) is their compactness: the size of the model grows in the number of coefficients, not in the size of the training data. With R, though, glm models are not so concise; we noticed this to our […]

## What is meant by regression modeling?

What is meant by regression modeling? Linear Regression is one of the most common statistical modeling techniques. It is very powerful, important, and (at first glance) easy to teach. However, because it is such a broad topic it can be a minefield for teaching and discussion. It is common for […]

## Unprincipled Component Analysis

As a data scientist I have seen variations of principal component analysis and factor analysis so often blindly misapplied and abused that I have come to think of the technique as unprincipled component analysis. PCA is a good technique often used to reduce sensitivity to overfitting. But this stated design […]

## The cranky guide to trying R packages

This is a tutorial on how to try out a new package in R. The summary is: expect errors, search out errors and don’t start with the built in examples or real data. Suppose you want to try out a novel statistical technique? A good fraction of the time R […]

## CRU graph yet again (with R)

IowaHawk has a excellent article attempting to reproduce the infamous CRU climate graph using OpenOffice: Fables of the Reconstruction. We thought we would show how to produced similarly bad results using R.