
On The Decomposition of Variance

I am conducting another machine learning / AI bootcamp this week. Starting one of these always makes me want to get more statistical commentaries down, just in case I need one. These classes have to move fast, and also move correctly. In this case I want to write about decomposition of variance.

A famous equation in statistics is:

   SSyy = SSR + SSE

This decomposes SSyy (the variation of the dependent variable, here a numeric regression target, about its mean) into SSR (the variation of the prediction about the dependent variable’s mean) plus SSE (the variation between the quantity to be predicted and the prediction). This decomposition is fundamental in ANOVA, and is also sometimes discussed in the context of OLS (ordinary least squares) linear regression.
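To make the three terms concrete, here is one way to write them out. The symbols y_i (observed values), ŷ_i (predictions), and ȳ (mean of the observed values) are notation introduced for this sketch; they do not appear in the original equation.

```latex
\begin{aligned}
SS_{yy} &= \sum_i (y_i - \bar{y})^2, \qquad
SSR = \sum_i (\hat{y}_i - \bar{y})^2, \qquad
SSE = \sum_i (y_i - \hat{y}_i)^2 \\
SS_{yy} &= \sum_i \bigl( (y_i - \hat{y}_i) + (\hat{y}_i - \bar{y}) \bigr)^2
         = SSE + SSR + 2 \sum_i (y_i - \hat{y}_i)(\hat{y}_i - \bar{y})
\end{aligned}
```

Expanding the square this way shows that SSyy = SSR + SSE holds exactly when the cross term is zero, that is, when the residuals sum to zero against the centered predictions. That is a property of the predictor, not of the data alone.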

Here is this equation in Gonick and Smith, The Cartoon Guide to Statistics, Harper Collins, 1993, page 194:

[Image: the equation as printed on page 194 of the book]

The equation is stated inside a box or context labeled “ANOVA.” It is true in this context, and some others (such as OLS regression). But it is important to remember we need a context for such an equation to hold.

Let’s see this equation in action using R.

First we set up some synthetic data where x1, x2, x3 are our proposed explanatory variables, and y is the numeric dependent variable to be predicted.

We fit our model with lm, and then show the balance conditions on lm’s predictions.

## [1] "SSR= 1333.364, SSE= 109.922, SSyy= 1443.285, eq= 0.000"

The equation holds in the above instance. However, it holds only when the predictor meets certain conditions. Without these conditions established, the equation may not hold.
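The code that produced the numbers above is not reproduced here. A minimal sketch of the same kind of check might look like the following. The seed, sample size, coefficients, and the helper name `ss_terms` are all assumptions made for illustration, so the printed values will differ from those shown above.

```r
# Hypothetical synthetic data; NOT the data behind the numbers quoted above.
set.seed(2020)
n <- 100
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 1 + 2 * d$x1 - d$x2 + 0.5 * d$x3 + rnorm(n)

# Illustrative helper: the three sums of squares and the
# imbalance eq = SSR + SSE - SSyy (zero when the identity holds).
ss_terms <- function(y, pred) {
  SSyy <- sum((y - mean(y))^2)
  SSR <- sum((pred - mean(y))^2)
  SSE <- sum((y - pred)^2)
  c(SSR = SSR, SSE = SSE, SSyy = SSyy, eq = SSR + SSE - SSyy)
}

# OLS fit; the formula includes an intercept by default.
model <- lm(y ~ x1 + x2 + x3, data = d)
ols_terms <- ss_terms(d$y, predict(model, newdata = d))
print(round(ols_terms, 3))
```

For an OLS fit with an intercept, the `eq` entry should come out at zero up to floating point error.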

What is often missed, or not discussed enough, is: this equality or decomposition requires certain properties of the predictor to hold. These properties are true for OLS if the model includes an intercept term. However, these properties are often not true for other models. For example: regularization kills the relation.
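The role of the intercept can be seen by fitting the same OLS model with and without one. This is a sketch on hypothetical synthetic data (the seed, coefficients, and the helper name `ss_imbalance` are assumptions, not taken from the original analysis).

```r
# Hypothetical synthetic data with a non-zero true intercept.
set.seed(2020)
n <- 100
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 1 + 2 * d$x1 - d$x2 + 0.5 * d$x3 + rnorm(n)

# Illustrative helper: SSR + SSE - SSyy, zero when the identity holds.
ss_imbalance <- function(y, pred) {
  sum((pred - mean(y))^2) + sum((y - pred)^2) - sum((y - mean(y))^2)
}

with_intercept <- lm(y ~ x1 + x2 + x3, data = d)
no_intercept <- lm(y ~ 0 + x1 + x2 + x3, data = d)  # "0 +" drops the intercept

eq_with <- ss_imbalance(d$y, fitted(with_intercept))
eq_without <- ss_imbalance(d$y, fitted(no_intercept))
print(c(with_intercept = eq_with, no_intercept = eq_without))
```

Without the intercept, the residuals are no longer forced to sum to zero, so the imbalance is generally far from zero even though the model is still an OLS fit.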

Let’s see the equation fail for an L2-regularized linear regression.

## Loading required package: Matrix

## Loaded glmnet 4.0-2
## [1] "SSR= 1142.413, SSE= 117.363, SSyy= 1443.285, eq= -183.509"

Notice the equation does not hold: eq did not cancel to something very near zero.
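The same effect can be sketched without glmnet, using the closed-form L2-penalized (ridge) solution in base R. This is an illustration under stated assumptions: the data, the penalty `lambda`, and the helper name `ss_terms` are hypothetical, and the code does not call glmnet or attempt to match its parameterization, so the numbers will not match those above.

```r
# Hypothetical synthetic data; NOT the data behind the numbers quoted above.
set.seed(2020)
n <- 100
X <- cbind(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- as.numeric(1 + X %*% c(2, -1, 0.5) + rnorm(n))

# Illustrative helper: sums of squares and eq = SSR + SSE - SSyy.
ss_terms <- function(y, pred) {
  SSyy <- sum((y - mean(y))^2)
  SSR <- sum((pred - mean(y))^2)
  SSE <- sum((y - pred)^2)
  c(SSR = SSR, SSE = SSE, SSyy = SSyy, eq = SSR + SSE - SSyy)
}

# Center X and y so the intercept is left unpenalized.
Xc <- scale(X, center = TRUE, scale = FALSE)
yc <- y - mean(y)

# Closed-form ridge coefficients: (X'X + lambda I)^{-1} X'y.
lambda <- 10  # assumed penalty strength, chosen only for illustration
beta <- solve(crossprod(Xc) + lambda * diag(ncol(Xc)), crossprod(Xc, yc))
ridge_pred <- as.numeric(mean(y) + Xc %*% beta)

ridge_terms <- ss_terms(y, ridge_pred)
print(round(ridge_terms, 3))
```

For this form of ridge regression the imbalance works out to -2 * lambda * sum(beta^2), so it is negative whenever the penalty and the coefficients are non-zero. The penalty shrinks the predictions toward the mean, and the residuals are then no longer orthogonal to the predictions.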

What this means is: one can’t claim the original identity out of context. One must establish that the conditions it requires are met before claiming the conclusion. Relations like the above hold for predictions that are optimal with respect to shift and scaling: no further shift or rescaling of the predictions can reduce the squared error. OLS predictors are optimal in this sense when the model includes an intercept term. Regularized models, maximum likelihood models, and others may not have this property.
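One way to probe the shift and scaling condition is to regress the outcome on the predictions and see whether the fitted intercept and slope are 0 and 1. The sketch below uses hypothetical synthetic data and, as a stand-in for a regularized model, simply shrinks the OLS predictions 20% toward the mean. That shrinkage step is an assumption made for illustration, not a real regularized fit.

```r
# Hypothetical synthetic data.
set.seed(2020)
n <- 100
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 1 + 2 * d$x1 - d$x2 + 0.5 * d$x3 + rnorm(n)

# Illustrative helper: SSR + SSE - SSyy, zero when the identity holds.
ss_imbalance <- function(y, pred) {
  sum((pred - mean(y))^2) + sum((y - pred)^2) - sum((y - mean(y))^2)
}

d$ols_pred <- fitted(lm(y ~ x1 + x2 + x3, data = d))
# Stand-in for regularization: shrink predictions 20% toward mean(y).
d$shrunk_pred <- mean(d$y) + 0.8 * (d$ols_pred - mean(d$y))

# Calibration check: regress y on each prediction.
coef_ols <- coef(lm(y ~ ols_pred, data = d))        # near intercept 0, slope 1
coef_shrunk <- coef(lm(y ~ shrunk_pred, data = d))  # slope differs from 1
print(rbind(ols = coef_ols, shrunk = coef_shrunk))

# Re-fitting shift and scale restores the identity.
recal_pred <- fitted(lm(y ~ shrunk_pred, data = d))
eq_shrunk <- ss_imbalance(d$y, d$shrunk_pred)
eq_recal <- ss_imbalance(d$y, recal_pred)
print(c(shrunk = eq_shrunk, recalibrated = eq_recal))
```

The shrunken predictions fail the identity, while the same predictions after a one-variable linear re-calibration satisfy it again. That is the sense in which the identity depends on optimality with respect to shift and scaling.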

In statistics it can be just as bad to attempt to apply some of the equations everywhere as to “not know the equations.”

Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.
