One of my favorite mathematical anecdotes is the following story that Gian-Carlo Rota told about Solomon Lefschetz: He [Solomon Lefschetz] liked to repeat, as an example of mathematical pedantry, the story of one of E. H. Moore’s visits to Princeton, when Moore started a lecture by saying, “Let a be […]

Estimated reading time: 10 minutes

I’d like some feedback on a possible article or series. I am thinking about writing and/or recording videos on the measure-theoretic foundations of probability. The idea is: empirical probability (probabilities of coin flips, dice rolls, and finite sequences) is fairly well taught and approachable. However, theoretical probability (the type […]

Estimated reading time: 2 minutes

Here is a fun combinatorial puzzle. I’ve probably seen this used to teach before, but let’s try to define or work this one from memory. I would love to hear more solutions/analyses of this problem. Suppose you have n kettles of soup labeled 0 through n-1. For our problem we […]

Estimated reading time: 14 minutes

A client recently came to us with a question: what’s a good way to monitor data or model output for changes? That is, how can you tell if new data is distributed differently from previous data, or if the distribution of scores returned by a model has changed? This client, […]

Estimated reading time: 17 minutes

We have just released two new free video lectures on vectors from a programmer’s point of view. I am experimenting with which ideas programmers find interesting about vectors, which concepts they consider safe starting points, and how to condense and present the material. Please check the lectures out. […]

Estimated reading time: 36 seconds

While working on a variation of the RcppDynProg algorithm we derived the following beautiful identity of 2 by 2 real matrices: The superscript “top” denoting the transpose operation, the ||.||^2_2 denoting sum of squares norm, and the single |.| denoting determinant. This is derived from one of the check equations […]

Estimated reading time: 33 seconds

Here at Win-Vector LLC we like permutation tests. Our team has written on them (for example: How Do You Know if Your Data Has Signal?) and they are used to estimate significances in our sigr and WVPlots R packages. For example, permutation methods are used to estimate the significance reported […]

Estimated reading time: 9 minutes

Nina Zumel prepared an excellent article on the consequences of working with relative error distributed quantities (such as wealth, income, sales, and many more) called “Living in A Lognormal World.” The article emphasizes that if you are dealing with such quantities you are already seeing effects of relative error distributions […]

Estimated reading time: 17 minutes

Beginning analysts and data scientists often ask: “how does one remember and master the seemingly endless number of classifier metrics?” My concrete advice is: Read Nina Zumel’s excellent series on scoring classifiers. Keep notes. Settle on one or two metrics as you move from project to project. We prefer “AUC” early […]

Estimated reading time: 15 minutes

In our previous note we demonstrated Y-Aware PCA and other y-aware approaches to dimensionality reduction in a predictive modeling context, specifically Principal Components Regression (PCR). For our examples, we selected the appropriate number of principal components by eye. In this note, we will look at ways to select the appropriate […]

Estimated reading time: 16 minutes