## Doing Better than the Average

The standard way to estimate the expected value of a population from a sample of values $v_1, \ldots, v_n$ is to compute the average $\frac{1}{n}\sum_{i=1}^{n} v_i$. It is well known in statistics that for grouped data, there are other estimators that can have smaller expected squared error. […]
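One classic estimator of this kind is positive-part James-Stein shrinkage of group means toward their grand mean. It is not necessarily the estimator the full article uses. The simulation below is a sketch under stated assumptions: normally distributed data, equal group sizes, a known common variance, and at least four groups. All names and parameter values (`james_stein_group_means`, `compare`, 20 groups of 5 observations, `sigma=2.0`) are illustrative choices.

```python
import random
import statistics


def james_stein_group_means(group_means, sampling_var):
    """Positive-part James-Stein shrinkage of group means toward their grand mean.

    Assumes every observed group mean has the same known sampling variance
    (sampling_var = sigma**2 / n) and that there are at least 4 groups.
    """
    k = len(group_means)
    if k < 4:
        raise ValueError("shrinkage toward the grand mean needs at least 4 groups")
    grand = statistics.fmean(group_means)
    spread = sum((m - grand) ** 2 for m in group_means)
    if spread == 0.0:
        return [grand] * k
    # Fraction of each group's deviation from the grand mean that is kept.
    keep = max(0.0, 1.0 - (k - 3) * sampling_var / spread)
    return [grand + keep * (m - grand) for m in group_means]


def compare(n_trials=2000, k=20, n=5, sigma=2.0, seed=2022):
    """Simulated mean squared error of plain group averages vs. shrunk estimates."""
    rng = random.Random(seed)
    sampling_var = sigma ** 2 / n
    se_plain = se_js = 0.0
    for _ in range(n_trials):
        # Hypothetical setup: true group means drawn from N(0, 1).
        true_means = [rng.gauss(0.0, 1.0) for _ in range(k)]
        observed = [
            statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
            for mu in true_means
        ]
        shrunk = james_stein_group_means(observed, sampling_var)
        se_plain += sum((o - t) ** 2 for o, t in zip(observed, true_means))
        se_js += sum((s - t) ** 2 for s, t in zip(shrunk, true_means))
    total = n_trials * k
    return se_plain / total, se_js / total


mse_plain, mse_js = compare()
print(f"plain averages MSE: {mse_plain:.3f}")
print(f"shrunk estimates MSE: {mse_js:.3f}")
```

The plain averages have expected squared error $\sigma^2/n = 0.8$ per group here. The shrunk estimates trade a little bias for a larger reduction in variance.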

## Just For Fun: Computing the Probability of Winning a Tournament

I took a break from this weekend’s Elden Ring gaming to work out the probability of winning a tournament. The article can be found here: “Some Math Inspired by Losing in Elden Ring.” It is a variation on a “persuasion by calculation of examples” style I am working on.
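The article’s exact tournament model is not given here. As one hedged illustration, assume a single-elimination bracket with a fixed seeding and a hypothetical matrix `p[i][j]` giving the probability that player `i` beats player `j`, with all matches independent. Under those assumptions, a player’s chance of winning follows round by round: it is the chance of winning their own half of the current block times the chance of beating whoever emerges from the other half.

```python
def bracket_win_probabilities(p):
    """Probability that each player wins a single-elimination bracket.

    p[i][j] is the (assumed) probability that player i beats player j, with
    p[i][j] + p[j][i] == 1. Players are placed in the bracket in list order,
    and len(p) must be a power of two.
    """
    n = len(p)
    if n == 0 or n & (n - 1):
        raise ValueError("number of players must be a power of two")
    win = [1.0] * n  # chance of having won one's own block so far (size 1)
    size = 1
    while size < n:
        new_win = [0.0] * n
        for i in range(n):
            block_start = (i // (2 * size)) * (2 * size)
            # The opponents come from the other half of this 2*size block.
            if i < block_start + size:
                opponents = range(block_start + size, block_start + 2 * size)
            else:
                opponents = range(block_start, block_start + size)
            new_win[i] = win[i] * sum(win[j] * p[i][j] for j in opponents)
        win = new_win
        size *= 2
    return win


# Hypothetical 8-player example: player 0 beats anyone with probability 0.6,
# and all other matches are coin flips.
p = [[0.5] * 8 for _ in range(8)]
for j in range(1, 8):
    p[0][j], p[j][0] = 0.6, 0.4
print(bracket_win_probabilities(p))
```

In this example player 0 must win three independent matches, so their win probability is $0.6^3 = 0.216$. The seven other players share the remainder.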

## An appreciation of Cover’s universal portfolio in Python

I have a new theoretical finance note up: an appreciation of Cover’s universal portfolio in Python.
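For orientation, here is a minimal two-asset sketch of Cover’s universal portfolio, not the note’s own code. Each period it holds the wealth-weighted average of all constant-rebalanced portfolios (CRPs), discretized to a grid of fractions with a uniform prior, and it assumes no transaction costs. The function names, grid size, and the alternating “cash vs. volatile asset” market are illustrative assumptions. With this weighting, the universal portfolio’s final wealth equals the average final wealth of the CRPs on the grid.

```python
def crp_wealth(b, price_relatives):
    """Final wealth of a constant-rebalanced portfolio holding fraction b in
    asset 0 and 1 - b in asset 1, rebalanced every period (no costs)."""
    wealth = 1.0
    for x0, x1 in price_relatives:
        wealth *= b * x0 + (1.0 - b) * x1
    return wealth


def universal_portfolio(price_relatives, grid_size=101):
    """Discretized two-asset universal portfolio with a uniform prior.

    price_relatives holds (x0, x1) pairs: each period's price ratio per asset.
    Returns the fraction held in asset 0 each period and the final wealth.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    crp_wealths = [1.0] * grid_size  # running wealth of each grid CRP
    wealth = 1.0
    fractions = []
    for x0, x1 in price_relatives:
        # Hold the wealth-weighted average of the CRP allocations.
        b_t = sum(b * w for b, w in zip(grid, crp_wealths)) / sum(crp_wealths)
        fractions.append(b_t)
        wealth *= b_t * x0 + (1.0 - b_t) * x1
        crp_wealths = [
            w * (b * x0 + (1.0 - b) * x1) for b, w in zip(grid, crp_wealths)
        ]
    return fractions, wealth


# Hypothetical market: cash (ratio 1) vs. an asset that alternately doubles and halves.
market = [(1.0, 2.0), (1.0, 0.5)] * 10
fractions, wealth = universal_portfolio(market)
print(f"universal portfolio wealth: {wealth:.4f}")
print(f"best CRP (b = 0.5) wealth: {crp_wealth(0.5, market):.4f}")
```

Neither asset gains anything over this market, yet every interior CRP grows. The universal portfolio captures part of that growth without knowing the best rebalancing fraction in advance.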

## How to Pick an Optimal Utility Threshold Using the ROC Plot

Nina Zumel just completed an excellent short sequence of articles on picking optimal utility thresholds to convert a continuous model score for a classification problem into a deployable classification rule: “Squeezing the Most Utility from Your Models” and “Estimating Uncertainty of Utility Curves.” This is very compatible with our advice to […]
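The core computation can be sketched as follows. It is not necessarily the method from the articles. Assign a utility to each outcome (true positive, false positive, false negative, true negative), score every candidate threshold for the rule “predict positive when score ≥ threshold,” and keep the best. The function name and the utility values in the example are hypothetical, and this brute-force loop is quadratic in the number of rows, so it is meant only as an illustration.

```python
def best_threshold(scores, labels, tp_value, fp_value, fn_value=0.0, tn_value=0.0):
    """Threshold maximizing total utility of 'predict positive if score >= t'.

    Candidate thresholds are the observed scores plus +inf (predict no
    positives). Ties in utility go to the smallest threshold. Returns a
    (threshold, total_utility) pair.
    """
    candidates = sorted(set(scores)) + [float("inf")]
    best = None
    for t in candidates:
        utility = 0.0
        for s, y in zip(scores, labels):
            if s >= t:
                utility += tp_value if y else fp_value
            else:
                utility += fn_value if y else tn_value
        if best is None or utility > best[1]:
            best = (t, utility)
    return best


# Hypothetical utilities: a caught positive is worth 10, a false alarm costs 5.
print(best_threshold([0.1, 0.4, 0.35, 0.8, 0.9], [0, 0, 1, 1, 1],
                     tp_value=10.0, fp_value=-5.0))
```

With $P$ positives and $N$ negatives, expected utility is linear in the true positive rate and false positive rate. So each threshold is a point on the ROC plot, and the best one is where a line of constant utility touches the curve.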

## Clearly The Author Does Not Know What The Natural Numbers Are

What we’ve got here is failure to communicate. Suppose I were to say: “any natural number can be written uniquely, up to order, as a, possibly empty, finite product of prime number(s).” This seems possibly correct, and possibly even careful. Though, one may have to look up the terms (such […]
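Whether the quoted statement is true depends on which numbers it is applied to. The trial-division sketch below, a hypothetical helper rather than code from the article, makes two edge cases concrete. First, 1 factors as the empty product. Second, 0 has no factorization into primes, so the statement holds for the positive integers but fails under any convention that counts 0 as a natural number.

```python
from math import prod


def prime_factors(n):
    """Prime factorization of a positive integer by trial division.

    Returns the primes in non-decreasing order; 1 gives the empty list,
    whose product is 1. Raises ValueError for n < 1 (0 has no factorization).
    """
    if n < 1:
        raise ValueError(f"{n} has no factorization into primes")
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors


print(prime_factors(360), prod(prime_factors(360)))
print(prime_factors(1), prod(prime_factors(1)))
```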

## A Gruesome Example of Bayes’ Law

Here is an incredibly clear, but unfortunately gruesome, example of a variation of Bayes’ Law. A good teachable point. Consider the recent CDC article “Community and Close Contact Exposures Associated with COVID-19 Among Symptomatic Adults ≥18 Years in 11 Outpatient Health Care Facilities.” It states: Adults with positive SARS-CoV-2 test […]
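As a generic refresher, here is Bayes’ Law in code. The numbers are made up for illustration and are not the CDC’s figures. It shows how to turn a rate of exposure among positives and negatives, $P(B \mid A)$ and $P(B \mid \neg A)$, into the rate of positives among the exposed, $P(A \mid B)$. It also checks the equivalent odds form: posterior odds equal prior odds times the likelihood ratio.

```python
def bayes_posterior(p_b_given_a, p_a, p_b_given_not_a):
    """P(A | B) by Bayes' Law, expanding P(B) with the law of total probability."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)
    return p_b_given_a * p_a / p_b


def posterior_odds(p_b_given_a, p_a, p_b_given_not_a):
    """Odds form of Bayes' Law: prior odds times the likelihood ratio."""
    prior_odds = p_a / (1.0 - p_a)
    likelihood_ratio = p_b_given_a / p_b_given_not_a
    return prior_odds * likelihood_ratio


# Hypothetical: A = "tests positive", B = "reports a given exposure".
post = bayes_posterior(0.4, 0.1, 0.2)
odds = posterior_odds(0.4, 0.1, 0.2)
print(f"P(A | B) = {post:.4f}, odds = {odds:.4f}, from odds = {odds / (1 + odds):.4f}")
```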

## Data Science is a Science (Just Not the One You May Think)

I am working on a promising new series of notes: common data science fallacies and pitfalls. (I am still looking for a good name for the series!) I thought I would share a few thoughts on it, and hopefully not jinx it too badly.

## Unrolling the ROC

In our data science teaching, we present the ROC plot (and the area under the curve of the plot, or AUC) as a useful tool for evaluating score-based classifier models, as well as for comparing multiple such models. The ROC is informative and useful, but it’s also perhaps overly concise […]
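To make the objects concrete, here is a small sketch (function names are our own, not from the article) of how ROC points and the AUC can be computed from scores and binary labels. It sweeps the threshold from high to low, grouping tied scores, and computes the area by the trapezoid rule. It then checks the area against the standard equivalent reading of AUC: the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counted as one half.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points of the ROC curve, sweeping the threshold high to low.

    Tied scores are processed together, producing a diagonal segment.
    Assumes at least one positive and one negative label.
    """
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda sy: -sy[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_pairwise(scores, labels):
    """P(random positive outscores random negative), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.4]
labels = [0, 0, 1, 1, 1, 1]
print(auc_trapezoid(roc_points(scores, labels)), auc_pairwise(scores, labels))
```

Both computations give 0.8125 for this toy data. The agreement is the identity that makes AUC a ranking statistic. It also shows why the single number is concise: it discards where along the curve the model does well.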

## Let A be a Pedant and Let B be a Pedant

One of my favorite mathematical anecdotes is the following story that Gian-Carlo Rota told about Solomon Lefschetz: He [Solomon Lefschetz] liked to repeat, as an example of mathematical pedantry, the story of one of E. H. Moore’s visits to Princeton, when Moore started a lecture by saying, “Let a be […]

## Use the Same Cross-Plan Between Steps

Students have asked me whether it is better to use the same cross-validation plan in each step of an analysis or to use different ones. Our answer is: unless you are coordinating the many plans in some way (such as 2-way independence or some sort of combinatorial design) it is […]
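Mechanically, “using the same cross-plan” just means building the fold assignment once and passing that one object to every step. This sketch does that. `make_cross_plan` is a seeded k-fold splitter, and `out_of_fold_means` is a hypothetical stand-in for any fitting step, such as variable preparation or model training. Neither is a library API, and the sketch does not try to reproduce the article’s full argument.

```python
import random


def make_cross_plan(n_rows, n_folds, seed):
    """k-fold plan: (train_indices, test_indices) pairs, each row tested once."""
    rng = random.Random(seed)
    idx = list(range(n_rows))
    rng.shuffle(idx)
    folds = [sorted(idx[f::n_folds]) for f in range(n_folds)]
    plan = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n_rows) if i not in held_out]
        plan.append((train, test))
    return plan


def out_of_fold_means(values, plan):
    """Hypothetical fitting step: predict each row by the mean of `values`
    over the training rows of the fold in which that row is held out."""
    preds = [None] * len(values)
    for train, test in plan:
        m = sum(values[i] for i in train) / len(train)
        for i in test:
            preds[i] = m
    return preds


plan = make_cross_plan(10, 3, seed=7)  # built once
x = [float(i) for i in range(10)]      # hypothetical inputs to step 1
y = [float(i * i) for i in range(10)]  # hypothetical inputs to step 2
step1 = out_of_fold_means(x, plan)     # both steps hold out the same rows
step2 = out_of_fold_means(y, plan)
print(step1)
print(step2)
```

Because both steps share `plan`, a row that is held out in one step is held out from the same training rows in the other. Independently drawn plans would not guarantee that.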