## Separating Code from Presentation in Jupyter Notebooks

One of the great conveniences of doing a data-science-style analysis in Jupyter is that Jupyter notebooks are literate containers that combine code, text, results, and graphs. This is also one of the pain points of working with Jupyter notebooks with partners or under source control. That is: Jupyter […]

## Wondering How To Think About Data Science

I just got back from a workshop meeting called Digital Transformation of Decision Analysis. This was a workshop organized by Eyas Raddad, David Matheson, and John-Mark Agosta. It was sponsored by The Society of Decision Professionals and Microsoft. Microsoft generously hosted at their new Experience Center at the Microsoft Silicon […]

## Just For Fun: Computing the Probability of Winning a Tournament

Taking a break from this weekend’s Elden Ring gaming to work out the probability of winning a tournament. The article can be found here: Some Math Inspired by Losing in Elden Ring. It is a variation on a “persuasion by calculation of examples” style I am working on.

## Working in CRAN’s World

Part of the deal of having a package up on CRAN is: at any time one may be sent an automated email like the following. “Dear maintainer, Please see the problems shown on URL. Please correct before TODAY+14DAYS to safely retain your package on CRAN. The CRAN Team” If this […]

## An appreciation of Cover’s universal portfolio in Python

I have a new theoretical finance note up: an appreciation of Cover’s universal portfolio in Python.

## Method Warnings

Introduction The data algebra is a Python system for designing data transformations that can be used in Pandas or SQL. The new 1.3.0 version introduces a lot of early checking and warnings to make designing data transforms more convenient and safer. An Example I’d like to demonstrate some of these […]

## How to Re-Map Many Columns in a Database

Introduction A surprisingly tricky problem in doing data science or analytics in the database is the situation where one has to re-map a large number of columns. This occurs, for example, in the vtreat data preparation system. In the vtreat case, a large number of the variable encodings reduce to table-lookup […]

## xicor for Confusion Matrices

We have found that for 2 by 2 confusion matrices (a common summary of the relation between two categorical variables) the expected value of the xicor coefficient of correlation specializes into the re-normalized square of the determinant! One can summarize how a 0/1 variable x relates to a 0/1 variable y […]

## Exploring the XI Correlation Coefficient

By Nina Zumel. Recently, we’ve been reading about a new correlation coefficient, $$\xi$$ (“xi”), which was introduced by Professor Sourav Chatterjee in his paper, “A New Coefficient of Correlation”. The $$\xi$$ coefficient has the following properties: If $$y$$ is a function of $$x$$, then $$\xi$$ goes to 1 asymptotically as $$n$$ […]

## How to Read Sourav Chatterjee’s Basic XICOR Definition

Introduction Professor Sourav Chatterjee recently published a new coefficient of correlation called XICOR (refs: JASA, R package, arXiv, Hacker News, and a Python package (different author)). The basic formula (in the tie-free case) is: Take X and Y as n-vectors of observations of random variables. Compute the ranks r(i) of […]
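
For readers who want to experiment before reading the full post, here is a minimal sketch of the tie-free calculation as it is commonly stated from Chatterjee's paper: sort the pairs by X, take the ranks of Y in that order, and compute 1 - 3 * sum(|r(i+1) - r(i)|) / (n^2 - 1). The function name `xicor_tie_free` is our own choice for illustration and is not taken from the R or Python packages mentioned above. The sketch assumes no repeated values in either vector.

```python
def xicor_tie_free(x, y):
    """Sketch of the XICOR statistic for data with no ties (illustrative only)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    if len(set(x)) != n or len(set(y)) != n:
        raise ValueError("this sketch only covers the tie-free case")
    # Rank of each y value (1 = smallest), indexed by original position.
    rank = [0] * n
    for r, i in enumerate(sorted(range(n), key=lambda i: y[i]), start=1):
        rank[i] = r
    # Re-order those ranks so that the matching x values are increasing.
    order = sorted(range(n), key=lambda i: x[i])
    r_sorted = [rank[i] for i in order]
    # Sum of absolute differences between successive ranks.
    total = sum(abs(r_sorted[i + 1] - r_sorted[i]) for i in range(n - 1))
    return 1.0 - 3.0 * total / (n * n - 1)


# y is a strictly increasing function of x, with n = 5
xs = [1, 2, 3, 4, 5]
ys = [v * v for v in xs]
print(xicor_tie_free(xs, ys))
```

When y is a monotone function of x, successive ranks differ by exactly 1, so the statistic reduces to 1 - 3/(n + 1), which approaches 1 only as n grows.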