
Schemas for Python Data Frames

The Pandas data frame is probably the most popular tool used to model tabular data in Python. For in-memory data, Pandas serves a role that might normally fall to a relational database. However, Pandas data frames are typically manipulated through methods rather than with a relational query language. One can […]
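As a rough illustration of the kind of schema declaration and checking the post is about (a minimal sketch of my own using plain pandas; the example columns and dtypes are made up, not taken from the post):

```python
import pandas as pd

# a small example data frame
d = pd.DataFrame({
    "id": [1, 2, 3],
    "score": [0.5, 0.25, 0.75],
})

# an assumed "schema": expected column names and dtypes (illustrative only)
expected_schema = {"id": "int64", "score": "float64"}

def check_schema(df: pd.DataFrame, schema: dict) -> list:
    """Return a list of human-readable schema violations (empty if none)."""
    problems = []
    for col, dtype in schema.items():
        if col not in df.columns:
            problems.append(f"missing column: {col}")
        elif str(df[col].dtype) != dtype:
            problems.append(f"column {col}: expected {dtype}, saw {df[col].dtype}")
    extra = set(df.columns) - set(schema)
    if extra:
        problems.append(f"unexpected columns: {sorted(extra)}")
    return problems

print(check_schema(d, expected_schema))  # [] means the data frame matches the declared schema
```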

Solving for Hidden Data

Introduction Let’s continue along the lines discussed in Omitted Variable Effects in Logistic Regression. The issue is as follows. For logistic regression, omitted variables cause parameter estimation bias. This is true even when the omitted variable is independent of the retained explanatory variables, which is not the case for the more familiar linear regression. This is a known problem […]

Omitted Variable Effects in Logistic Regression

Introduction I would like to illustrate a way in which omitted variables interfere in logistic regression inference (or coefficient estimation). These effects are different from what is seen in linear regression, and possibly different from some expectations or intuitions. Our Example Data Let’s start with a data example in R. # […]
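The post's worked example is in R (truncated above). As a rough Python analogue, here is a minimal simulation sketch (coefficients and sample size are made up, and statsmodels is used here in place of the post's own code) showing the attenuation when an independent covariate is omitted:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2023)
n = 100_000

# two *independent* explanatory variables
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# outcome generated by a logistic model with known coefficients
logit = 0.5 + 1.0 * x1 + 1.0 * x2
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

# full model: recovers the coefficient on x1 (about 1.0)
full = sm.Logit(y, sm.add_constant(np.column_stack([x1, x2]))).fit(disp=0)

# omitted-variable model: the x1 coefficient is attenuated toward zero,
# even though x2 is independent of x1
omitted = sm.Logit(y, sm.add_constant(x1)).fit(disp=0)

print("x1 coefficient, full model: ", round(full.params[1], 3))
print("x1 coefficient, x2 omitted: ", round(omitted.params[1], 3))
```

The same omission in an ordinary linear regression with independent covariates would leave the x1 coefficient essentially unchanged, which is the contrast the post is drawing.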

More on Parameterized Jupyter

I’d like to share a great new feature in the wvpy package (available on PyPI). This package is useful in converting Jupyter notebooks to/from Python, and also in rendering many parameterized notebooks. The idea is to make Jupyter notebooks easier to use in production. The latest feature is an extension […]
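A sketch of the intended workflow; the function and argument names below are from memory and should be treated as assumptions to check against the wvpy documentation, not as the definitive API:

```python
# Hypothetical sketch of parameterized notebook rendering with wvpy.
# Function and keyword names are assumptions; confirm against the wvpy docs.
from wvpy.jtools import render_as_html  # assumed location of the render helper

# render one HTML report per parameter value by injecting an initialization cell
for region in ["north", "south", "east", "west"]:   # made-up parameter values
    render_as_html(
        "report.ipynb",                    # assumed notebook file name
        init_code=f"region = {region!r}",  # assumed keyword for parameter injection
        output_suffix=f"_{region}",        # assumed keyword to keep output names distinct
    )
```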

List Coloring Latin Squares

(Still on my math streak.) 1994 had an exciting moment when Fred Galvin solved the 1979 Jeff Dinitz conjecture on list-coloring Latin squares. Latin squares are a simple predecessor to puzzles such as Sudoku. A Latin square is an n by n grid of the integers 0 through n-1 (called […]
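For concreteness, a small sketch of my own (not from the post) that builds the standard cyclic Latin square and checks the defining property:

```python
def cyclic_latin_square(n: int):
    """The standard cyclic Latin square: cell (i, j) holds (i + j) mod n."""
    return [[(i + j) % n for j in range(n)] for i in range(n)]

def is_latin_square(grid) -> bool:
    """Each of 0..n-1 appears exactly once in every row and every column."""
    n = len(grid)
    target = set(range(n))
    rows_ok = all(set(row) == target for row in grid)
    cols_ok = all({grid[i][j] for i in range(n)} == target for j in range(n))
    return rows_ok and cols_ok

square = cyclic_latin_square(5)
for row in square:
    print(row)
print("is a Latin square:", is_latin_square(square))
```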

Tilting at Sign

Establishing the “L1L2 AUC” equals 1/2 + arctan(1/sqrt(π – 3)) / π (≅ 0.8854404657887897) used a few nifty lemmas, one of which I am calling “the sign tilting lemma.” The sign tilting lemma is: For X, Y independent mean zero normal random variables with known variances s_x^2 and s_y^2, what […]
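The quoted numeric value is easy to confirm from the closed form; a one-line arithmetic check:

```python
import math

# 1/2 + arctan(1 / sqrt(pi - 3)) / pi, the claimed "L1L2 AUC"
auc = 0.5 + math.atan(1.0 / math.sqrt(math.pi - 3.0)) / math.pi
print(auc)  # approximately 0.8854404657887897, matching the quoted value
```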

How often do the L1 and L2 norms agree?

Turns out that I am still on a recreational mathematics run. Here is one I have been working on, arising from trying to explain norms and data science. Barry Rowlingson and John Mount asked the following question. Generate vectors v_1 and v_2 in R^n with each coordinate generated IID normal […]
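A quick Monte Carlo sketch of the question as I read it (the dimension and trial counts are arbitrary choices of mine; per the limit quoted in “Tilting at Sign” above, the agreement rate should be near 0.885 for large n):

```python
import numpy as np

rng = np.random.default_rng(7)
n_dim = 500        # dimension of the vectors (arbitrary, "large")
n_trials = 20_000  # number of simulated pairs (arbitrary)

# draw all pairs at once: shape (n_trials, n_dim), IID standard normal coordinates
v1 = rng.normal(size=(n_trials, n_dim))
v2 = rng.normal(size=(n_trials, n_dim))

# do the L1 and L2 norms agree on which vector of each pair is larger?
l1_says = np.sum(np.abs(v1), axis=1) > np.sum(np.abs(v2), axis=1)
l2_says = np.sum(v1**2, axis=1) > np.sum(v2**2, axis=1)  # same ordering as L2 norm
agreement = np.mean(l1_says == l2_says)

print("estimated agreement rate:", agreement)  # should be near 0.885 for large n_dim
```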

Vector Packing Vacation

Just coming back from a vacation where I got some side-time to work some recreational math problems. One stood out: packing vector sums by re-ordering. I feel you don’t deeply understand a proof until you try to work examples and re-write it, so here (for me) it is: Picking Vectors […]