One of the great conveniences of performing a data science style analysis using Jupyter is that Jupyter notebooks are literate containers that combine code, text, results, and graphs. This is also one of the pain points in working with Jupyter notebooks with partners or with source control. That is: Jupyter […]
I just got back from a workshop meeting called Digital Transformation of Decision Analysis. This was a workshop organized by Eyas Raddad, David Matheson, and John-Mark Agosta. It was sponsored by The Society of Decision Professionals and Microsoft. Microsoft generously hosted at their new Experience Center at the Microsoft Silicon […]
Part of the deal of having a package up on CRAN is: at any time one may be sent an automated email like the following. "Dear maintainer, Please see the problems shown on URL. Please correct before TODAY+14DAYS to safely retain your package on CRAN. The CRAN Team" If this […]
I have a new theoretical finance note up: an appreciation of Cover’s universal portfolio in Python.
Introduction A surprisingly tricky problem in doing data science or analytics in the database is the situation where one has to re-map a large number of columns. This occurs, for example, in the vtreat data preparation system. In the vtreat case, a large number of the variable encodings reduce to table-lookup […]
Nina Zumel Recently, we’ve been reading about a new correlation coefficient, \(\xi\) (“xi”), which was introduced by Professor Sourav Chatterjee in his paper, “A New Coefficient of Correlation”. The \(\xi\) coefficient has the following properties: If \(y\) is a function of \(x\), then \(\xi\) goes to 1 asymptotically as \(n\) […]
When working with multiple data tables we often need to know, for a given set of keys, how many rows each table has. I would like to use such an example in Python as yet another introduction to the data algebra (an alternative to direct Pandas or […]
I’d like to write a bit about measuring effect sizes and Cohen’s d. Introduction For our note let’s settle on a single simple example problem. We have two samples of real numbers a_1, …, a_n and b_1, …, b_n. All the a_i are mutually exchangeable or generated by an independent […]
Every programmer should have an opinion on what the outcome of an expression like “5” == 5 should be, and perhaps even a guess as to what the answer is in their most familiar programming language. In my opinion SQL gets it right. For example, we get the following in […]
I’d like to work an example of using SQL WITH Common Table Expressions to produce more legible SQL.
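As a rough illustration of the idea (not the example from the post itself), here is a minimal sketch of a WITH Common Table Expression run through Python's built-in sqlite3 module. The table `orders`, its columns, and the 7.0 threshold are all hypothetical, chosen only to show how naming intermediate steps can replace nested sub-queries.

```python
import sqlite3

# In-memory database with a small, made-up table (names are illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES ('a', 10.0), ('a', 5.0), ('b', 7.0), ('c', 1.0);
""")

# Each WITH clause names one step, so the query reads top to bottom
# instead of inside out as it would with nested sub-selects.
query = """
WITH totals AS (
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
),
big_spenders AS (
    SELECT customer, total
    FROM totals
    WHERE total >= 7.0
)
SELECT customer, total
FROM big_spenders
ORDER BY customer
"""

rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

The same result could be written with nested sub-queries; the CTE form simply gives each intermediate table a name that later steps can refer to.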