## Touching the 3rd Rail of Data Science: “R or Python?”

I’ve been seeing a lot of hot takes on if one should do data science in R or in Python. I’ll comment generally on the topic, and then add my own myopic gear-head micro benchmark. I’ll jump in: If learning the language is the big step: then you are a […]

## Y-Aware PCA

We have had some trouble with some articles being damaged or hard to access in the Win Vector blog. I (John Mount) do want to apologize for that. In particular the graphs are missing for Dr. Nina Zumel’s wonderful y-aware Pricipal Components regression series. The complete R .md and .Rmd […]

## Separating Code from Presentation in Jupyter Notebooks

One of the great conveniences of performing a data science style analysis using Jupyter is that Jupyter notebooks are literate containers that combine code, text, results, and graphs. This is also one of the pain points in working with Jupyter notebooks with partners or with source control. That is: Jupyter […]

## Working in CRAN’s World

Part of the deal of having a package up on CRAN is: at any time one may be sent an automated email like the following. Dear maintainer, Please see the problems shown on URL. Please correct before TODAY+14DAYS to safely retain your package on CRAN. The CRAN Team If this […]

## Exploring the XI Correlation Coefficient

Nina Zumel Recently, we’ve been reading about a new correlation coefficient, $$\xi$$ (“xi”), which was introduced by Professor Sourav Chatterjee in his paper, “A New Coefficient of Correlation”. The $$\xi$$ coefficient has the following properties: If $$y$$ is a function of $$x$$, then $$\xi$$ goes to 1 asymptotically as $$n$$ […]

## How to Read Sourav Chatterjee’s Basic XICOR Definition

Introduction Professor Sourav Chatterjee recently published a new coefficient of correlation called XICOR (refs: JASA, R package, Arxiv, Hacker News, and a Python package (different author)). The basic formula (in the tie-free case) is: Take X and Y as n-vectors of observations of random variable. Compute the ranks r(i) of […]

## Kelly Thorp Betting

I demonstrate a Kelly/Thorp betting system for the simple card game of guessing if the next card from a standard deck is red or black. I have a video of the play here. And a derivation of the betting strategy in R is here. A derivation of the proof you […]

## It Has Always Been Wrong to Call order on a data.frame

In R it has always been incorrect to call order() on a data.frame. Such a call doesn’t return a sort-order of the rows, and previously did not return an error. For example. d <- data.frame( x = c(2, 2, 3, 3, 1, 1), y = 6:1) knitr::kable(d) x y 2 […]

## Introducing wrapr::bc()

The wrapr R package supplies a number of substantial programming tools, including the S3/S4 compatible dot-pipe, unpack/pack object tools, and many more. It also supplies a number of formatting and parsing convenience tools: qc() (“quoting concatenate”): quotes strings, giving value-oriented interfaces much of the incidental convenience of non-standard evaluation (NSE) […]