## Yet Another Data Transform Tutorial

I am sharing yet another data transform tutorial here! It is about coordinatized data, the larger theory encompassing pivot and un-pivot. The example is in Python, but we also supply a similar package for R users.

## Touching the 3rd Rail of Data Science: “R or Python?”

I’ve been seeing a lot of hot takes on whether one should do data science in R or in Python. I’ll comment generally on the topic, and then add my own myopic gear-head micro-benchmark. I’ll jump in: If learning the language is the big step: then you are a […]

## Data Science: Street Fighting Statistics

I am excited to share my guest lecture for the Department of Statistics at the University of Illinois course STAT 447: Data Science Programming Methods. And thank you to Dirk Eddelbuettel for inviting me! The talk was titled “Data Science: Street Fighting Statistics” and demonstrates two simple supervised modeling tasks in R. […]

## Y-Aware PCA

We have had some trouble with some articles being damaged or hard to access in the Win Vector blog. I (John Mount) do want to apologize for that. In particular, the graphs are missing for Dr. Nina Zumel’s wonderful y-aware Principal Components regression series. The complete R .md and .Rmd […]

## Working in CRAN’s World

Part of the deal of having a package up on CRAN is that at any time one may be sent an automated email like the following:

> Dear maintainer,
>
> Please see the problems shown on URL.
>
> Please correct before TODAY+14DAYS to safely retain your package on CRAN.
>
> The CRAN Team

If this […]

## Exploring the XI Correlation Coefficient

By Nina Zumel

Recently, we’ve been reading about a new correlation coefficient, $$\xi$$ (“xi”), which was introduced by Professor Sourav Chatterjee in his paper, “A New Coefficient of Correlation”. The $$\xi$$ coefficient has the following properties: If $$y$$ is a function of $$x$$, then $$\xi$$ goes to 1 asymptotically as $$n$$ […]
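
As a rough illustration of the first property, here is a minimal Python sketch of the coefficient in the no-ties case: sort the pairs by $$x$$, take the ranks $$r_i$$ of $$y$$ in that order, and compute $$1 - 3 \sum_i |r_{i+1} - r_i| / (n^2 - 1)$$. The function name `xi_correlation` is ours for illustration (it is not from the paper or from any package), and the sketch assumes there are no tied values in `x` or `y`.

```python
import random


def xi_correlation(x, y):
    """Chatterjee's xi for paired samples, assuming no ties in x or y."""
    n = len(x)
    # Reorder the y values so that the matching x values are increasing.
    by_x = sorted(range(n), key=lambda i: x[i])
    y_sorted = [y[i] for i in by_x]
    # Rank the reordered y values from 1 (smallest) to n (largest).
    by_y = sorted(range(n), key=lambda i: y_sorted[i])
    ranks = [0] * n
    for rank, i in enumerate(by_y, start=1):
        ranks[i] = rank
    # Sum the absolute jumps between consecutive ranks.
    total_jump = sum(abs(ranks[i + 1] - ranks[i]) for i in range(n - 1))
    return 1 - 3 * total_jump / (n * n - 1)


# y is a (monotone) function of x: xi is close to 1.
xs = list(range(100))
print(xi_correlation(xs, [v * v for v in xs]))

# y is independent noise: xi is close to 0.
rng = random.Random(0)
noise_x = [rng.random() for _ in range(2000)]
noise_y = [rng.random() for _ in range(2000)]
print(xi_correlation(noise_x, noise_y))
```

For a strictly monotone relationship the rank jumps are all 1, so the value works out to exactly $$(n - 2)/(n + 1)$$, which approaches 1 as $$n$$ grows.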

## Kelly Thorp Betting

I demonstrate a Kelly/Thorp betting system for the simple card game of guessing if the next card from a standard deck is red or black. I have a video of the play here. And a derivation of the betting strategy in R is here. A derivation of the proof you […]
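
To make the game concrete, here is a small Python sketch of one way to play it. It is not the R derivation linked from the post. It assumes even-money payoffs and uses the textbook Kelly fraction for an even-money bet, $$2p - 1$$, where $$p$$ is the chance the more common remaining color comes up next. The function name `play_red_black` and the `"R"`/`"B"` deck encoding are ours for illustration.

```python
from fractions import Fraction
from math import comb
import random


def play_red_black(deck, stake=Fraction(1)):
    """Bet the Kelly fraction on the more common remaining color for each card."""
    red = deck.count("R")
    black = deck.count("B")
    for card in deck:
        remaining = red + black
        # Even-money Kelly fraction: 2p - 1 with p = max(red, black) / remaining.
        fraction = Fraction(abs(red - black), remaining)
        guess = "R" if red >= black else "B"
        bet = stake * fraction
        stake += bet if card == guess else -bet
        # Update the counts of unseen cards.
        if card == "R":
            red -= 1
        else:
            black -= 1
    return stake


deck = ["R"] * 26 + ["B"] * 26
random.Random(0).shuffle(deck)
final = play_red_black(deck)
print(float(final))
print(final == Fraction(2**52, comb(52, 26)))
```

Under these assumptions the final stake does not depend on the shuffle: starting from 1 it always ends at $$2^{52} / \binom{52}{26}$$, roughly 9.08, because the strategy bets nothing when the colors are balanced and everything once only one color remains.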

## It Has Always Been Wrong to Call order on a data.frame

In R it has always been incorrect to call `order()` on a `data.frame`. Such a call doesn’t return a sort-order of the rows, and previously did not return an error. For example: `d <- data.frame(x = c(2, 2, 3, 3, 1, 1), y = 6:1)` `knitr::kable(d)` x y 2 […]

## Introducing wrapr::bc()

The wrapr R package supplies a number of substantial programming tools, including the S3/S4 compatible dot-pipe, unpack/pack object tools, and many more. It also supplies a number of formatting and parsing convenience tools: qc() (“quoting concatenate”): quotes strings, giving value-oriented interfaces much of the incidental convenience of non-standard evaluation (NSE) […]