I’ve recently released a couple of articles on time series forecasting that I want to re-share: “A Time Series Apologia” and “Forecasting in Aggregate Versus in Detail”. Roughly, I am trying to point out alternatives to rushing to ARIMA without trying additional methods. ARIMA is great at handling the issues of […]

Estimated reading time: 1 minute

I would like to share a new article on some of the methods and pitfalls of time series forecasting: “A Time Series Apologia”. In it I work the seemingly simple problem of forecasting a noisy copy of sin(t). The purpose of the article is to demonstrate using ARIMA methods, and […]

Estimated reading time: 42 seconds

I am sharing yet another data transform tutorial here! It is about coordinatized data, the larger theory encompassing pivot and un-pivot. The example is in Python, but we also supply a similar package for R users.

Estimated reading time: 18 seconds
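
As a rough illustration of the pivot / un-pivot idea mentioned in the data transform post above, here is a minimal pure-Python sketch. It is not the package or the code from the tutorial: the function names `unpivot` and `pivot`, the column names, and the toy data are all hypothetical, chosen only to show the two directions of the transform.

```python
def unpivot(rows, id_col, value_cols, key_name="measure", value_name="value"):
    """Wide -> long: emit one output row per (input row, value column) pair."""
    return [
        {id_col: row[id_col], key_name: col, value_name: row[col]}
        for row in rows
        for col in value_cols
    ]


def pivot(rows, id_col, key_name="measure", value_name="value"):
    """Long -> wide: gather key/value pairs back into one row per id."""
    wide = {}
    for row in rows:
        record = wide.setdefault(row[id_col], {id_col: row[id_col]})
        record[row[key_name]] = row[value_name]
    return list(wide.values())


# Hypothetical toy data: one row per model, one column per measurement.
wide_rows = [
    {"id": 1, "AUC": 0.6, "R2": 0.2},
    {"id": 2, "AUC": 0.8, "R2": 0.5},
]

long_rows = unpivot(wide_rows, "id", ["AUC", "R2"])
round_trip = pivot(long_rows, "id")

for row in long_rows:
    print(row)
print(round_trip == wide_rows)
```

The point of the sketch is only that the two operations are inverses when each (id, measure) pair appears once; a real package also has to deal with missing cells, duplicate keys, and multi-column keys.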

I’ve been seeing a lot of hot takes on whether one should do data science in R or in Python. I’ll comment generally on the topic, and then add my own myopic gear-head micro benchmark. I’ll jump in: If learning the language is the big step: then you are a […]

Estimated reading time: 5 minutes

I am excited to share my guest lecture for the Department of Statistics at the University of Illinois STAT 447: Data Science Programming Methods. And thank you to Dirk Eddelbuettel for inviting me! The talk was titled “Data Science: Street Fighting Statistics” and demonstrates two simple supervised modeling tasks in R. […]

Estimated reading time: 35 seconds

We have had some trouble with some articles being damaged or hard to access in the Win Vector blog. I (John Mount) do want to apologize for that. In particular the graphs are missing for Dr. Nina Zumel’s wonderful y-aware Principal Components regression series. The complete R .md and .Rmd […]

Estimated reading time: 2 minutes

Part of the deal of having a package up on CRAN is: at any time one may be sent an automated email like the following: “Dear maintainer, Please see the problems shown on URL. Please correct before TODAY+14DAYS to safely retain your package on CRAN. The CRAN Team” If this […]

Estimated reading time: 6 minutes

By Nina Zumel. Recently, we’ve been reading about a new correlation coefficient, \(\xi\) (“xi”), which was introduced by Professor Sourav Chatterjee in his paper, “A New Coefficient of Correlation”. The \(\xi\) coefficient has the following properties: If \(y\) is a function of \(x\), then \(\xi\) goes to 1 asymptotically as \(n\) […]

Estimated reading time: 11 minutes
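
To make the \(\xi\) coefficient from the post above a little more concrete, here is a minimal Python sketch of the no-ties form of Chatterjee's statistic: sort the pairs by \(x\), let \(r_i\) be the rank of the \(i\)-th \(y\) in that order, and compute \(1 - 3 \sum_i |r_{i+1} - r_i| / (n^2 - 1)\). The function name `xi_no_ties` and the example data are hypothetical, and the sketch assumes there are no tied values in either `x` or `y`; it is not the code from the article.

```python
import random


def xi_no_ties(x, y):
    """Chatterjee's xi for paired samples, assuming no ties in x or y."""
    n = len(x)
    by_x = sorted(range(n), key=lambda i: x[i])
    by_y = sorted(range(n), key=lambda i: y[i])
    # rank[i] is the 1-based rank of y[i] among all y values.
    rank = [0] * n
    for r, i in enumerate(by_y, start=1):
        rank[i] = r
    # Ranks of y, listed in order of increasing x.
    r_sorted = [rank[i] for i in by_x]
    total = sum(abs(b - a) for a, b in zip(r_sorted, r_sorted[1:]))
    return 1.0 - 3.0 * total / (n * n - 1)


# y is a monotone function of x.
x_grid = list(range(100))
xi_monotone = xi_no_ties(x_grid, x_grid)

# y is a non-monotone (V-shaped) function of x; the 30.3 offset avoids ties.
xi_vee = xi_no_ties(x_grid, [(v - 30.3) ** 2 for v in x_grid])

# x and y are independent noise (seeded for repeatability).
rng = random.Random(0)
x_noise = [rng.random() for _ in range(1000)]
y_noise = [rng.random() for _ in range(1000)]
xi_noise = xi_no_ties(x_noise, y_noise)

print(xi_monotone, xi_vee, xi_noise)
```

For the monotone case with \(n = 100\) the formula gives exactly \(1 - 3/(n+1)\), which is below 1 but approaches it as \(n\) grows. The V-shaped case also scores high even though it is far from linear, while the independent noise case lands near 0.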

I demonstrate a Kelly/Thorp betting system for the simple card game of guessing whether the next card from a standard deck is red or black. I have a video of the play here. And a derivation of the betting strategy in R is here. A derivation of the proof you […]

Estimated reading time: 42 seconds
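
Here is a minimal Python sketch of one way to play the red/black game described above. It assumes even-money payouts and the usual Kelly fraction for an even-money bet, 2p - 1, which with `red` and `black` cards remaining means betting the fraction |red - black| / (red + black) of the current stake on the majority color. These rules, the function name `play`, and the deck encoding are assumptions for illustration, not the article's R derivation.

```python
import random
from fractions import Fraction
from math import comb


def play(deck, stake=Fraction(1)):
    """Play through a deck of 'R'/'B' cards, betting the Kelly fraction each turn."""
    red = deck.count("R")
    black = deck.count("B")
    for card in deck:
        remaining = red + black
        # Guess the majority color; when the counts are equal the bet is zero.
        guess = "R" if red >= black else "B"
        bet = stake * Fraction(abs(red - black), remaining)
        stake += bet if card == guess else -bet
        if card == "R":
            red -= 1
        else:
            black -= 1
    return stake


# Play several shuffles of a standard 26 red / 26 black deck.
rng = random.Random(2024)
results = []
for _ in range(5):
    deck = ["R"] * 26 + ["B"] * 26
    rng.shuffle(deck)
    results.append(play(deck))

expected = Fraction(2 ** 52, comb(52, 26))
print(float(expected))
print(all(r == expected for r in results))
```

Under these assumptions each turn multiplies the stake by 2 × (remaining cards of the color drawn) / (remaining cards), so the final stake is the same for every shuffle: 2^52 / C(52, 26), about 9.08 times the starting stake. Exact `Fraction` arithmetic is used so that equality can be checked without rounding error.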

In R it has always been incorrect to call order() on a data.frame. Such a call doesn’t return a sort-order of the rows, and previously did not return an error. For example: d <- data.frame(x = c(2, 2, 3, 3, 1, 1), y = 6:1); knitr::kable(d) x y 2 […]

Estimated reading time: 2 minutes