## The Hyper Dance

A lot of machine learning, statistical, plotting, and analytics algorithms over-sell a small evil trick I call “the hyper dance.”

A lot of machine learning, statistical, plotting, and analytics algorithms over-sell a small evil trick I call “the hyper dance.”

It looks like R is getting an official pipe operator (ref). R doesn’t work under an RFC process, so we hear about these things and they are discussed on the R-devel mailing list. I’ve written on this topic before (ref), and I have taped some new comments. This sort of […]

R is a powerful data science language because, like Matlab, numpy, and Pandas, it exposes vectorized operations. That is, a user can perform operations on hundreds (or even billions) of cells by merely specifying the operation on the column or vector of values. Of course, sometimes it takes a while […]

Slides from my PyData2019 data_algebra lightning talk are here.

Introduction rquery is a data wrangling system designed to express complex data manipulation as a series of simple data transforms. This is in the spirit of R’s base::transform(), or dplyr’s dplyr::mutate() and uses a pipe in the style popularized in R with magrittr. The operators themselves follow the selections in […]

To understand computations in R, two slogans are helpful: Everything that exists is an object. Everything that happens is a function call. John Chambers In R, the “[” array access operator is a function call. And it is one a user can re-bind to the new effect of their own […]

I was working through Kyle Miller‘s excellent note: “Tail call recursion in Python”, and decided to experiment with variations of the techniques. The idea is: one may want to eliminate use of the Python language call-stack in the case of a “tail calls” (a function call where the result is […]

We at Win-Vector LLC have some big news. We are finally porting a streamlined version of our R vtreat variable preparation package to Python. vtreat is a great system for preparing messy data for supervised machine learning. The new implementation is based on Pandas, and we are experimenting with pushing […]

Here is simple modeling problem in R. We want to fit a linear model where the names of the data columns carrying the outcome to predict (y), the explanatory variables (x1, x2), and per-example row weights (wt) are given to us as string values in variables.

What R users now call piping, popularized by Stefan Milton Bache and Hadley Wickham, is inline function application (this is notationally similar to, but distinct from the powerful interprocess communication and concurrency tool introduced to Unix by Douglas McIlroy in 1973). In object oriented languages this sort of notation for […]