R is a powerful data science language because, like Matlab, numpy, and Pandas, it exposes vectorized operations. That is, a user can perform operations on hundreds (or even billions) of cells by merely specifying the operation on the column or vector of values. Of course, sometimes it takes a while […]

Estimated reading time: 3 minutes

Introduction rquery is a data wrangling system designed to express complex data manipulation as a series of simple data transforms. This is in the spirit of R’s base::transform(), or dplyr’s dplyr::mutate() and uses a pipe in the style popularized in R with magrittr. The operators themselves follow the selections in […]

Estimated reading time: 14 minutes

To understand computations in R, two slogans are helpful: Everything that exists is an object. Everything that happens is a function call. John Chambers In R, the “[” array access operator is a function call. And it is one a user can re-bind to the new effect of their own […]

Estimated reading time: 4 minutes

I was working through Kyle Miller‘s excellent note: “Tail call recursion in Python”, and decided to experiment with variations of the techniques. The idea is: one may want to eliminate use of the Python language call-stack in the case of a “tail calls” (a function call where the result is […]

Estimated reading time: 6 minutes

We at Win-Vector LLC have some big news. We are finally porting a streamlined version of our R vtreat variable preparation package to Python. vtreat is a great system for preparing messy data for supervised machine learning. The new implementation is based on Pandas, and we are experimenting with pushing […]

Estimated reading time: 1 minute

Here is simple modeling problem in R. We want to fit a linear model where the names of the data columns carrying the outcome to predict (y), the explanatory variables (x1, x2), and per-example row weights (wt) are given to us as string values in variables.

Estimated reading time: 7 minutes

What R users now call piping, popularized by Stefan Milton Bache and Hadley Wickham, is inline function application (this is notationally similar to, but distinct from the powerful interprocess communication and concurrency tool introduced to Unix by Douglas McIlroy in 1973). In object oriented languages this sort of notation for […]

Estimated reading time: 5 minutes

The (matter of opinion) claim: “When the use of C++ is very limited and easy to avoid, perhaps it is the best option to do that […]” (source discussed here) got me thinking: does our own RcppDynProg package actually use C++ in a significant way? Could/should I port it to […]

Estimated reading time: 5 minutes

There is a lot of unnecessary worry over “Non Standard Evaluation” (NSE) in R versus “Standard Evaluation” (SE, or standard “variables names refer to values” evaluation). This very author is guilty of over-discussing the issue. But let’s give this yet another try. The entire difference between NSE and regular evaluation […]

Estimated reading time: 6 minutes