## Rule 42 Software

Continuing (and hopefully ending) our quick series on software pathologies I would like to follow-up The Hyper Dance with “Rule 42 Software.”

## The Hyper Dance

A lot of machine learning, statistical, plotting, and analytics algorithms over-sell a small evil trick I call “the hyper dance.”

## Some Applications of The Spicy Soup Test

Here are a few isolation inspired “applications” (in the theoretical or mathematical sense of the term) of the spicy soup combinatorial design.

## Better SQL Generation via the data_algebra

In our recent note What is new for rquery December 2019 we mentioned an ugly processing pipeline that translates into SQL of varying size/quality depending on the query generator we use. In this note we try a near-relative of that query in the data_algebra.

## Eliminating Tail Calls in Python Using Exceptions

I was working through Kyle Miller‘s excellent note: “Tail call recursion in Python”, and decided to experiment with variations of the techniques. The idea is: one may want to eliminate use of the Python language call-stack in the case of a “tail calls” (a function call where the result is […]

## Minimal Key Set is NP hard

It usually gives us a chuckle when we find some natural and seemingly easy data science question is NP-hard. For instance we have written that variable pruning is NP-hard when one insists on finding a minimal sized set of variables (and also why there are no obvious methods for exact […]

## Why No Exact Permutation Tests at Scale?

Here at Win-Vector LLC we like permutation tests. Our team has written on them (for example: How Do You Know if Your Data Has Signal?) and they are used to estimate significances in our sigr and WVPlots R packages. For example permutation methods are used to estimate the significance reported […]

## Base R can be Fast

“Base R” (call it “Pure R”, “Good Old R”, just don’t call it “Old R” or late for dinner) can be fast for in-memory tasks. This is despite the commonly repeated claim that: “packages written in C/C++ are (edit: “always”) faster than R code.” The benchmark results of “rquery: Fast […]

## rquery: Fast Data Manipulation in R

Win-Vector LLC recently announced the rquery R package, an operator based query generator. In this note I want to share some exciting and favorable initial rquery benchmark timings.

## On indexing operators and composition

In this article I will discuss array indexing, operators, and composition in depth. If you work through this article you should end up with a very deep understanding of array indexing and the deep interpretation available when we realize indexing is an instance of function composition (or an example of […]