## Managing intermediate results when using R/sparklyr

In our latest “R and big data” article we show how to manage intermediate results in non-trivial Apache Spark workflows using R, sparklyr, dplyr, and replyr.

## Neglected optimization topic: set diversity

The mathematical concept of set diversity is a somewhat neglected topic in current applied decision sciences and optimization. We take this opportunity to discuss the issue. The problem Consider the following problem: for a number of items U = {x_1, … x_n} pick a small set of them X = […]

## My Favorite Graphs

The important criterion for a graph is not simply how fast we can see a result; rather it is whether through the use of the graph we can see something that would have been harder to see otherwise or that could not have been seen at all. — William Cleveland, […]

## Learn Logistic Regression (and beyond)

One of the current best tools in the machine learning toolbox is the 1930s statistical technique called logistic regression. We explain how to add professional quality logistic regression to your analytic repertoire and describe a bit beyond that.

We extend the ideas of from Automatic Differentiation with Scala to include the reverse accumulation. Reverse accumulation is a non-obvious improvement to automatic differentiation that can in many cases vastly speed up calculations of gradients.

## Living in A Lognormal World

Recently, we had a client come to us with (among other things) the following question: Who is more valuable, Customer Type A, or Customer Type B? This client already tracked the net profit and loss generated by every customer who used his services, and had begun to analyze his customers […]

## Statistics to English Translation, Part 2b: Calculating Significance

In the previous installment of the Statistics to English Translation, we discussed the technical meaning of the term ”significant”. In this installment, we look at how significance is calculated. This article will be a little more technically detailed than the last one, but our primary goal is still to help […]

## Statistics to English Translation, Part 2a: ’Significant’ Doesn’t Always Mean ’Important’

In this installment of our ongoing Statistics to English Translation series1, we will look at the technical meaning of the term ”significant”. As you might expect, what it means in statistics is not exactly what it means in everyday language. As always, a pdf version of this article is available […]

## “I don’t think that means what you think it means;” Statistics to English Translation, Part 1: Accuracy Measures

Scientists, engineers, and statisticians share similar concerns about evaluating the accuracy of their results, but they don’t always talk about it in the same language. This can lead to misunderstandings when reading across disciplines, and the problem is exacerbated when technical work is communicated to and by the popular media. […]