This article is a quick concrete example of how to use the techniques from Survive R to lower the steepness of The R Project for Statistical Computing’s learning curve (so an apology to all readers who are not interested in R). What follows is for people who already use R and […]

Estimated reading time: 12 minutes

Scientists, engineers, and statisticians share similar concerns about evaluating the accuracy of their results, but they don’t always talk about it in the same language. This can lead to misunderstandings when reading across disciplines, and the problem is exacerbated when technical work is communicated to and by the popular media. […]

Estimated reading time: 30 minutes

While executing some statistical detective work for a client we had a major “aha!” moment and realized something like “Amdahl’s Law” rephrased in terms of probability would solve everything. We finished our work using direct methods and moved on. But it is an interesting question: what is the probabilist’s (or […]

Estimated reading time: 9 minutes

New PDF slides version (presented at the Bay Area R Users Meetup October 13, 2009). We at Win-Vector LLC appear to like R a bit more than some of our, perhaps wiser, colleagues (see: Choose your weapon: Matlab, R or something else? and R and data). While we […]

Estimated reading time: 5 minutes

What makes a good graph? When faced with a slew of numeric data, graphical visualization can be a more efficient way of getting a feel for the data than going through the rows of a spreadsheet. But do we know if we are getting an accurate or useful picture? How […]

Estimated reading time: 22 minutes

REPOST (now in HTML in addition to the original PDF). This paper demonstrates and explains some of the basic techniques used in data mining. It also serves as an example of the kinds of analyses and projects Win-Vector LLC engages in.

Estimated reading time: 37 minutes

We explore some of the ideas from the seminal paper “The Data-Enrichment Method” (Henry R Lewis, Operations Research (1957) vol. 5 (4) pp. 1-5). The paper explains a technique of improving the quality of statistical inference by increasing the effective size of the data-set. This is called “Data-Enrichment.” Now […]

Estimated reading time: 8 minutes

Our first “exciting technique” article is about a statistical language called “R.” R is a language for statistical analysis available from http://cran.r-project.org/ . The things you can immediately do with it are incredible. You can import a spreadsheet and immediately spot relationships, trends and anomalies. R gives you instant access […]

Estimated reading time: 3 minutes

author: John Mount
I have finally written up and released a paper in PDF: Automatic Generation and Testing of Trades describing a lot of the statistics and optimization methods used when I was technical trading on a Banc of America Securities proprietary program trading desk. It was a very exciting […]

Estimated reading time: 40 minutes

author: John Mount
Nina and I just finished up our analysis of some of the statistical difficulties encountered by users of Google AdSense. It came out a bit long, but we found the right statistical reference to prove that there are real barriers to understanding in this market. The paper […]

Estimated reading time: 61 minutes