One of the recurring frustrations in data analytics is that your data is never in the right shape. Worst case: you are not aware of this and every step you attempt is more expensive, less reliable and less informative than you would want. Best case: you notice this and have […]

Estimated reading time: 18 minutes

This is a tutorial on how to try out a new package in R. The summary is: expect errors, search out errors and don’t start with the built in examples or real data. Suppose you want to try out a novel statistical technique? A good fraction of the time R […]

Estimated reading time: 14 minutes

One of the current best tools in the machine learning toolbox is the 1930s statistical technique called logistic regression. We explain how to add professional quality logistic regression to your analytic repertoire and describe a bit beyond that.

Estimated reading time: 24 minutes

Having worked with Unix (BSD, HPUX, IRIX, Linux and OSX), Windows (NT4, 2000, XP, Vista and 7) for quite a while I have seen a lot of different software tools. I would like to quickly exhibit my “must have” list. These are the packages that I find to be the […]

Estimated reading time: 4 minutes

Readers returning to our blog will know that Win-Vector LLC is fairly “pro-R.” You can take that to mean “in favor or R” or “professionally using R” (both statements are true). Some days we really don’t feel that way.

Estimated reading time: 7 minutes

IowaHawk has a excellent article attempting to reproduce the infamous CRU climate graph using OpenOffice: Fables of the Reconstruction. We thought we would show how to produced similarly bad results using R.

Estimated reading time: 9 minutes

This article is quick concrete example of how to use the techniques from Survive R to lower the steepness of The R Project for Statistical Computing‘s learning curve (so an apology to all readers who are not interested in R). What follows is for people who already use R and […]

Estimated reading time: 12 minutes

New PDF slides version (presented at the Bay Area R Users Meetup October 13, 2009). We at Win-Vector LLC appear to like R a bit more than some of our, perhaps wiser, colleagues ( see: Choose your weapon: Matlab, R or something else? and R and data ). While we […]

Estimated reading time: 5 minutes

What makes a good graph? When faced with a slew of numeric data, graphical visualization can be a more efficient way of getting a feel for the data than going through the rows of a spreadsheet. But do we know if we are getting an accurate or useful picture? How […]

Estimated reading time: 22 minutes

Our first “exciting technique” article is about a statistical language called “R.” R is a language for statistical analysis available from http://cran.r-project.org/ . The things you can immediately do with it are incredible. You can import a spreadsheet and immediately spot relationships, trend and anomalies. R gives you instant access […]

Estimated reading time: 3 minutes