Our first “exciting technique” article is about a statistical language called “R.”
R is a language for statistical analysis available from http://cran.r-project.org/. What you can do with it right away is incredible. You can import a spreadsheet and immediately spot relationships, trends and anomalies. R gives you instant access to top-notch visualization methods and sophisticated statistical methods.
R is so hot (a strange thing to say about a statistics package) that it was the subject of a recent New York Times article: http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html. If you read between the lines, some of the interviewees come off as slightly threatened by R (there is a slight hint of “R is very good for others”). In fact R is simply very good. A good statistician with R can do things that a great statistician without R cannot. Like all tools, R is dangerous: ask for the wrong analysis and you will draw wrong and misleading conclusions. Ask for the right analysis and R will correctly perform it while tracking critical implementation details that would take you hundreds of hours to master on your own.
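As a minimal sketch of that import-and-look workflow, assume the spreadsheet has been exported as a CSV file. To keep the example self-contained the "spreadsheet" here is faked by writing R's built-in `cars` data set to a temporary file, so the file and its columns are illustrative only:

```r
# Simulate an exported spreadsheet: write the built-in `cars` data to a temporary CSV.
csv_path <- tempfile(fileext = ".csv")
write.csv(cars, csv_path, row.names = FALSE)

# Import the file as a data frame.
d <- read.csv(csv_path)

# Per-column summaries (min, quartiles, mean, max) are a quick way to spot anomalies.
print(summary(d))

# Correlation between speed and stopping distance.
speed_dist_cor <- cor(d$speed, d$dist)
print(speed_dist_cor)

# Scatter plot with a fitted straight-line trend drawn on top.
plot(dist ~ speed, data = d)
abline(lm(dist ~ speed, data = d))
```

With a real export you would point `read.csv` at your own file and column names instead.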
Want to produce graphs using the theories of perception and analysis of W. S. Cleveland? Simple: use Deepayan Sarkar’s “lattice” package, which even has a wonderful book.
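A small sketch of what a lattice plot looks like in practice, using the built-in `mtcars` data set (the choice of variables and panel layout is just an illustration, not something from Sarkar's book):

```r
library(lattice)  # a "recommended" package shipped with standard R distributions

# Conditioned scatter plot: fuel economy vs. weight, one panel per cylinder count.
p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
            layout = c(3, 1),
            type = c("p", "r"),   # points plus a per-panel regression line
            xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

print(p)  # lattice plots are objects; printing draws them
```

The `| factor(cyl)` part of the formula is what produces the side-by-side panels on shared axes.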
Want to find subtle relationships in your data using logistic regression (one of the more complicated cousins of linear regression)? That is built into the base R system.
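For instance, a logistic regression needs nothing beyond base R's `glm` function. The sketch below uses the built-in `mtcars` data set; the question it asks (does weight predict a manual transmission?) and the 3000 lb test car are illustrative assumptions only:

```r
# Model the probability of a manual transmission (am == 1) from car weight.
fit <- glm(am ~ wt, data = mtcars, family = binomial)
print(summary(fit))

# Predicted probability of a manual transmission for a hypothetical 3000 lb car
# (wt is recorded in units of 1000 lbs).
p_manual <- predict(fit, newdata = data.frame(wt = 3.0), type = "response")
print(p_manual)
```

The summary reports coefficients on the log-odds scale; `type = "response"` converts a prediction back to a probability.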
Need to re-run all of your analyses because the data has changed? R is script based and stores your command history. A single paste can re-run a 20-step analysis and re-build a 10-slide presentation.
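One way to get that kind of repeatability, sketched under the assumption that the analysis steps are collected into a single function (the name `run_analysis` and the "changed data" are hypothetical; the same idea works by keeping the steps in a script file and calling `source()` on it):

```r
# The whole analysis lives in one function (or one script file).
run_analysis <- function(d) {
  fit <- lm(dist ~ speed, data = d)
  list(n = nrow(d), slope = coef(fit)[["speed"]])
}

first_run <- run_analysis(cars)

# The data changes (here: a hypothetical correction that drops the first 10 rows)...
updated <- cars[-(1:10), ]

# ...and one call re-runs every step.
second_run <- run_analysis(updated)
print(first_run)
print(second_run)
```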
Impressed by a particular type of analysis? Take, for example, Roger Koenker’s “Quantile Regression” (which is a brilliant idea backed by a masterpiece of a book). Guess what: the original author has supplied a free R package (quantreg) that implements the ideas.
Want to give a client working software? Easy: R is open source and comes with very good automated installers for OS X, Linux and Windows.
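A brief sketch of what using that code looks like, assuming Koenker's `quantreg` package has been installed from CRAN (it is not part of base R, so the example skips itself if the package is missing). The `engel` data set ships with the package; the choice of quantiles is illustrative:

```r
# Assumes install.packages("quantreg") has been run; skipped otherwise.
fit <- NULL
if (requireNamespace("quantreg", quietly = TRUE)) {
  data(engel, package = "quantreg")  # household income and food expenditure
  # Fit the 10th, 50th and 90th conditional percentiles of food expenditure.
  fit <- quantreg::rq(foodexp ~ income, tau = c(0.1, 0.5, 0.9), data = engel)
  print(coef(fit))  # one column of coefficients per quantile
}
```

Comparing the income slope across the three columns shows how the relationship differs between low-spending and high-spending households, which an ordinary least-squares fit would average away.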
Want to train somebody to use R? Easy, R has an extensive library of excellent books and there is even an exciting set of books with a series title “Use R!”
Want to learn the internals of R from John M. Chambers (one of the inventors of the “S” language that R is an implementation of)? You are in luck: the latest book by Chambers is “Software for Data Analysis, Programming with R.” R is so popular that it has managed to pull one of the creators of the S language and the proprietary S+ implementation into its world.
It is almost getting to the point where you need to justify not using R.
jmount
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.
A relevant comment by Win-Vector Principal Consultant Nina Zumel on our good friend’s blog:
It’s like that joke about boats: the worst thing you can do with them is put them in the water. The worst thing you can do with R is give it data…
We make a point of doing as much of the data massaging and cleaning as we can outside of R (a point John conveniently ignored in his post), usually in Java.
Quick update: I strongly recommend R, but it turns out I cannot really recommend John M. Chambers’ book “Software for Data Analysis, Programming with R.”