I was flipping through my copy of William Cleveland’s The Elements of Graphing Data the other day; it’s a book worth revisiting. I’ve always liked Cleveland’s approach to visualization as statistical analysis. His quest to ground visualization principles in the context of human visual cognition (he called it “graphical perception”) […]
Estimated reading time: 17 minutes
Model-level fit summaries can be tricky in R. A quick read of model fit summary data for factor levels can be misleading. We describe the issue and demonstrate techniques for dealing with it.
Estimated reading time: 12 minutes
A big congratulations to Win-Vector LLC’s Dr. Nina Zumel for authoring and teaching portions of EMC’s new Data Science and Big Data Analytics training and certification program. A big congratulations to EMC, EMC Education Services and Greenplum for creating a great training course. Finally, a huge thank you to EMC, […]
Estimated reading time: 32 seconds
How is it even possible to set expectations and launch data science projects? Data science projects vary from “executive dashboards” through “automate what my analysts are already doing well” to “here is some data, we would like some magic.” That is, you may be called to produce visualizations, analytics, data […]
Estimated reading time: 19 minutes
This is a tutorial on how to try out a new package in R. The summary is: expect errors, search out errors, and don’t start with the built-in examples or real data. Suppose you want to try out a novel statistical technique. A good fraction of the time R […]
Estimated reading time: 14 minutes
One of the current best tools in the machine learning toolbox is the 1930s statistical technique called logistic regression. We explain how to add professional-quality logistic regression to your analytic repertoire and describe a bit beyond that.
Estimated reading time: 24 minutes
Recently, we had a client come to us with (among other things) the following question: Who is more valuable, Customer Type A, or Customer Type B? This client already tracked the net profit and loss generated by every customer who used his services, and had begun to analyze his customers […]
Estimated reading time: 20 minutes
In the previous installment of the Statistics to English Translation, we discussed the technical meaning of the term “significant”. In this installment, we look at how significance is calculated. This article will be a little more technically detailed than the last one, but our primary goal is still to help […]
Estimated reading time: 21 minutes
In this installment of our ongoing Statistics to English Translation series, we will look at the technical meaning of the term “significant”. As you might expect, what it means in statistics is not exactly what it means in everyday language. As always, a pdf version of this article is available […]
Estimated reading time: 22 minutes
Scientists, engineers, and statisticians share similar concerns about evaluating the accuracy of their results, but they don’t always talk about it in the same language. This can lead to misunderstandings when reading across disciplines, and the problem is exacerbated when technical work is communicated to and by the popular media. […]
Estimated reading time: 30 minutes