Many data scientists (and even statisticians) often suffer under one of the following misapprehensions: They believe a technique doesn’t work in their current situation (when in fact it does), leading to useless precautions and missed opportunities. They believe a technique does work in their current situation (when in fact it […]
Estimated reading time: 4 minutes
I am pleased to announce that vtreat version 0.6.0 is now available to R users on CRAN. vtreat is an excellent way to prepare data for machine learning, statistical inference, and predictive analytic projects. If you are an R user we strongly suggest you incorporate vtreat into your projects.
Estimated reading time: 2 minutes
One thing I teach is: when evaluating the performance of regression models you should not use correlation as your score. This is because correlation tells you if a re-scaling of your result is useful, but you want to know if the result in your hand is in fact useful. For […]
Estimated reading time: 9 minutes
Nina Zumel and I are happy to announce a formal article discussing data preparation and cleaning using the vtreat methodology is now available from arXiv.org as citation arXiv:1611.09477 [stat.AP]. vtreat is an R data.frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. It prepares variables […]
Estimated reading time: 2 minutes
It has been popular to complain that the current terms “data science” and “big data” are so vague as to be meaningless. While these terms are quite high on the hype-cycle, even the American Statistical Association was forced to admit that data science is actually a real thing and exists. […]
Estimated reading time: 3 minutes
The Facebook data science blog shared some fun data explorations this Valentine’s Day in Carlos Greg Diuk’s “The Formation of Love”. They are rightly receiving positive interest in and positive reviews of their work (for example Robinson Meyer’s Atlantic article). The finding is also a great opportunity to discuss the […]
Estimated reading time: 13 minutes