We have an exciting new article to share: Don’t Feel Guilty About Selecting Variables.
If you are at all interested in the probabilistic justification of important data science techniques, such as variable selection or pruning, this should be an informative and fun read.
“Data Science” is often dismissed with the common jibe “if it has science in the name it isn’t a science.” Data science is in fact a science, for a simple reason: it has empirical content. That is, its methods are used because we can confirm that they work.
However, data science, when done well, also has a mathematical basis. We expect to find good mathematical, probabilistic, or statistical justification for reliable procedures.
Variable pruning or selection is one such procedure. It is well known that it can improve data science results. This is a matter of empirical experience: for some datasets and some fitting procedures, explicit prior variable selection improves results. Our new note examines how this is not mere empirical alchemy, but something that is mathematically justified and to be expected (under an appropriate Bayesian formulation of model fitting).
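To make the empirical claim concrete, here is a minimal sketch in Python. It is not taken from the note and does not reproduce its Bayesian argument. It assumes a synthetic dataset with one informative variable and 30 pure-noise variables, a 1-nearest-neighbor classifier as the fitting procedure, and a simple correlation screen as the selection step. All function names, the correlation threshold, and the data sizes are illustrative assumptions.

```python
import random


def make_data(n, n_noise, rng):
    """Build n rows: column 0 carries signal, the other columns are pure noise."""
    rows, labels = [], []
    for _ in range(n):
        signal = rng.gauss(0, 1)
        noise = [rng.gauss(0, 1) for _ in range(n_noise)]
        # The label depends only on the signal column (plus a little noise).
        label = 1 if signal + 0.3 * rng.gauss(0, 1) > 0 else 0
        rows.append([signal] + noise)
        labels.append(label)
    return rows, labels


def abs_correlation(xs, ys):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return abs(sxy) / (sxx * syy) ** 0.5


def select_variables(rows, labels, threshold=0.3):
    """Keep columns whose training-set correlation with the label exceeds the
    threshold (an illustrative screen, chosen for simplicity)."""
    n_vars = len(rows[0])
    return [j for j in range(n_vars)
            if abs_correlation([r[j] for r in rows], labels) > threshold]


def nn_accuracy(train_x, train_y, test_x, test_y, columns):
    """Accuracy of 1-nearest-neighbor using only the given columns."""
    correct = 0
    for x, y in zip(test_x, test_y):
        nearest = min(
            range(len(train_x)),
            key=lambda i: sum((train_x[i][j] - x[j]) ** 2 for j in columns),
        )
        correct += train_y[nearest] == y
    return correct / len(test_x)


rng = random.Random(2014)
train_x, train_y = make_data(200, 30, rng)
test_x, test_y = make_data(200, 30, rng)

all_columns = list(range(len(train_x[0])))
selected = select_variables(train_x, train_y)  # chosen from training data only

acc_all = nn_accuracy(train_x, train_y, test_x, test_y, all_columns)
acc_selected = nn_accuracy(train_x, train_y, test_x, test_y, selected)

print("selected columns:", selected)
print("accuracy, all variables:     ", acc_all)
print("accuracy, selected variables:", acc_selected)
```

In this setup the nearest-neighbor distances computed over all 31 columns are dominated by the noise columns, so held-out accuracy should be noticeably better after screening. Nearest-neighbor methods are especially sensitive to irrelevant variables; other fitting procedures may benefit less, which matches the "for some datasets, for some fitting procedures" qualification above.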
So please read on and also share: Don’t Feel Guilty About Selecting Variables, or How I Learned to Stop Worrying and Love Variable Selection.
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.