“Data Science” is obviously a trendy term making its way through the hype cycle. Either nobody is good enough to be a data scientist (unicorns) or everybody is too good to be a data scientist (or the truth is somewhere in the middle).
And there is a quarter that grumbles that we are merely talking about statistics under a new name (see here and here).
It has always been the case that advances in data engineering (such as punch cards, or data centers) make analysis practical at new scales (though I still suspect Map/Reduce was a plot designed to trick engineers into being excited about ETL and report generation).
However, in the 1940s and 1950s the field was called “operations research” (even when performed by statisticians). When you read John F. Magee, (2002) “Operations Research at Arthur D. Little, Inc.: The Early Years”, Operations Research 50(1):149-153 http://dx.doi.org/10.1287/opre.22.214.171.12496 you really come away with the impression you are reading about a study of online advertising performed in the 1940s (okay, mail advertising, but mail was “the email of its time”).
In this spirit, next week we will write about the sequential analysis solution for A/B-testing, invented in the 1940s by one of the greats of statistics and operations research: Abraham Wald (whom we have written about before).
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.
I agree with Jesper’s answer (http://qr.ae/Rb2LCI) in that it is pretty much applied epistemology. Topics like logic, statistics, decision theory, game theory are usually taught in (advanced) classes in epistemology and philosophy of science.
The Venn diagram stuff is fun, but bringing epistemology into a field that mostly calculates mere correlations really sounds like an attempt at revenge for mathematician Gian-Carlo Rota’s excursions into philosophy.
The Venn diagram was provided by the questioner and not Jesper. I also agree with him that “Data Science” is a stupid term. Even if “correlation on steroids” is how people view it in practice, I generally regard the more ideal “applied epistemology” as inference and knowledge extraction (maybe on a computer). The best way to think about it is to understand that there were computers before Apple, Microsoft, IBM, etc. (they were people) and there were databases before MySQL (they were paper records, abacuses, etc.). Ideally, it can simply be a convenient way of figuring things out. Hopefully, the reliance on correlations alone will change.