I just got back from a workshop called Digital Transformation of Decision Analysis. The workshop was organized by Eyas Raddad, David Matheson, and John-Mark Agosta, and sponsored by The Society of Decision Professionals and Microsoft. Microsoft generously hosted at their new Experience Center at the Microsoft Silicon […]
I have posted what I think is a really neat tutorial on how to plot multiple curves on a graph in Python, using seaborn and data_algebra. It is a great way to show off some of the data shaping convenience functions we have developed. Please check it out.
I’ve now shared the code for my “Variable Utility is not Intrinsic” article here: https://github.com/WinVector/Examples/tree/main/Variable_Utility_is_not_Intrinsic. And I have also ported the entire article to Python. It is actually kind of neat to be able to compare the two and see how close doing data science in R and in Python […]
Let’s please stop saying somebody isn’t a data scientist if they haven’t memorized the innards of one obscure machine learning algorithm, or blown the right smoke during an interoo (“Kangaroo interview”, thanks Jim Ruppert for this term!). Let us, instead, think of the data scientist as the bus driver. It […]
I am sharing a new free video where I work through a great common argument that bounds expected excess generalization error as a ratio of model complexity (in rows) over training set size (again in rows), independent of problem dimension. (link) For more of my notes on support vector machines […]
I have a new short video lecture to share: “Classification as Censored Regression.”
I recently shared a bit of the history of The Science of Data Analysis. I thought I would follow that up with a quick chalk talk titled “What is Statistics?” (link)
I am re-reading from the great statistician John W. Tukey’s paper: Tukey, John W. “The Future of Data Analysis.” Ann. Math. Statist. 33 (1962), no. 1, pp. 1–67. doi:10.1214/aoms/1177704711. https://projecteuclid.org/euclid.aoms/1177704711 I’ve taken the liberty of pulling out some quotes that are very relevant to the usual “data science is not […]
I am working on a promising new series of notes: common data science fallacies and pitfalls. (Probably still looking for a good name for the series!) I thought I would share a few thoughts on it, and hopefully not jinx it too badly.
From the frontmatter: We recommend this book! Deep Learning for Coders with fastai and PyTorch uses advanced frameworks to move quickly through concrete, real-world artificial intelligence or automation tasks. This leaves time to cover usually neglected topics, like safely taking models to production and a much-needed chapter on data ethics. […]