## Rule 42 Software

Continuing (and hopefully ending) our quick series on software pathologies I would like to follow-up The Hyper Dance with “Rule 42 Software.”

Continuing (and hopefully ending) our quick series on software pathologies I would like to follow-up The Hyper Dance with “Rule 42 Software.”

I’d like to share a new talk on bilingual data science. It is limited to R and Python, so it is a bit of a “we play all kinds of music, both Country and Western.” It has what I feel is a really neat example how I used Jetbrains Intellij […]

I’ve now shared the code for my “Variable Utility is not Intrinsic” article here: https://github.com/WinVector/Examples/tree/main/Variable_Utility_is_not_Intrinsic. And I have also ported the entire article to Python. It is actually kind of neat to be able to compare the two and see how close doing data science in R and in Python […]

There is much ado about variable selection or variable utility valuation in supervised machine learning. In this note we will try to disarm some possibly common fallacies, and to set reasonable expectations about how variable valuation can work. Introduction In general variable valuation is estimating the utility that a column […]

We introduce the dual numbers, a number system that is simple to calculate with that allows us to perform automatic differentiation. For more on the dual number please see: https://win-vector.com/tag/dual-numbers/. (link)

A lot of machine learning, statistical, plotting, and analytics algorithms over-sell a small evil trick I call “the hyper dance.”

Introduction Here is a quick data-scientist / data-analyst question: what is the overall trend or shape in the following noisy data? For our specific example: How do we relate value as a noisy function (or relation) of m? This example arose in producing our tutorial “The Nature of Overfitting”. One […]

Introduction I would like to talk about the nature of supervised machine learning and overfitting. One of the cornerstones of our data science intensives is giving the participants the experiences of a data scientist in a safe controlled environment. We hope by working examples they can quickly get to the […]

Why a mere accurate classification rule may not meet your business needs. And why you should insist on a model that returns numeric scores for classification problems. (link)

Let’s please stop saying somebody isn’t a data scientist if they haven’t memorized the innards of one obscure machine learning algorithm, or blow the right smoke during an interoo (“Kangaroo interview”, thanks Jim Ruppert for this term!). Let us, instead, think of the data scientist as the bus driver. It […]