## Against Accuracy

Why a merely accurate classification rule may not meet your business needs, and why you should insist on a model that returns numeric scores for classification problems. (link)
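As a rough illustration of the point about numeric scores, here is a minimal Python sketch. The labels, scores, thresholds, and the helper name `confusion` are all made up for this example and do not come from the linked post. It shows that a model returning scores lets you move the decision threshold to trade false positives against false negatives, which a fixed classification rule cannot do.

```python
# Hypothetical toy data: true labels and model scores (invented for illustration).
y_true = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.60, 0.55, 0.45, 0.30, 0.20, 0.05]


def confusion(y_true, y_score, threshold):
    """Return (true positives, false positives, false negatives) at a threshold."""
    tp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t == 1)
    fp = sum(1 for t, s in zip(y_true, y_score) if s >= threshold and t == 0)
    fn = sum(1 for t, s in zip(y_true, y_score) if s < threshold and t == 1)
    return tp, fp, fn


# The same scores support different business trade-offs.
for threshold in (0.5, 0.7):
    tp, fp, fn = confusion(y_true, scores, threshold)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    print(threshold, (tp, fp, fn), round(precision, 3), round(recall, 3))
```

At the lower threshold this toy model catches every positive at the cost of one false alarm; at the higher threshold it raises no false alarms but misses one positive. Which is preferable depends on the relative costs, and that choice is only available when the model exposes its scores.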

Let’s please stop saying somebody isn’t a data scientist if they haven’t memorized the innards of one obscure machine learning algorithm, or blown the right smoke during an interoo (“Kangaroo interview”, thanks Jim Ruppert for this term!). Let us, instead, think of the data scientist as the bus driver. It […]

I am sharing some rough notes (in R and Python) here on how, while dot(a, b) fulfills “Mercer’s condition” (by definition! I’ll just informally call these beasts “Mercer Kernels”), the seemingly harmless variations abs(dot(a, b)) and relu(dot(a, b)) are not Mercer Kernels (relu(x) = max(0, x) = (abs(x) + […]
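One way to check a claim like this numerically is to look for points and weights that make the kernel's quadratic form negative, since a Mercer kernel requires sum_i sum_j w_i w_j k(p_i, p_j) >= 0 for every choice of points and weights. The sketch below is in Python; the points, weights, and helper names are my own illustrative choices and are not taken from the linked notes.

```python
def dot(a, b):
    """Ordinary inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def relu(x):
    return max(0, x)


def quad_form(kernel, points, weights):
    """Compute sum_i sum_j w_i * w_j * kernel(p_i, p_j)."""
    return sum(
        wi * wj * kernel(p, q)
        for wi, p in zip(weights, points)
        for wj, q in zip(weights, points)
    )


# Illustrative points in the plane and weights (an assumption, not from the post).
points = [(1, 0), (0, 1), (1, 1), (1, -1)]
weights = [1, 1, -1, -1]

q_dot = quad_form(dot, points, weights)
q_abs = quad_form(lambda a, b: abs(dot(a, b)), points, weights)

# For relu, add the mirror image of each point and reuse the same weights.
mirrored = points + [(-x, -y) for x, y in points]
weights2 = weights + weights
q_relu = quad_form(lambda a, b: relu(dot(a, b)), mirrored, weights2)

print(q_dot, q_abs, q_relu)  # prints: 2 -2 -4
```

The plain dot product gives a non-negative value, as it must, while both variations produce a negative quadratic form on these points. A single negative value is enough to rule out the Mercer property.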

I am sharing a new free video where I work through a great common argument that bounds expected excess generalization error as a ratio of model complexity (in rows) over training set size (again in rows), independent of problem dimension. (link) For more of my notes on support vector machines […]

What Every Data Scientist Should Know About Floating Point (link)

I have a new math chalk talk to share: “The Real Numbers.” Here I go into some of the terrifying true nature of our common model for continuous quantities. (link)

In addition to adding a base-R pipe, it appears a new base-R function builder is in the works (in addition to “function”). R is a very versatile language, with a great ability to accept user-level or package extensions. What I mean by this is, user code and package code (which […]

R’s upcoming pipe appears to be currently proposed as a syntactic transform of the form: a |> f(…) -> f(a, …); a |> f() -> f(a). There is an active discussion on this prototype and some interesting points come up. Note the current proposal appears to disallow a |> […]