Introduction: Teaching basic data science, machine learning, and statistics is rewarding because of the questions. Students ask brilliant questions, as they see the holes in your presentation and scaffolding. They are not yet conditioned to ask only what you feel is easy to answer or present. They […]
Estimated reading time: 23 minutes
I’d like to share a new talk on bilingual data science. It is limited to R and Python, so it is a bit of a “we play all kinds of music, both Country and Western.” It has what I feel is a really neat example of how I used JetBrains IntelliJ […]
Estimated reading time: 51 seconds
There is much ado about variable selection or variable utility valuation in supervised machine learning. In this note we will try to disarm some possibly common fallacies, and to set reasonable expectations about how variable valuation can work. Introduction: In general, variable valuation is estimating the utility that a column […]
Estimated reading time: 14 minutes
Introduction: Here is a quick data-scientist / data-analyst question: what is the overall trend or shape in the following noisy data? For our specific example: how do we relate value as a noisy function (or relation) of m? This example arose in producing our tutorial “The Nature of Overfitting”. One […]
Estimated reading time: 12 minutes
Introduction: I would like to talk about the nature of supervised machine learning and overfitting. One of the cornerstones of our data science intensives is giving the participants the experiences of a data scientist in a safe, controlled environment. We hope that by working examples they can quickly get to the […]
Estimated reading time: 33 minutes
I am sharing some rough notes (in R and Python) here on how, while dot(a, b) fulfills “Mercer’s condition” (by definition! And I’ll just informally call these beasts a “Mercer Kernel”), the seemingly harmless variations abs(dot(a, b)) and relu(dot(a, b)) are not Mercer Kernels (relu(x) = max(0, x) = (abs(x) + […]
Estimated reading time: 2 minutes
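The Mercer kernel claim in the excerpt above can be checked numerically. What follows is a minimal Python sketch, not taken from the notes themselves: the four sample points and the helper name `min_eigenvalue` are assumptions chosen for illustration. A Mercer kernel must give a positive semidefinite Gram matrix on every finite set of points, so a single point set whose Gram matrix has a negative eigenvalue is enough to rule a candidate out.

```python
import numpy as np

# Illustrative point set (an assumption, not from the original notes):
# four unit vectors in the plane at angles 0, 45, 90, and 135 degrees.
angles = np.deg2rad([0.0, 45.0, 90.0, 135.0])
points = np.column_stack([np.cos(angles), np.sin(angles)])


def min_eigenvalue(gram):
    """Return the smallest eigenvalue of a symmetric matrix."""
    return float(np.linalg.eigvalsh(gram).min())


# Gram matrix of the plain dot product: positive semidefinite by construction.
gram_dot = points @ points.T

# The two "harmless looking" variations, applied entrywise.
gram_abs = np.abs(gram_dot)
gram_relu = np.maximum(gram_dot, 0.0)

for name, gram in [("dot", gram_dot), ("abs", gram_abs), ("relu", gram_relu)]:
    print(f"{name:>4}: smallest eigenvalue = {min_eigenvalue(gram):.4f}")
```

On this point set the dot-product Gram matrix has smallest eigenvalue zero up to rounding, while the abs and relu variants come out clearly negative (roughly -0.41 and -0.14), so neither variant can satisfy Mercer's condition.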
It looks like R is getting an official pipe operator (ref). R doesn’t work under an RFC process, so we hear about these things and they are discussed on the R-devel mailing list. I’ve written on this topic before (ref), and I have taped some new comments. This sort of […]
Estimated reading time: 1 minute
Our book, Practical Data Science with R, just had its one-year anniversary! The book is doing great; if you are working with R and data, I recommend you check it out. (link)
Estimated reading time: 22 seconds