## Data Science Bite: What is Statistics?

Statistics is the science of relating summaries of observable samples to the unobserved summaries of the populations they are drawn from. I try to explain that with an example in this video. (link)
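
As a small illustration of that definition (a sketch, not the example from the video), the Python below builds a synthetic population, draws a random sample from it, and compares the sample mean to the population mean. The population shape, the sizes, and the name `sample_vs_population` are all assumptions made for the demo. In practice the population summary is unobserved, which is the whole point; we can only peek at it here because the population is simulated.

```python
import random
import statistics


def sample_vs_population(pop_size=100_000, sample_size=1_000, seed=0):
    """Return (sample mean, population mean) for a synthetic population."""
    rng = random.Random(seed)
    # Hypothetical population: normally distributed values (assumed for the demo).
    population = [rng.gauss(50.0, 10.0) for _ in range(pop_size)]
    # The observable part: a simple random sample drawn without replacement.
    sample = rng.sample(population, sample_size)
    return statistics.fmean(sample), statistics.fmean(population)


sample_mean, population_mean = sample_vs_population()
print(f"sample mean:     {sample_mean:.2f}")
print(f"population mean: {population_mean:.2f}")
```

The two numbers should land close to each other but not match exactly; quantifying that gap is the job of statistics.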

I felt a bit guilty explaining a Kelly/Thorp style card betting system without discussing why these ideas don’t work on fair coin games. So I have a “writeup for engineers” on the martingale theory of such games. This has example code, so one could try to come up with a betting […]
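
To make the fair-coin point concrete, here is a minimal Python sketch (not the code from the writeup). It enumerates every equally likely sequence of fair flips and computes the exact expected final stake for a history-dependent betting rule. The names `expected_final_stake` and `chase_losses`, and the rule itself, are hypothetical choices for illustration; even-money payoffs are assumed.

```python
from fractions import Fraction
from itertools import product


def expected_final_stake(strategy, n_flips, stake=Fraction(1)):
    """Exact expected final stake over all 2**n_flips fair-coin sequences."""
    total = Fraction(0)
    for flips in product((1, -1), repeat=n_flips):  # +1 = win, -1 = loss
        wealth = stake
        for i, outcome in enumerate(flips):
            # The strategy sees only current wealth and past outcomes.
            bet = strategy(wealth, flips[:i])
            wealth += bet * outcome
        total += wealth
    return total / 2**n_flips


def chase_losses(wealth, history):
    """Hypothetical rule: bet half the stake after a loss, otherwise a quarter."""
    if history and history[-1] == -1:
        return wealth / 2
    return wealth / 4


print(expected_final_stake(chase_losses, 6))  # prints 1
```

Swapping in any other rule that looks only at the past gives the same expected value: the starting stake.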

I demonstrate a Kelly/Thorp betting system for the simple card game of guessing if the next card from a standard deck is red or black. I have a video of the play here. And a derivation of the betting strategy in R is here. A derivation of the proof you […]
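
For readers who want to experiment, here is a rough Python sketch of one standard proportional-betting rule for this game. It is an assumption on my part that this matches the linked derivation, which is in R and may differ. The rule: with `red` and `black` cards remaining, guess the majority color and bet the fraction `|red - black| / (red + black)` of the current stake at even money. The function name `play_deck` and the even-money payoff are assumptions.

```python
import random
from fractions import Fraction
from math import comb


def play_deck(deck, stake=Fraction(1)):
    """Play through a deck of 'R'/'B' cards with proportional bets; return the final stake."""
    red = deck.count("R")
    black = deck.count("B")
    for card in deck:
        remaining = red + black
        # Bet fraction equals the edge of the majority color (zero when tied).
        edge = Fraction(abs(red - black), remaining)
        guess = "R" if red >= black else "B"
        bet = stake * edge
        stake += bet if card == guess else -bet
        # Update the counts of unseen cards.
        if card == "R":
            red -= 1
        else:
            black -= 1
    return stake


deck = ["R"] * 26 + ["B"] * 26
random.Random(0).shuffle(deck)
final_stake = play_deck(deck)
print(float(final_stake))
print(final_stake == Fraction(2**52, comb(52, 26)))  # prints True
```

Under this rule each card multiplies the stake by twice the remaining share of the color drawn, so the product telescopes and the final stake is the same for every shuffle: `2**52 / comb(52, 26)`, roughly 9.08 times the starting stake.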

Introduction: Teaching basic data science, machine learning, and statistics is great due to the questions. Students ask brilliant questions, as they see what holes are present in your presentation and scaffolding. The students are not yet conditioned to ask only what you feel is easy to answer or present. They […]

I’d like to share a new talk on bilingual data science. It is limited to R and Python, so it is a bit of a “we play all kinds of music, both Country and Western.” It has what I feel is a really neat example of how I used Jetbrains Intellij […]

I’ve now shared the code for my “Variable Utility is not Intrinsic” article here: https://github.com/WinVector/Examples/tree/main/Variable_Utility_is_not_Intrinsic. And I have also ported the entire article to Python. It is actually kind of neat to be able to compare the two and see how close doing data science in R and in Python […]

Introduction: Here is a quick data-scientist / data-analyst question: what is the overall trend or shape in the following noisy data? For our specific example: How do we relate value as a noisy function (or relation) of m? This example arose in producing our tutorial “The Nature of Overfitting”. One […]

Introduction: I would like to talk about the nature of supervised machine learning and overfitting. One of the cornerstones of our data science intensives is giving the participants the experiences of a data scientist in a safe controlled environment. We hope by working examples they can quickly get to the […]

I am sharing a new free video where I work through a great common argument that bounds expected excess generalization error as a ratio of model complexity (in rows) over training set size (again in rows), independent of problem dimension. (link) For more of my notes on support vector machines […]

Our book, Practical Data Science with R, just had its first anniversary! The book is doing great; if you are working with R and data, I recommend you check it out. (link)