Back to teaching. For a few years we’ve been running a data science intensive for a really neat FAAMG company. The idea is to give engineers some hands-on live workbook time using methods ranging from linear regression and xgboost to deep neural networks. Learning how participants progress and internalize […]

Estimated reading time: 1 minute

I’d like to address how and why I am making the recent light-board video lectures (please check them out: A/B testing and Simpson’s Paradox, and Bayes’s Law and Odds). How How is the easy part. There are a number of tutorials on how to do this. The one I found […]

Estimated reading time: 5 minutes

Authors: John Mount and Nina Zumel. Introduction In teaching thinking in terms of coordinatized data we find the hardest operations to teach are joins and pivots. One thing we commented on is that moving data values into columns, or into a “thin” or entity/attribute/value form (often called “un-pivoting”, “stacking”, “melting” […]

Estimated reading time: 11 minutes

I want to discuss a nice series of figures used to teach relational join semantics in R for Data Science by Garrett Grolemund and Hadley Wickham, O’Reilly 2016. Below is an example from their book illustrating an inner join: Please read on for my discussion of this diagram and teaching […]

Estimated reading time: 3 minutes

Authors: John Mount and Nina Zumel. Introduction It has been our experience when teaching the data wrangling part of data science that students often have difficulty understanding the conversion to and from row-oriented and column-oriented data formats (what is commonly called pivoting and un-pivoting). Boris Artzybasheff illustration Real trust and […]

Estimated reading time: 30 minutes

When we teach “R for statistics” to groups of scientists (who tend to be quite well informed in statistics, and just need a bit of help with R) we take the time to re-work some tests of model quality with the appropriate significance tests. We organize the lesson in terms […]

Estimated reading time: 5 minutes

We recently got this question from a subscriber to our book: … will you in any way describe what subject areas, backgrounds, courses etc. would help a non data scientist prepare themselves to at least understand at a deeper level why the techniques you will discuss work…and also understand the […]

Estimated reading time: 2 minutes

This was originally posted at ninazumel.com. I’m re-blogging it here. Photo: John Mount I came across a post from Emily Willingham the other day: “Is a PhD required for Good Science Writing?”. As a science writer with a science PhD, her answer is: it is not required, and it can […]

Estimated reading time: 14 minutes