
Practical Data Science with R update

Just got the following note from a new reader:

Thank you for writing Practical Data Science with R. It’s challenging for me, but I am learning a lot by following your steps and entering the commands.

Wow, this is exactly what Nina Zumel and I hoped for. We wish we could make everything easy, but an appropriate amount of challenge is required for significant learning and accomplishment.

Of course we try to avoid inessential problems. All of the code examples from the book can be found here (and all the data sets here).

The second edition is coming out very soon. Please check it out.

Categories: Administrativia Opinion


Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.

2 replies

  1. Regarding the terminology for PDFs and CDFs in R that you complain about: I learned my stats in the 1980s, and in those days there was the probability density function (PDF) and its integral, the distribution function (df).

    At some point in the late 80s or early 90s people started referring to the df as the cumulative distribution function (CDF). This had two advantages:

    a) if you were unsure about which was the density function and which was the distribution function (I wasn’t, but a lot of engineers I worked with were always mixing them up) the word CUMULATIVE was a big hint.

    b) the abbreviation df was also used for degrees of freedom, so it cleared that up.

    I have never heard anyone refer to a probability density function as a probability distribution function. I agree that R’s use of ‘p’ for the CDF isn’t very helpful but ‘d’ for density seems easy to remember.

    The books I learned stats theory from were An Introduction to Mathematical Statistics, by H. D. Brunk, and Introduction to the Theory of Statistics, by Mood, Graybill and Boes. They use the above terminology.

    1. Thanks for your note.

      Which names are easy to remember, and which are preferred, can be a bit path dependent: it depends on where one was first trained and on whether one mostly works with continuous or discrete distributions.
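
For readers who want to try the naming convention discussed in the thread above, here is a minimal sketch in base R. The choice of the standard normal and a binomial with 10 trials is only for illustration, and the variable names are made up for this example. It checks that the `p` function (the CDF) is the integral of the `d` function (the density), and that in the discrete case `d` is a probability mass function whose running sum is `p`.

```r
# Illustrative values only; any point x and any count k would do.
x <- 1.0
k <- 3

density_at_x <- dnorm(x)  # 'd' prefix: probability density function (PDF)
cdf_at_x     <- pnorm(x)  # 'p' prefix: cumulative distribution function, P(X <= x)

# Continuous case: the CDF is the integral of the density from -Inf to x.
integral_of_density <- integrate(dnorm, lower = -Inf, upper = x)$value
ok_integral <- isTRUE(all.equal(cdf_at_x, integral_of_density, tolerance = 1e-4))

# 'q' prefix: the quantile function, which inverts the CDF.
ok_quantile <- isTRUE(all.equal(qnorm(cdf_at_x), x))

# Discrete case: 'd' returns a probability mass, and 'p' is its running sum.
mass_up_to_k <- sum(dbinom(0:k, size = 10, prob = 0.5))
ok_discrete <- isTRUE(all.equal(pbinom(k, size = 10, prob = 0.5), mass_up_to_k))

print(c(integral = ok_integral, quantile = ok_quantile, discrete = ok_discrete))
```

The same four prefixes (`d`, `p`, `q`, and `r` for random draws) apply across the distribution families in R's `stats` package, which is why the meaning of `d` shifts between "density" and "mass" depending on the distribution.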