
Practical Data Science with R

We are very proud to present our book Practical Data Science with R 2nd Edition. This is the book for you if you are a data scientist, want to be a data scientist, or want to work with data scientists. This is a good “what next” book for analysts and programmers wanting to know more about machine learning and data wrangling.


Our goal is to present data science from a pragmatic, practice-oriented viewpoint. The book will complement other analytics, statistics, machine learning, data science and R books with the following features:

  • This book teaches you how to work as a data scientist. Learn how important listening, collaboration, honest presentation, and iteration are to what we do.
  • The key emphasis of the book is process: collecting requirements, loading data, examining data, building models, validating models, documenting and deploying models to production.
  • We provide over 10 significant example datasets, and demonstrate the concepts that we discuss with fully worked exercises using standard R methods. We feel that this approach allows us to illustrate what we really want to teach and to demonstrate all the preparatory steps necessary to any real-world project. Every result and almost every graph in the book is given as a fully worked example.
  • This book is careful with statistics, but presents topics in the context and order in which a practitioner worries about them. For example, we emphasize construction of predictive models, model evaluation, and prediction over the more standard topics of summary statistics and packaged procedures (such as ANOVA).

In support of Practical Data Science with R 2nd Edition we are providing:

From the Foreword

Practical Data Science with R, Second Edition, is a hands-on guide to data science, with a focus on techniques for working with structured or tabular data, using the R language and statistical packages. The book emphasizes machine learning, but is unique in the number of chapters it devotes to topics such as the role of the data scientist in projects, managing results, and even designing presentations. In addition to working out how to code up models, the book shares how to collaborate with diverse teams, how to translate business goals into metrics, and how to organize work and reports. If you want to learn how to use R to work as a data scientist, get this book.

We have known Nina Zumel and John Mount for a number of years. We have invited them to teach with us at Singularity University. They are two of the best data scientists we know. We regularly recommend their original research on cross-validation and impact coding (also called target encoding). In fact, chapter 8 of Practical Data Science with R teaches the theory of impact coding and uses it through the authors' own R package: vtreat.

Practical Data Science with R takes the time to describe what data science is, and how a data scientist solves problems and explains their work. It includes careful descriptions of classic supervised learning methods, such as linear and logistic regression. We liked the survey style of the book and extensively worked examples using contest-winning methodologies and packages such as random forests and xgboost. The book is full of useful, shared experience and practical advice. We notice they even include our own trick of using random forest variable importance for initial variable screening.

Overall, this is a great book, and we highly recommend it.

Jeremy Howard and Rachel Thomas

About the foreword authors.

Jeremy Howard is an entrepreneur, business strategist, developer, and educator. Jeremy is a founding researcher at fast.ai, a research institute dedicated to making deep learning more accessible. He is also a faculty member at the University of San Francisco, and is chief scientist at doc.ai and platform.ai.

Previously, Jeremy was the founding CEO of Enlitic, which was the first company to apply deep learning to medicine, and was selected as one of the world's top 50 smartest companies by MIT Tech Review two years running. He was the president and chief scientist of the data science platform Kaggle, where he was the top-ranked participant in international machine learning competitions two years running.

Rachel Thomas is director of the USF Center for Applied Data Ethics and cofounder of fast.ai, which has been featured in The Economist, MIT Tech Review, and Forbes. She was selected by Forbes as one of 20 Incredible Women in AI, earned her math PhD at Duke, and was an early engineer at Uber. Rachel is a popular writer and keynote speaker. In her TEDx talk, she shares what scares her about AI and why we need people from all backgrounds involved with AI.

Zumel, Mount, Practical Data Science with R, 2nd Edition, Manning, 2019 is available from:

7 replies

  1. I purchased this book. I am also looking for a PDF version of it so that I can understand the code.

    1. Manning gives you a URL for the PDF version of the book, either in the front matter of the print book or in the emailed instructions if you purchased an e-copy. In all cases, all of the code is shared here in the sub-directory named CodeExamples (the first edition code can be found here).

  2. Can you please summarize how the 2nd ed. differs from the 1st ed.? Thanks for all you do.

    P.S. Like your DataCamp courses!

    1. Thanks for your kind words.

      The 2nd edition differs quite a lot: a new chapter on data engineering (data.table, dplyr, and base R), completely rewritten chapters on fitting, a new section on model explainability with LIME, and a new chapter on advanced data prep (using vtreat!). Also, everything is updated and re-tested for the current state of the R ecosystem.
