Big News! “Practical Data Science with R” MEAP launched!

Nina Zumel and I (John Mount) have been working very hard on an exciting new book called “Practical Data Science with R.” The book has now entered the Manning Early Access Program (MEAP), which allows you to subscribe to chapters as they become available and give us feedback before the book goes into print.


Please subscribe to our book; your support now will help us improve it. Please also forward this offer to your friends and colleagues (and please ask them to subscribe and forward as well).

Manning is sharing a 50% off promotion code active until May 18, 2013: pdswrco.

Deal of the Day May 21 2013: Half off Practical Data Science with R. Use code dotd0521au.

Please subscribe to our MEAP!

Categories: Administrativia Exciting Techniques

jmount

Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.

7 replies

  1. Is there any additional information about the book contents? Perhaps an extended TOC or chapter descriptions? It looks good, but prior to the purchase it would be nice to get more details than mere chapter titles. Thank you.

  2. @Maxim A fair question. We will definitely get more info out throughout the year, and there will be more chances to get the book at a discount (for example, we have been told we will be Manning’s “Deal of the Day” on May 21, 2013). So you should not have to buy until you feel it is something that would be right for you. The book will also undergo a number of beneficial revisions and much editing before the final copy is released (all MEAP subscribers get a final e-copy when it is available). Our currently planned outline (SUBJECT TO CHANGE!), down to the first section level, is as follows (we will share more both on the Manning site and here throughout the year):

    Practical Data Science with R
    		What is Data Science?
    		Why this book?
    		Brief Table of Contents
    Part 1: Introduction to Data Science
    	Chapter 1: The Data Science Process
    		The Roles in a Data Science Project
    		The Stages of a Data Science Project
    		Setting Expectations
    	Chapter 2: Starting With R And Data
    		Working with data from files
    		Working with relational databases
    	Chapter 3: Exploring Data
    		Using Summary Statistics to Spot Problems
    		Spotting Problems Using Graphics and Visualization
    	Chapter 4: Managing Data
    		Cleaning Data
    		Sampling for Modeling and Validation
    Part 2: Modeling Methods
    	Chapter 5: Using Memorization Methods
    		Using lookup tables
    		Using decision trees
    		Using naive Bayes
    	Chapter 6: Learning Functional Models
    		Introducing Functional Models
    		Using Linear regression
    		Using Logistic regression
    		Using Generalized Additive Models and Splines
    		Using Neural Nets
    	Chapter 7: Using Unsupervised Methods
    		Using Clustering Methods
    		Using Nearest Neighbor Methods
    		Using Association Rules
    	Chapter 8: Exploring Advanced Methods
    		Using Kernel Methods
    		Using Support Vector Machines
    		Using Ensemble Methods
    		Using Random Forests
    Part 3: Results
    	Chapter 9: Evaluating Models
    		Model Evaluation
    		Overfitting
    	Chapter 10: Managing Models in Production
    		Exporting models
    		Deploying services
    		Mitigating Risk
    		Business rules and polish
    	Chapter 11: Preparing Successful Business-oriented Presentations
    		Presenting to different audiences
    		A successful presentation for the project sponsor
    		A successful presentation for the end user
    	Chapter 12: Preparing Successful Technical Presentations
    		A successful presentation for other data scientists
    		Good Documentation for the deployment team
    	Chapter 13: Deployment Documentation
    	Chapter 14: Conclusions, what to take away
    Appendices
    	Appendix A: Working With R And Other Tools
    		Acquiring a complete data scientist's workbench
    		Programming in R
    		Starting with RStudio, Git and SQL
    	Appendix B: Important Statistical Concepts
    		Significance
    		Bias Variance decomposition
    	Appendix C: Translating problems and data to techniques
    		Mapping problems to machine learning tasks
    		Mapping data to methods
    		Conclusion
    	Appendix D: Further Reading
    

    We have a lot more material prepared than is currently being shared in the MEAP, but it will change as we refine and edit it.

  3. A few of the things on my wishlist for a “data science with R” book include:

    1) Collecting and parsing unstructured data: How to programmatically pull in unstructured information from a range of sources, parse it, clean it, and shape it into an analysis-ready format. This includes getting data from sites with APIs, scraping data from web pages, programmatically “filling in” query forms on web pages and pulling the resulting output data into R, extracting and parsing text from PDF files, etc. These types of tasks seem to get short shrift, if they’re even touched on at all, in books about R.

    2) Complex data munging: I’ve got basic reshaping, aggregating, and summarizing down. But I keep running into situations with multiple complexities like repeated measures, multiple data sources, lots of variables, non-uniform time increments between measurements, censoring, etc., where I end up resorting to inefficient, kludgy code to get my data into the form I need. Whatever the next level looks like in terms of data munging with R, I’d like to find a way to get there.

    3) Moving more cleanly and easily from R output to nice looking documents and presentations: knitr and markdown seem to have made it easier to create decent-looking reports. And I’ve seen some cool HTML5 presentations with R output recently. Unfortunately, I still need to use MS-Word and Powerpoint in my organization. I’d love to figure out a relatively automated workflow to go from R code to a Word or Powerpoint document. Or, at the least, to turn summary tables produced with R into decent-looking tables in Word and Powerpoint without a lot of by-hand copying, pasting, and format fiddling.

    I can’t quite tell from your TOC whether you’re planning to cover these things in your book, but I’m keeping my fingers crossed and looking forward to the final product!

  4. @jms

    Good list. Honest answers: we are not going to work with unstructured data in this book; we will likely get into some interesting data munging at the reshaping level, but not too much further; and we will spend a lot of time on presentations, but targeted to PowerPoint and Keynote (not produced directly from R).

  5. Do you need any reviewers? I think I’m part of your target audience: a former database guy, now a researcher with decent data exploration instincts, but in need of much broader methods (and how-tos).

  6. @Rick
    Sounds like a good background. We don’t directly control reviewers of our book, but Manning has an interesting reviewer program here: http://www.manning.com/about/reviewer.html (I had seen it once, but I can’t find links to it from their main page). So you may want to email them instead of using the Google Docs link (just in case they are not checking it at the moment).
