Nina Zumel and I (John Mount) have been working very hard on producing an exciting new book called “Practical Data Science with R.” The book has now entered the Manning Early Access Program (MEAP), which allows you to subscribe to chapters as they become available and give us feedback before the book goes into print.
Please subscribe to our book; your support now will help us improve it. Please also forward this offer to your friends and colleagues (and please ask them to also subscribe and forward).
Manning is sharing a 50% off promotion code active until May 18, 2013: pdswrco.
Deal of the Day, May 21, 2013: half off Practical Data Science with R. Use code dotd0521au.
Please subscribe to our MEAP!
Categories: Administrativia, Exciting Techniques
jmount
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.
Is there any additional information about the book contents? Perhaps an extended TOC or chapter descriptions? It looks good, but prior to the purchase it would be nice to get more details than mere chapter titles. Thank you.
@Maxim A fair question. We will definitely get more info out throughout the year and there will be more chances to get the book at a discount (for example, we have been told we will be the Manning “Deal of the Day” on May 21, 2013). So you should not have to buy until you feel it is something that would be right for you. The book will also undergo a number of beneficial revisions and much editing before the final copy is released (all MEAP subscribers get a final e-copy when it is available). Our current planned outline, by chapter down to first-level sections (SUBJECT TO CHANGE!), is as follows (we will share more both on the Manning site and here throughout the year):
We have a lot more material prepared than is currently being shared in the MEAP, but it will change as we refine and edit it.
A few of the things on my wishlist for a “data science with R” book include:
1) Collecting and parsing unstructured data: How to programmatically pull in unstructured information from a range of sources, parse it, clean it, and shape it into an analysis-ready format. This includes getting data from sites with APIs, scraping data from web pages, programmatically “filling in” query forms on web pages and pulling the resulting output data into R, extracting and parsing text from PDF files, etc. These types of tasks seem to get short shrift, if they’re even touched on at all, in books about R.
2) Complex data munging: I’ve got basic reshaping, aggregating, and summarizing down. But I keep running into situations with multiple complexities like repeated measures, multiple data sources, lots of variables, non-uniform time increments between measurements, censoring, etc., where I end up resorting to inefficient, kludgy code to get my data into the form I need. Whatever the next level looks like in terms of data munging with R, I’d like to find a way to get there.
3) Moving more cleanly and easily from R output to nice-looking documents and presentations: knitr and markdown seem to have made it easier to create decent-looking reports. And I’ve seen some cool HTML5 presentations with R output recently. Unfortunately, I still need to use MS Word and PowerPoint in my organization. I’d love to figure out a relatively automated workflow to go from R code to a Word or PowerPoint document. Or, at the least, to turn summary tables produced with R into decent-looking tables in Word and PowerPoint without a lot of by-hand copying, pasting, and format fiddling.
I can’t quite tell from your TOC whether you’re planning to cover these things in your book, but I’m keeping my fingers crossed and looking forward to the final product!
@jms
Good list. Honest answers: we are not going to work with unstructured data in this book; we will likely get into some interesting data munging at the reshaping level, but not too much further; and we will spend a lot of time on presentations, but targeted to PowerPoint and Keynote (not produced entirely from within R).
We have added a permanent page about this book: http://www.win-vector.com/blog/practical-data-science-with-r/
Do you need any reviewers? I think I’m part of your target audience: a former database guy, now a researcher with decent data exploration instincts, but in need of much broader methods (and how-tos).
@Rick
Sounds like a good background. We don’t directly control reviewers of our book, but Manning has an interesting reviewer program here: http://www.manning.com/about/reviewer.html (I had seen it once, but I can’t find links to it from their main page). So you may want to email them instead of using the Google Docs link (just in case they are not checking it at the moment).