
Practical Data Science with R errata update: Java SQLScrewdriver replaced by R procedures and article

We have updated the errata for Practical Data Science with R to reflect that it is no longer worth the effort to use the Java version of SQLScrewdriver as described.


We are very sorry for any confusion, trouble, or wasted effort that bringing in Java software has caused readers (Java is something we are very familiar with, but we forget that not everybody uses it). Also, database adapters for R have greatly improved, so we feel more confident depending on them alone. Practical Data Science with R remains an excellent book and a good resource to learn from, one that we are very proud of and fully support (hence the errata).

When I wrote that section (I am John Mount, and that section was my fault), we had a lot of client interest in ad-hoc procedures in primarily Java environments (pre-Spark), and Java SQLScrewdriver was a simplifying tool (a streaming loader). It turns out that using it in our R book imposes an unnecessary burden on readers: it brings in Java, which is one more thing (beyond R) to configure and which can confuse beginners.

So: we no longer recommend using H2DB, Java, and SQuirreL SQL to load data. Instead we recommend using R, readr, read.table, and dbWriteTable with either PostgreSQL (for a lot of data) or RSQLite (for in-memory practice). Nina Zumel has a very nice free write-up of the replacement screwdriver concept here.
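
As a rough illustration of the R-only route, here is a minimal sketch, assuming the DBI and RSQLite packages are installed. The file contents, the table name `example_table`, and the column names are hypothetical and exist only to make the example self-contained. This is not the code from the book or from the write-up.

```r
# Sketch: load a delimited file into a database using only R.
# All data and names below are hypothetical, made up for illustration.
library(DBI)
library(RSQLite)

# Write a tiny sample file so the example runs on its own.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score",
             "1,a,3.5",
             "2,b,4.0",
             "3,c,2.25"),
           csv_path)

# Read the file into a data frame with base R's read.table.
d <- read.table(csv_path, header = TRUE, sep = ",",
                stringsAsFactors = FALSE)

# Open an in-memory SQLite database (nothing is written to disk).
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Copy the data frame into a database table.
dbWriteTable(con, "example_table", d, overwrite = TRUE)

# Confirm the load by counting rows from the database side.
n_rows <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM example_table")$n

dbDisconnect(con)
```

For larger data the same `dbWriteTable` call should work against a PostgreSQL connection. One option (an assumption here, not something this post specifies) is to open the connection with the RPostgres package's `Postgres()` driver and your own connection details.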




Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.