
Getting Started With rquery

To make getting started with rquery (an advanced query generator for R) easier, we have reworked the package README for a variety of data sources (including SparkR!).

Here are our current examples:

For MonetDBLite, the query diagrammer shows a repeated calculation that we decided was best to leave in.

[Figure: MonetDBLite query diagram]

And the RSQLite diagram shows the consequences of replacing window functions with joins.

[Figure: RSQLite query diagram]
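
To give a feel for what "replacing window functions with joins" means at the SQL level, here is a minimal sketch. It is not the SQL that rquery generates; the table `d`, its columns, and the helper `group_totals_via_join` are hypothetical names chosen for illustration, and Python's standard `sqlite3` module is used only as a convenient way to run the query against an in-memory SQLite database. The idea is that a per-group total, which a window function would express as `SUM(x) OVER (PARTITION BY g)`, can instead be computed in a grouped sub-query and joined back onto the original rows.

```python
import sqlite3


def group_totals_via_join(rows):
    """Attach each group's total to every row using GROUP BY plus JOIN,
    rather than SUM(x) OVER (PARTITION BY g)."""
    con = sqlite3.connect(":memory:")
    try:
        # Hypothetical example table: id, grouping column g, value x.
        con.execute("CREATE TABLE d (id INTEGER, g TEXT, x REAL)")
        con.executemany("INSERT INTO d VALUES (?, ?, ?)", rows)
        query = """
            SELECT d.id, d.g, d.x, t.total
            FROM d
            JOIN (SELECT g, SUM(x) AS total FROM d GROUP BY g) AS t
              ON d.g = t.g
            ORDER BY d.id
        """
        return con.execute(query).fetchall()
    finally:
        con.close()


rows = [(1, "a", 1.0), (2, "a", 2.0), (3, "b", 5.0)]
result = group_totals_via_join(rows)
for row in result:
    print(row)
```

The cost of this rewrite is that the source table is referenced twice (once in the aggregate, once in the join), which is the kind of extra structure one would expect to see in a query diagram.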


jmount

Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.

6 replies

  1. RPostgreSQL has been updated last time almost two years ago. Any particular reason not to use RPostgres?

    1. I guess I’ll have to start this answer with the weak apology: “well, you asked.” (Actually, this falls in both the “asked a lot verbally” and the “nobody says that” categories. So, apologies for taking the chance to discuss it in writing here.)

      About RPostgreSQL versus RPostgres.

      When I last tested with RPostgres it was a very bad experience: it returned 64-bit integers for row counts (not a base R type, but the integer64 class from the bit64 package; this broke a lot of code), it didn’t work with Redshift (a service I was using a lot at the time), and it had a lot of other issues. I’ve patched around what I could, and I will probably look into adding an RPostgres example. But for production work I try RPostgreSQL first.

      And frankly, given the contributors, I worry that RPostgres will be abandoned for a proprietary RStudio Professional Driver, or that the driver will become “tidyverse only” and return non-base-R data types (breaking other code, forcing those packages on everyone, and making it more difficult to interoperate with other extension packages such as data.table).

      Also I found RPostgreSQL to be excellent. It had a couple of variations from DBI, but rquery has an adapter layer to work through such things.

      Finally I am not sure if you are serious with the “updated last time almost two years ago” point. RPostgres itself was last updated over 6 months ago, and only has just over a year of public experience and tuning, compared to RPostgreSQL having just over 10 years of public experience and tuning. One could ask a similar question: “why use magrittr, given it hasn’t had an update in over 4 years?” (there are other options, including our own wrapr).

  2. When I said that RPostgreSQL hasn’t been updated in almost two years, I meant the latest CRAN release, which is dated 2017-06-24. However, I concede that I’ve been a bit unfair, since I was comparing the latest GitHub activity of RPostgres with the CRAN release date of RPostgreSQL. Those are not the same, obviously. But I think we would both agree that RPostgres is a bit more actively maintained.

    I have to say, I was expecting your response to be pretty much what you wrote, and I wouldn’t have asked if I hadn’t seen sparklyr in the list of backends rquery supports.

    1. Thanks for both your comments and your patience. You have some good points. I’ve got the RPostgres example up now (it is just a driver swap, as I had already patched around its issues).

      I actually see the difference between sparklyr and SparkR as a bit wider (and favoring sparklyr: getting SparkR to work was not a simple matter of changing drivers, but required much more work at the adapter layer).

  3. Do you have much experience (first- or second-hand) of using rquery with SQL Server? I tried it as I wanted to test out rqdatatable at work, but got errors when trying the setup stages of the rquery introduction. I notice that SQL Server isn’t on that list: is rquery not compatible with it? Sorry if this isn’t the right place to ask.

    1. SQL Server isn’t tested or supported, as we don’t have a copy to test or develop against. Messages during the connection tests are normal, as that is where the package tests the edges of the database. rqdatatable is the data.table realization of rquery, so it does not use a database connection.
