To make getting started with rquery (an advanced query generator for R) easier, we have re-worked the package README for various data-sources (including SparkR!).
Here are our current examples:
rquery and MonetDBLite
rquery and RPostgreSQL
rquery and RPostgres
rquery and RSQLite
rquery and SparkR
rquery and sparklyr
For the MonetDBLite example, the query diagrammer shows a repeated calculation that we decided was best to leave in.
And the RSQLite diagram shows the consequences of replacing window functions with joins.
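As a rough sketch of the kind of setup these READMEs walk through, here is the RSQLite case against an in-memory database. The table name d, the column names, and the tiny data frame are made up for illustration. Swapping the DBI::dbConnect() line for another driver is the intended idea, but the connection arguments each database needs are not shown here.

```r
library("rquery")

# Connect through DBI; an in-memory SQLite database keeps the sketch self-contained.
raw_connection <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
RSQLite::initExtension(raw_connection)

# Wrap the raw connection so rquery can record what this database supports.
db <- rquery_db_info(
  connection = raw_connection,
  is_dbi = TRUE,
  connection_options = rq_connection_tests(raw_connection))

# Hypothetical example data, copied to the database as table "d".
d <- rq_copy_to(db, "d",
                data.frame(x = c(1, 2, 3)),
                temporary = TRUE, overwrite = TRUE)

# Build an operator pipeline, inspect the generated SQL, then run it.
ops <- d %.>%
  extend(., y = x + 1)
cat(to_sql(ops, db))
res <- execute(db, ops)
print(res)

DBI::dbDisconnect(raw_connection)
```

The same ops pipeline can be rendered to SQL for a different database by passing a different db handle to to_sql(), which is what lets one README pattern cover several data sources.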
jmount
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.
RPostgreSQL has been updated last time almost two years ago. Any particular reason not to use RPostgres?
I guess I’ll have to start this answer with the weak apology: “well you asked.” (Actually this is in both the “asked a lot verbally” and the “nobody says that” categories. So, apologies for taking the chance to discuss this in writing here.)
About RPostgreSQL versus RPostgres.

When I last tested with RPostgres it was a very bad experience: it returned int64 for row counts (not a base R type, but a class from the bit64 package; broke a lot of code), didn’t work with Redshift (a service I was using a lot at the time), and had a lot of other issues. I’ve patched around what I could, and probably will look into adding an RPostgres example. But for production work I try RPostgreSQL first.

And frankly, given the contributors, I worry RPostgres will be abandoned for a proprietary RStudio Professional Driver, or that the driver will become “tidyverse only” and return non-base-R data types (breaking other code, forcing those packages on all, and making it more difficult to interoperate with other extension packages such as data.table).

Also I found RPostgreSQL to be excellent. It had a couple of variations from DBI, but rquery has an adapter layer to work through such things.

Finally I am not sure if you are serious with the “updated last time almost two years ago” point. RPostgres itself was last updated over 6 months ago, and only has just over a year of public experience and tuning, compared to RPostgreSQL having just over 10 years of public experience and tuning. One could ask a similar question: “why use magrittr, given it hasn’t had an update in over 4 years?” (there are other options, including our own wrapr).

When I said that RPostgreSQL hasn’t been updated in almost two years, I meant the latest CRAN release, which is dated 2017-06-24. However, I concede, I’ve been a bit unfair since I was comparing the latest GitHub activity of RPostgres with the CRAN release date of RPostgreSQL. Those are not the same, obviously. But I think we both would agree that RPostgres is a bit more actively maintained.
I have to say, I was expecting your response to be pretty much what you wrote and I wouldn’t have asked if I didn’t see sparklyr in the list of backends rquery supports.
Thanks for both your comments and your patience. You have some good points. I’ve got the RPostgres example up now (it is just a driver swap, as I had already patched around its issues).

I actually see the difference between sparklyr and SparkR as a bit wider (and favoring sparklyr; getting SparkR to work was not a simple matter of changing drivers, but required much more at the adapter layer).

Do you have much experience (first- or second-hand) of using rquery with SQL Server? I tried it as I wanted to test out rqdatatable at work, but got errors when trying the setup stages of the rquery introduction. I notice that SQL Server isn’t on that list: is rquery not compatible with it? Sorry if this isn’t the right place to ask.

SQL Server isn’t tested/supported as we don’t have a copy to test or develop against. Messages during the connection tests are normal, as that is where the package tests the edges of the database.
rqdatatable is the data.table realization of rquery, so it does not use a database connection.
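
To illustrate the point about rqdatatable needing no database, here is a minimal sketch that runs an rquery pipeline directly on an in-memory data frame. The data frame d and its column names are invented for the example, and it assumes both rquery and rqdatatable are installed.

```r
library("rquery")
library("rqdatatable")

# Hypothetical local data; no database connection is opened anywhere.
d <- data.frame(x = c(1, 2, 3))

# Describe the table from the local data frame and build the pipeline.
ops <- local_td(d) %.>%
  extend(., y = x + 1)

# Piping the data into the pipeline executes it in memory via data.table.
res <- d %.>% ops
print(res)
```

So trying out rqdatatable at work should not depend on whether a given database, SQL Server included, is supported by rquery's SQL generation.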