Take care if trying the new RPostgres
database connection package. By default it returns some non-standard types that code developed against other database drivers may not expect, and may not be ready to defend against.
Danger, Will Robinson!
Trying the new package
One can try the newer RPostgres
as a drop-in replacement for the usual RPostgreSQL
.
That starts out okay. We can connect to the database and and pull a summary about remote data to R
.
db <- DBI::dbConnect(
RPostgres::Postgres(),
host = 'localhost',
port = 5432,
user = 'johnmount',
password = '')
## Warning: multiple methods tables found for 'dbQuoteLiteral'
d <- DBI::dbGetQuery(
db,
"SELECT COUNT(1) FROM pg_catalog.pg_tables")
print(d)
## count
## 1 177
ntables <- d$count[[1]]
print(ntables)
## integer64
## [1] 177
The result at first looks okay.
class(ntables)
## [1] "integer64"
typeof(ntables)
## [1] "double"
ntables + 1L
## integer64
## [1] 178
ntables + 1
## integer64
## [1] 178
is.numeric(ntables)
## [1] TRUE
But it is only okay, until it is not.
pmax(1L, ntables)
## [1] 8.744962e-322
pmin(1L, ntables)
## [1] 1
ifelse(TRUE, ntables, ntables)
## [1] 8.744962e-322
for(ni in ntables) {
print(ni)
}
## [1] 8.744962e-322
unclass(ntables)
## [1] 8.744962e-322
If your code, or any package code you are using, perform any of the above calculations, your results will be corrupt and wrong. It is quite likely any code written before December 2017 (RPostgres
‘s first CRAN
distribution) would not have been written with the RPostgres
"integer64
for all of my friends" design decision in mind.
Also note, RPostgres
does not currently appear to write integer64
back to the database.
DBI::dbWriteTable(db, "d", d,
temporary = TRUE,
overwrite = TRUE)
DBI::dbGetQuery(db, "
SELECT
column_name,
data_type,
numeric_precision,
numeric_precision_radix,
udt_name
FROM
information_schema.columns
WHERE
table_name = 'd'
")
## column_name data_type numeric_precision numeric_precision_radix udt_name
## 1 count real 24 2 float4
DBI::dbDisconnect(db)
The work-around
The work-around is: add the argument bigint = "numeric"
to your dbConnect()
call. This is mentioned in the manual, but not the default and not called out in the package description or README
. Or, of course, you could use RPostgreSQL
.
jmount
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.
Researching and diagnosing above was part of my Mad Max
R
day.And there are some natural questions about RPostgres.
Could it have been a pull-request to RPostgreSQL? What is its relation (if any) to the RStudio Professional Drivers?
RPostgres is a complete rewrite, I have never got a reply from its maintainer. It aims to be consistent with the new DBI specification, whereas RPostgreSQL is best for backward compatibility.
RPostgres uses libpq to connect directly to a database, as far as I know the RStudio Professional drivers connect via ODBC.
The integer64 class has its drawbacks, but at some point it felt like the best of all bad alternatives. Writing 64-bit integers should work, if it doesn’t it’s a bug.
That definitely answers my concerns. Thanks! BTW your work on DBI has been very good for the R community.
I’ve now filed an issue on the write back point.