If your R or dplyr work is taking longer than you consider acceptable (seconds instead of instant, minutes instead of seconds, hours instead of minutes, or a day instead of an hour), then try data.table. For some tasks data.table is routinely faster than alternatives at […]
Not a full R article, but a quick note demonstrating by example the advantage of being able to collect many expressions and pack them into a single extend_se() node.
A big thank you to Databricks for working with us and sharing "rquery: Practical Big Data Transforms for R-Spark Users: How to use rquery with Apache Spark on Databricks". rquery on Databricks is a great data science tool.
rquery and rqdatatable are new R packages for data wrangling, either at scale (in databases, or big data systems such as Apache Spark) or in-memory. The packages speed up both execution (through optimizations) and development (through a good mental model and up-front error checking) for data wrangling tasks. Win-Vector LLC’s […]
rquery is an R package for specifying data transforms using piped Codd-style operators. It has already shown great performance on PostgreSQL and Apache Spark. rqdatatable is a new package that supplies a screaming fast implementation of the rquery system in-memory using the data.table package. rquery is already one of the […]
My BARUG rquery talk went very well. Thank you very much to the attendees for being an attentive and generous audience. (John teaching rquery at BARUG, photo credit: Timothy Liu) I am now looking for invitations to give a streamlined version of this talk privately to groups using R who […]
Take care if trying the new RPostgres database connection package. By default it returns some non-standard types that code developed against other database drivers may not expect, and may not be ready to defend against. Danger, Will Robinson!
I would like to thank LinkedIn for letting me speak with some of their data scientists and analysts. (John Mount discussing rquery SQL generation at LinkedIn.) If you have a group using R at database or Spark scale, please reach out (jmount at win-vector.com). We (Win-Vector LLC) have […]
Win-Vector LLC recently announced the rquery R package, an operator-based query generator. In this note I want to share some exciting and favorable initial rquery benchmark timings.
We are excited to announce the rquery R package. rquery is Win-Vector LLC’s big data query tool for R, currently in development. rquery supplies a set of operators inspired by Edgar F. Codd’s relational algebra (updated to reflect lessons learned from working with R, SQL, and dplyr at big data scale […]