Introduction: This note shares an experiment comparing the performance of a number of data processing systems available in R. Our example problem is finding the top-ranking item per group (groups defined by three string columns, and order defined by a single numeric column). This is a common […]
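To make the example problem concrete, here is a minimal base-R sketch of "top-ranking item per group". It is not the method benchmarked in the note; the data frame `d`, its column names (`g1`, `g2`, `g3`, `score`), and the values are all made up for illustration.

```r
# Hypothetical toy data: three string grouping columns and one numeric score.
d <- data.frame(
  g1 = c("a", "a", "a", "b", "b"),
  g2 = c("x", "x", "y", "x", "x"),
  g3 = c("p", "p", "p", "q", "q"),
  score = c(1, 3, 2, 5, 4),
  stringsAsFactors = FALSE
)

# Sort by the three grouping columns, then by descending score,
# so the best row of each group comes first within its group.
ord <- order(d$g1, d$g2, d$g3, -d$score, method = "radix")
d_sorted <- d[ord, , drop = FALSE]

# Keep only the first row seen for each (g1, g2, g3) combination.
is_first <- !duplicated(d_sorted[, c("g1", "g2", "g3")])
top <- d_sorted[is_first, , drop = FALSE]
rownames(top) <- NULL

print(top)
```

Sorting once and then dropping duplicates keeps the sketch short; packages such as data.table or dplyr express the same idea with their own grouping operators.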
R tip: consider using radix sort.
The data.table R package is really good at sorting. Below is a comparison of it versus dplyr for a range of problem sizes.
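As a small illustration of the tip, base R exposes radix sorting through the `method` argument of `sort()` and `order()`. The sketch below uses made-up fixed-width keys (the `id_` prefix and the sizes are arbitrary), and it makes no timing claims. Note that radix ordering of character vectors uses C-locale (byte) order, which can differ from locale-aware collation.

```r
# Hypothetical keys: zero-padded so that byte order matches numeric order.
set.seed(2018)
ids <- sample.int(99999, 1000)
keys <- sprintf("id_%05d", ids)

# Radix sort of the values themselves.
sorted_keys <- sort(keys, method = "radix")

# Radix ordering permutation, useful for reordering whole data frames.
perm <- order(keys, method = "radix")

head(sorted_keys)
```

To compare speed on your own data, wrap the `method = "radix"` call and the default call in `system.time()`.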
Introduction: In this note we will show how to speed up work in R by partitioning data and process-level parallelization. We will show the technique with three different R packages: rqdatatable, data.table, and dplyr. The methods shown will also work with base-R and other packages. For each of the above […]
rquery is an R package for specifying data transforms using piped Codd-style operators. It has already shown great performance on PostgreSQL and Apache Spark. rqdatatable is a new package that supplies a screaming fast in-memory implementation of the rquery system using the data.table package. rquery is already one of the […]
I would like to demonstrate some helpful wrapr R notation tools that really neaten up your R code. Img: Christopher Ziemnowicz.
“Base R” (call it “Pure R”, “Good Old R”, just don’t call it “Old R” or late for dinner) can be fast for in-memory tasks. This is despite the commonly repeated claim that: “packages written in C/C++ are (edit: “always”) faster than R code.” The benchmark results of “rquery: Fast […]
I’ve been asked if the adapter “let” from our R package replyr works with data.table. My answer is: it does work. I am not a data.table user, so I am not the one to ask whether data.table benefits from a non-standard-evaluation to standard-evaluation adapter such as replyr::let.