We have a new rquery vignette here: Working with Many Columns. This is an attempt to get back to writing about how to use the package to work with data (versus the other day’s discussion of package design/implementation). Please check it out.
Estimated reading time: 21 seconds
Introduction: I would like to talk about some of the design principles underlying the data_algebra package (and also in its sibling rquery package). The data_algebra package is a query generator that can act on either Pandas data frames or on SQL tables. This is discussed on the project site and […]
Estimated reading time: 31 minutes
Our goal has been to make rquery the best query generation system for R (and to make data_algebra the best query generator for Python). Let’s see what rquery is good at, and what new features are making rquery better.
Estimated reading time: 10 minutes
Introduction: rquery is a data wrangling system designed to express complex data manipulation as a series of simple data transforms. This is in the spirit of R’s base::transform() or dplyr’s dplyr::mutate(), and uses a pipe in the style popularized in R with magrittr. The operators themselves follow the selections in […]
Estimated reading time: 14 minutes
This article introduces the data_algebra project: a data processing tool family available in R and Python. These tools are designed to transform data either in-memory or on remote databases. In particular we will discuss the Python implementation (also called data_algebra) and its relation to the mature R implementations (rquery and […]
Estimated reading time: 25 minutes
Let’s try some "ugly corner cases" for data manipulation in R. Corner cases are examples where the user might be running to the edge of where the package developer intended their package to work, and thus often where things can go wrong. Let’s see what happens when we try to […]
Estimated reading time: 8 minutes
The rquery R package has several places where the user can ask that what they have typed be replaced by a name or value stored in a variable. This becomes important as many of the rquery commands capture column names from un-executed code. So knowing if something is […]
Estimated reading time: 13 minutes
Roz King just wrote an interesting article on binning data (a common data analytics step) in a database. They compare a case-based approach (where the bin divisions are stuffed into code) with a join-based approach. They share code and timings. Best of all: rquery gets some attention and turns […]
Estimated reading time: 37 seconds
To make getting started with rquery (an advanced query generator for R) easier, we have re-worked the package README for various data-sources (including SparkR!).
Estimated reading time: 39 seconds
R users have been enjoying the benefits of SQL query generators for quite some time, most notably using the dbplyr package. I would like to talk about some features of our own rquery query generator, concentrating on derived result re-use.
Estimated reading time: 12 minutes