We are excited to announce the
rquery is Win-Vector LLC’s currently in development big data query tool for
rquery supplies a set of operators inspired by Edgar F. Codd’s relational algebra (updated to reflect lessons learned from working with
dplyr at big data scale in production).
As an example:
rquery operators allow us to write our earlier “treatment and control” example as follows.
dQ <- d %.>%
  extend_se(.,
            if_else_block(
              testexpr = "rand()>=0.5",
              thenexprs = qae(a_1 := 'treatment',
                              a_2 := 'control'),
              elseexprs = qae(a_1 := 'control',
                              a_2 := 'treatment'))) %.>%
  select_columns(., c("rowNum", "a_1", "a_2"))
rquery pipelines are first-class objects; so we can extend them, save them, and even print them.
cat(format(dQ))

table('d') %.>%
 extend(.,
  ifebtest_1 := rand() >= 0.5) %.>%
 extend(.,
  a_1 := ifelse(ifebtest_1, "treatment", a_1),
  a_2 := ifelse(ifebtest_1, "control", a_2)) %.>%
 extend(.,
  a_1 := ifelse(!( ifebtest_1 ), "control", a_1),
  a_2 := ifelse(!( ifebtest_1 ), "treatment", a_2)) %.>%
 select_columns(., rowNum, a_1, a_2)
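To make the expansion printed above concrete, here is a minimal base-R sketch of the same logic on an in-memory data.frame. It is not rquery and not the SQL that rquery generates. The example table `d`, the use of `runif()` as a stand-in for the database `rand()`, and the result name `dQ_result` are all assumptions made for illustration.

```r
# Base-R sketch of the steps in the printed pipeline (illustration only).
set.seed(2017)  # only so the sketch is reproducible

# Hypothetical stand-in for the database table 'd'.
d <- data.frame(rowNum = 1:6,
                a_1 = NA_character_,
                a_2 = NA_character_,
                stringsAsFactors = FALSE)

# Step 1: evaluate the test once and store it as its own column,
# so every later assignment sees the same random draw per row.
d$ifebtest_1 <- runif(nrow(d)) >= 0.5

# Step 2: "then" assignments, applied only where the test is TRUE.
d$a_1 <- ifelse(d$ifebtest_1, "treatment", d$a_1)
d$a_2 <- ifelse(d$ifebtest_1, "control", d$a_2)

# Step 3: "else" assignments, applied only where the test is FALSE.
d$a_1 <- ifelse(!d$ifebtest_1, "control", d$a_1)
d$a_2 <- ifelse(!d$ifebtest_1, "treatment", d$a_2)

# Step 4: keep only the requested columns.
dQ_result <- d[, c("rowNum", "a_1", "a_2")]
print(dQ_result)
```

Storing the test result in a column before using it is what keeps the two assignments consistent: each row gets exactly one of "treatment" or "control" in `a_1` and the other in `a_2`.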
rquery targets only databases, and right now primarily
rquery is primarily a
SQL generator, allowing it to avoid some of the trade-offs required to directly support in-memory
data.frames. We demonstrate converting the above
rquery pipeline into
SQL and executing it here.
rquery itself is still in early development (and not yet ready for extensive use in production), but it is maturing fast, and we expect more
rquery announcements going forward. Our current intent is to bring in sponsors, partners, and
R community voices to help develop and steer
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.
New rquery capability: ad hoc queries https://winvector.github.io/rquery/articles/AdHocQueries.html .