Question: how hard is it to count rows using the R package dplyr? Answer: surprisingly difficult. When trying to count rows using dplyr or dplyr controlled data-structures (remote tbls such as Sparklyr or dbplyr structures) one is sailing between Scylla and Charybdis. The task being to avoid dplyr corner-cases and […]
Recently I noticed that the R package sparklyr had the following odd behavior: suppressPackageStartupMessages(library("dplyr")) library("sparklyr") packageVersion("dplyr") #>  ‘0.7.2.9000’ packageVersion("sparklyr") #>  ‘0.6.2’ packageVersion("dbplyr") #>  ‘18.104.22.16800’ sc <- spark_connect(master = ‘local’) #> * Using Spark: 2.1.0 d <- dplyr::copy_to(sc, data.frame(x = 1:2)) dim(d) #>  NA ncol(d) #>  […]
The Win-Vector public R packages now all have new pkgdown documentation sites! (And, a thank-you to Hadley Wickham for developing the pkgdown tool.) Please check them out (hint: vtreat is our favorite).
In our latest R and Big Data article we discuss replyr. Why replyr replyr stands for REmote PLYing of big data for R. Why should R users try replyr? Because it lets you take a number of common working patterns and apply them to remote data (such as databases or […]
In our latest installment of “R and big data” let’s again discuss the task of left joining many tables from a data warehouse using R and a system called "a join controller" (last discussed here). One of the great advantages to specifying complicated sequences of operations in data (rather than […]
In this article we will discuss composing standard-evaluation interfaces (SE: parametric, referentially transparent, or “looks only at values”) and composing non-standard-evaluation interfaces (NSE) in R. In R the package tidyeval/rlang is a tool for building domain specific languages intended to allow easier composition of NSE interfaces. To use it you […]
This note describes a useful replyr tool we call a "join controller" (and is part of our "R and Big Data" series, please see here for the introduction, and here for one our big data courses).
Saw this the other day: In defense of wrapr::let() (originally part of replyr, and still re-exported by that package) I would say: let() was deliberately designed for a single real-world use case: working with data when you don’t know the column names when you are writing the code (i.e., the […]
Our next "R and big data tip" is: summarizing big data. We always say "if you are not looking at the data, you are not doing science"- and for big data you are very dependent on summaries (as you can’t actually look at everything). Simple question: is there an easy […]
R is a very fluid language amenable to meta-programming, or alterations of the language itself. This has allowed the late user-driven introduction of a number of powerful features such as magrittr pipes, the foreach system, futures, data.table, and dplyr. Please read on for some small meta-programming effects we have been […]