Another R tip: get in the habit of using drop = FALSE when indexing data.frames with [ , ].
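The drop = FALSE tip can be illustrated with a minimal base-R sketch. The data.frame `d` and the names `one_col` and `kept` are made up for illustration and do not come from the original post.

```r
# Hypothetical example data; not taken from the original post.
d <- data.frame(x = 1:3, y = c("a", "b", "c"))

# Selecting a single column with [ , ] silently drops the data.frame
# structure and returns a plain vector.
one_col <- d[, "x"]

# With drop = FALSE the result stays a one-column data.frame.
kept <- d[, "x", drop = FALSE]

is.data.frame(one_col)
is.data.frame(kept)
```

Code that later calls data.frame-only functions such as nrow() or ncol() on the result keeps working when drop = FALSE is used, even if the selection happens to return a single column.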
Is R base::subset() really that bad?
R tip: use [[ ]] wherever you can. In R, [[ ]] is the operator that (when supplied a simple scalar argument) pulls a single element out of a list, while the [ ] operator pulls out sub-lists. For vectors [[ ]] and [ ] appear to be synonyms (modulo […]
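A small base-R sketch of the [[ ]] versus [ ] distinction follows. The objects `lst` and `v` are invented for illustration, and the name-dropping behavior shown for vectors is one known difference, not necessarily the one the truncated excerpt goes on to describe.

```r
# Hypothetical example values; not taken from the original post.
lst <- list(a = 1, b = "two")

elem <- lst[["a"]]  # the element itself: the number 1
sub  <- lst["a"]    # a sub-list of length one that contains the element

# On atomic vectors the two operators return the same value,
# but [[ ]] drops the element's name while [ ] keeps it.
v <- c(x = 10, y = 20)
by_double <- v[["x"]]
by_single <- v["x"]

is.list(elem)
is.list(sub)
names(by_double)
names(by_single)
```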
There are substantial differences between ad-hoc analyses (be they machine learning research, data science contests, or other demonstrations) and production-worthy systems. Roughly: ad-hoc analyses have to be correct only at the moment they are run (and often once they are correct, that is the last time they are run; […]
Question: how hard is it to count rows using the R package dplyr? Answer: surprisingly difficult. When trying to count rows using dplyr or dplyr-controlled data structures (remote tbls such as Sparklyr or dbplyr structures) one is sailing between Scylla and Charybdis. The task being to avoid dplyr corner-cases and […]
dplyr is one of the most popular R packages. It is powerful and important. But is it in fact easily comprehensible?
For many R users the magrittr pipe is a popular way to arrange computation and famously part of the tidyverse. The tidyverse itself is a rapidly evolving centrally controlled package collection. The tidyverse authors publicly appear to be interested in re-basing the tidyverse in terms of their new rlang/tidyeval package. […]
From dplyr issue 2916. The following appears to work.
suppressPackageStartupMessages(library("dplyr"))
COL <- "homeworld"
starwars %>% group_by(.data[[COL]]) %>% head(n=1)
## # A tibble: 1 x 14
## # Groups: COL
## name height mass hair_color skin_color eye_color birth_year
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl>
## 1 Luke Skywalker […]
In this article we will discuss composing standard-evaluation interfaces (SE: parametric, referentially transparent, or “looks only at values”) and composing non-standard-evaluation interfaces (NSE) in R. In R the package tidyeval/rlang is a tool for building domain specific languages intended to allow easier composition of NSE interfaces. To use it you […]
Here is an absolutely horrible way to confuse yourself and get an inflated reported R-squared on a simple linear regression model in R. We have written about this before, but we found a new twist on the problem (interactions with categorical variable encoding) which we would like to call out […]