A kind reader recently shared the following comment on the Practical Data Science with R 2nd Edition live-site: “Thanks for the chapter on data frames and data.tables. It has helped me overcome an obstacle freeing me from a lot of warnings telling me my data table was not a real […]”
I’d like to share some new timings on a grouped in-place aggregation task. A client of mine was seeing some slow performance, so I decided to time a very simple abstraction of one of the steps of their workflow.
Our goal has been to make rquery the best query generation system for R (and to make data_algebra the best query generator for Python). Let’s see what rquery is good at, and what new features are making rquery better.
My favorite R data.table feature is the “by” grouping notation when combined with the := notation. Let’s take a look at this powerful notation.
This note is a comment on some of the timings shared in the dplyr-0.8.0 pre-release announcement. The original published timings were as follows: With performance metrics: measurements are marketing. So let’s dig into the above a bit.
Saghir Bashir of ilustat recently shared a nice getting started with R and tidyverse guide. In addition, they were generous enough to link to Dirk Eddelbuettel’s later adaptation of the guide to use data.table. This type of cooperation and user choice is what keeps the R community vital. Please encourage […]
According to a KDD poll, fewer respondents (by rate) used only R in 2017 than in 2016. At the same time, more respondents (by rate) used only Python in 2017 than in 2016. Let’s take this as an excuse to take a quick look at what happens when we try […]
I’ve ended up (almost accidentally) collecting a number of different solutions to the “use a column to choose values from other columns in R” problem. Please read on for a brief benchmark comparing these methods/solutions.
We recently saw a great recurring R question: “how do you use one column to choose a different value for each row?” That is: how do you use a column as an index? Please read on for some idiomatic base R, data.table, and dplyr solutions.
If your R or dplyr work is taking what you consider to be too long (seconds instead of instant, or minutes instead of seconds, or hours instead of minutes, or a day instead of an hour) then try data.table. For some tasks data.table is routinely faster than alternatives at […]