R has a lot of under-appreciated, super-powerful functions. We list a few of our favorites below.
Atlas, carrying the sky. Royal Palace (Paleis op de Dam), Amsterdam.
Photo: Dominik Bartsch, CC some rights reserved.
stats::approx(): approximate a curve/function.
base::cumsum(): cumulative ordered sum.
stats::ecdf(): estimate the cumulative distribution function.
base::findInterval(): assign values to bins.
base::match(): bulk computation of first match. Can look up and sort data, and even find non-duplicate data.
base::Reduce(): nifty functional method to combine multiple function evaluations.
base::tapply(): grouped summary function.
base::unlist(): build arrays of atomic values from more complicated nested structures.
base::Vectorize(): convert scalar functions into functions ready to operate on arrays.
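As a rough illustration of the list above, here is a minimal sketch that calls each function once. The data and every variable name (`x`, `g`, `clip_one`, and so on) are made up for this example and are not from any particular analysis.

```r
# Illustrative data only; none of these variable names come from the post.
x <- c(3, 1, 4, 1, 5)
g <- c("a", "b", "a", "b", "a")

# stats::approx(): linear interpolation between known points.
mid <- approx(x = c(0, 10), y = c(0, 100), xout = 2.5)$y   # 25

# base::cumsum(): running total, in the order given.
running <- cumsum(x)                                       # 3 4 8 9 14

# stats::ecdf(): returns a function giving the fraction of values <= its argument.
Fhat <- ecdf(x)
frac <- Fhat(3)                                            # 0.6

# base::findInterval(): which bin (by left break point) each value falls in.
bins <- findInterval(x, c(0, 2, 4))                        # 2 1 3 1 3

# base::match(): position of the first match; NA when there is none.
pos <- match(c(1, 5, 9), x)                                # 2 5 NA
first_seen <- match(x, x) == seq_along(x)                  # flags non-duplicates

# base::Reduce(): fold a two-argument function over a list.
total <- Reduce(`+`, list(1, 2, 3, 4))                     # 10

# base::tapply(): summary of x within each group of g.
sums <- tapply(x, g, sum)                                  # a = 12, b = 2

# base::unlist(): flatten a nested list into a named atomic vector.
flat <- unlist(list(a = 1, b = list(c = 2, d = 3)))        # names: a, b.c, b.d

# base::Vectorize(): lift a scalar function so it accepts a vector.
clip_one <- function(v, lo, hi) if (v < lo) lo else if (v > hi) hi else v
clip <- Vectorize(clip_one, vectorize.args = "v")
clipped <- clip(x, 2, 4)                                   # 3 2 4 2 4
```

The `match(x, x) == seq_along(x)` line is one way to read "find non-duplicate data": a value is a first occurrence exactly when its first match is its own position.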
We would love to hear about some of your favorites.
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.
Not sure if they’re “neglected,” but I use intersect a lot. They’re both just calls to match with a little extra logic, but they definitely improve readability.
unique() are good to keep in mind.
table() is a function I use a great deal
base::aggregate() works similarly to tapply() but returns a data frame
base::mapply() can substitute some nested for loops by taking multiple arguments
base::assign() is very useful when creating several variables from a for loop
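A small sketch of the aggregate() versus tapply() difference mentioned above, plus one mapply() call. The data frame `d` and its columns are invented for illustration.

```r
# Hypothetical toy data frame.
d <- data.frame(g = c("a", "b", "a", "b"), v = c(1, 2, 3, 4))

# tapply() returns a named array of per-group sums.
by_tapply <- tapply(d$v, d$g, sum)

# aggregate() returns a data frame with one row per group.
by_aggregate <- aggregate(v ~ g, data = d, FUN = sum)

# mapply() walks several vectors in parallel, one element from each per call.
products <- mapply(function(a, b) a * b, 1:3, 4:6)   # 4 10 18
```

Note that mapply() pairs its arguments element by element, so it covers loops that step through several vectors together; a full cross product of two vectors needs something like outer() instead.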
Why “neglected”? I use all of these on a regular basis. The ones you may have missed are stats::approxfun() and stats::splinefun(). I find the functional versions of approx() and spline() much easier to get my head around.
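For readers who have not tried the functional versions, a minimal sketch follows. The sample points `xs` and `ys` are made up; approxfun() and splinefun() are called with their default settings.

```r
# Made-up sample points on the curve y = x^2.
xs <- c(0, 1, 2, 3)
ys <- xs^2

# approxfun() returns a function that interpolates linearly between the points.
f_lin <- approxfun(xs, ys)
lin_mid <- f_lin(1.5)        # halfway between 1 and 4, so 2.5
lin_out <- f_lin(5)          # NA: outside the data range under the default rule

# splinefun() returns a function for a cubic spline through the same points.
f_spl <- splinefun(xs, ys)
spl_knot <- f_spl(2)         # the spline passes through the data, so about 4
```

The appeal is that `f_lin` and `f_spl` are ordinary functions: they can be passed to curve(), integrate(), or sapply() without carrying the original points around.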
“Neglected” may be a stretch, but these functions are so great they definitely deserve an extra call-out.
My favorites not on your list are the hdquantile function from the Hmisc package, sapply from base, and probably Matrix from the Matrix package, with its compressed matrix formats.
As general techniques, splines seemed undermentioned, whether Akima interpolating via akima package and aspline, or pspline package and its smooth.Pspline function.
Forgot good old
You mean I don’t need to use match(x, sort(x)) any more?
Next you’ll be telling me there’s a function for match(sort(x), x) called ‘order(x)’ or something…
Again, more in the spirit of: I remembered an odd sorting application that rank() is convenient for (in contrast to rank()’s obvious utility in ranking things).
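To make the joke in this exchange concrete, here is a small check of the two identities, assuming the values are distinct. The vectors `x` and `y` are invented for the example.

```r
# Distinct values (an assumption; ties behave differently, see below).
x <- c(30, 10, 20)

rank_via_match  <- match(x, sort(x))   # 3 1 2
order_via_match <- match(sort(x), x)   # 2 3 1

same_as_rank  <- all(rank_via_match == rank(x))     # TRUE
same_as_order <- all(order_via_match == order(x))   # TRUE

# With ties the match() version and rank() disagree:
y <- c(2, 1, 2)
tie_match <- match(y, sort(y))   # 2 1 2 (first match wins)
tie_rank  <- rank(y)             # 2.5 1.0 2.5 (ties averaged by default)
```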
For plotting, I find graphics::grconvertX() and graphics::grconvertY() very useful, particularly with boxplots. graphics::layout() is also very handy.
pretty especially in base R plots. Setting ylim=range(pretty(X)) makes plots (boxplots, barplots, scatterplots)… prettier :)
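A quick sketch of that trick, with made-up data in `X`. The plotting call is left commented out so the snippet runs without opening a graphics device.

```r
# Hypothetical data with awkward minimum and maximum values.
X <- c(3.2, 7.9, 12.4, 18.7)

# pretty() picks round break points that cover the range of X.
ticks <- pretty(X)

# Using the outermost breaks as axis limits gives round numbers at both ends.
ylim <- range(ticks)

# boxplot(X, ylim = ylim)   # uncomment to draw
```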
It’s funny, all of these are essentially base primitives in APL languages. It’s kind of amazing Iverson thought of everything before he even had a repl to work with.
Iverson punched way above his weight.