
Neglected R Super Functions

R has a lot of under-appreciated, super-powerful functions. We list a few of our favorites below.


Atlas, carrying the sky. Royal Palace (Paleis op de Dam), Amsterdam.

Photo: Dominik Bartsch, CC some rights reserved.

  • stats::approx(): approximate a curve/function.
  • base::cumsum(): cumulative ordered sum.
  • stats::ecdf(): estimate the cumulative distribution function.
  • base::findInterval(): assign values to bins.
  • base::match(): bulk computation of first match. Can look up and sort data and even find non-duplicate data.
  • base::Reduce(): nifty functional method to combine multiple function evaluations.
  • base::tapply(): grouped summary function.
  • base::unlist(): build arrays of atomic values from more complicated nested structures.
  • base::Vectorize(): convert scalar functions into functions ready to operate on arrays.
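As a rough illustration of the list above, here is a minimal sketch of one call to each function. The data and the helper name clip_below are made up for this example and are not from the original post.

```r
# Toy data, invented for this sketch.
x <- c(1, 2, 3, 4)
y <- c(10, 20, 40, 80)

# stats::approx(): linear interpolation of y at new x positions.
interp <- approx(x, y, xout = c(1.5, 3.5))$y

# base::cumsum(): running total.
running <- cumsum(y)

# stats::ecdf(): returns a function giving the fraction of data <= its argument.
Fn <- ecdf(y)
frac <- Fn(20)

# base::findInterval(): index of the bin (defined by break points) each value falls in.
bins <- findInterval(c(5, 15, 45, 100), vec = y)

# base::match(): position of the first match of each value in a table (NA if absent).
pos <- match(c("b", "z", "a"), c("a", "b", "c"))

# base::Reduce(): fold a binary function over a vector; accumulate keeps partial results.
partial_sums <- Reduce(`+`, 1:4, accumulate = TRUE)

# base::tapply(): apply a summary function per group.
group_sums <- tapply(c(1, 2, 3, 4), c("a", "b", "a", "b"), sum)

# base::unlist(): flatten a nested list into a named atomic vector.
flat <- unlist(list(a = 1, b = list(c = 2, d = 3)))

# base::Vectorize(): wrap a scalar-only function so it accepts vectors.
# clip_below is a hypothetical helper; if() only handles a single value at a time.
clip_below <- function(n, k) if (n > k) n - k else 0
v_clip_below <- Vectorize(clip_below)
clipped <- v_clip_below(1:4, 2)
```

Each result is stored in a variable so it can be inspected at the console.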

We would love to hear about some of your favorites.

Categories: Opinion


jmount

Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.

14 replies

  1. Not sure if they’re “neglected,” but I use setdiff and intersect a lot. They’re both just calls to match with a little extra logic, but they definitely improve readability.
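To make the connection to match concrete, here is a minimal sketch of match-based equivalents. The names my_setdiff and my_intersect are hypothetical, and the actual base R implementations differ in detail and vary between R versions.

```r
# Toy vectors, invented for this sketch.
x <- c(4, 2, 2, 1)
y <- c(2, 4, 5)

# Elements of x with no match in y, de-duplicated.
my_setdiff <- function(x, y) unique(x[is.na(match(x, y))])

# Elements of x that do have a match in y, de-duplicated.
my_intersect <- function(x, y) unique(x[!is.na(match(x, y))])

d <- my_setdiff(x, y)
i <- my_intersect(x, y)
```

On this input the sketch agrees with setdiff(x, y) and intersect(x, y); the built-ins remain the more readable choice.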

  2. base::aggregate() works similarly to tapply but returns a data frame
    base::mapply() can substitute some nested for loops by taking multiple arguments
    base::assign() is very useful when creating several variables from a for loop
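A small sketch of the three functions mentioned in this comment, using a made-up data frame and made-up variable names (var_1, var_2, var_3):

```r
# Toy data frame, invented for this sketch.
df <- data.frame(g = c("a", "b", "a", "b"), v = c(1, 2, 3, 4))

# tapply() returns a named array; aggregate() returns a data frame.
by_tapply <- tapply(df$v, df$g, sum)
by_aggregate <- aggregate(v ~ g, data = df, FUN = sum)

# mapply() walks several argument vectors in parallel: 2^1, 3^2, 4^3.
powers <- mapply(function(base, exponent) base^exponent, 2:4, 1:3)

# assign() creates variables whose names are built at run time.
for (i in 1:3) assign(paste0("var_", i), i * 10)
```

Many R users prefer to collect loop results in a named list rather than use assign(), since a list is easier to iterate over later.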

  3. Why “neglected”? I use all of these on a regular basis. The ones you may have missed are stats::approxfun() and stats::splinefun(). I find the functional versions of approx() and spline() much easier to get my head around.
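A brief sketch of the functional versions, on made-up sample points. Both calls return an ordinary R function that can be evaluated at new points.

```r
# Toy sample points (y = x^2), invented for this sketch.
x <- c(0, 1, 2, 3)
y <- c(0, 1, 4, 9)

# approxfun(): returns a function performing linear interpolation.
f_lin <- approxfun(x, y)
lin_mid <- f_lin(1.5)      # halfway between y = 1 and y = 4

# splinefun(): returns a function performing cubic spline interpolation.
f_spl <- splinefun(x, y)
spl_at_knot <- f_spl(2)    # an interpolating spline passes through the data
```

The returned functions can be passed directly to things like curve() or integrate(), which is a large part of their appeal.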

  4. My favorites not in your list are the hdquantile function of the Hmisc package, sapply from base, and probably Matrix from the Matrix package, with its compressed matrix formats.

    As general techniques, splines seem undermentioned, whether Akima interpolation via the akima package and aspline, or the pspline package and its smooth.Pspline function.

    1. You mean I don’t need to use match(x, sort(x)) any more?
      Next you’ll be telling me there’s a function for match(sort(x), x) called ‘order(x)’ or something…
      Sheesh!

      1. Again, more in the spirit of: I remembered an odd sorting application that rank() is convenient for (in contrast to rank()’s obvious utility in ranking things).
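The identities joked about in this thread can be checked directly. This is a small sketch on a made-up vector with no ties; with repeated values, match(sort(x), x) and order(x) no longer agree.

```r
# Toy vector with distinct values, invented for this sketch.
x <- c(30, 10, 20)

# Position of each element in the sorted vector: the same as its rank.
via_match_rank <- match(x, sort(x))
via_rank <- rank(x)

# Position in x of each sorted element: the same as the ordering permutation.
via_match_order <- match(sort(x), x)
via_order <- order(x)
```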

  5. For plotting, I find graphics::grconvertX() and graphics::grconvertY() very useful, particularly with boxplots. graphics::layout() is also very handy.

  6. I like pretty especially in base R plots. Setting ylim=range(pretty(X)) makes plots (boxplots, barplots, scatterplots)… prettier :)
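A minimal sketch of the ylim trick, with made-up data. pretty() picks round tick values that cover the data, so the axis limits land on round numbers.

```r
# Toy data, invented for this sketch.
X <- c(3.2, 7.9, 12.4, 18.6)

# Round tick positions covering the range of X.
ticks <- pretty(X)

# Use the outermost ticks as axis limits.
y_limits <- range(ticks)

# Uncomment to draw the plot with the rounded limits:
# boxplot(X, ylim = y_limits)
```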

  7. It’s funny, all of these are essentially base primitives in APL languages. It’s kind of amazing Iverson thought of everything before he even had a repl to work with.
