# R Tip: Consider radix Sort

`R` tip: consider using `radix` sort.

The “`method = "radix"`” option can greatly speed up sorting and ordering tables in `R`.

For a 1 million row table the speedup is already as much as 35 times (around 9.6 seconds versus 3 tenths of a second). Below is an excerpt from an experiment sorting showing default settings and showing radix sort (full code here).

``````
timings <- microbenchmark(
order_default = d[order(d\$col_a, d\$col_b, d\$col_c, d\$col_x), ,
drop = FALSE],
order_radix = d[order(d\$col_a, d\$col_b, d\$col_c, d\$col_x,
drop = FALSE],
check = my_check,
times = 10L)

print(timings)``````
``````## Unit: milliseconds
##           expr       min        lq      mean    median        uq
##  order_default 9531.2865 9653.6827 9759.8929 9690.6702 9833.2170
##    order_radix  262.1377  263.3226  278.2547  265.1452  274.2476
##         max neval
##  10329.3520    10
##    382.2544    10``````

This speedup is possible because Matt Dowle and Arun Srinivasan of the `data.table` team generously ported their radix sorting code into base-`R`! Please see `help(sort)` for details. So `data.table` is not only the best data manipulation package in `R`, the team actually works to improve `R` itself. This is what is meant by "`R` community" and what is needed to keep `R` vibrant and alive.

Edit/Note: Iñaki Úcar shared at least 2 good points in a follow-up article: if you are using factors you get `radix` sort for free (for technical reasons I tend to delay/disable conversion to factors), and I didn’t mention the loss of control of collation order. Because of that I am changing the article title from “R tip: Use Radix Sort” to “R Tip: Consider radix Sort”.

### Like this:

Categories: Coding Tutorials

Tagged as:

### jmount

Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.

### 5 replies ›

1. jdlong says:

I think the expression you’re looking for is, “Great minds think alike” …

1. Thanks for that. I assume ZJ has good intent, but after I while I too started to feel uncomfortable with “men” (Win-Vector LLC itself is a 50% women owned company).

1. Ian Dworkin says:

Thanks for this. I had a question, I thought that the default was “radix” at least for short vectors (less than 2^31 elements as “short”)

It turns out: the default method selection (via `match.arg()`) is `"auto"` and `"auto"` will switch to `"radix"` for `numerics`, `factors`, and `logicals` ; but not for `character` types. So an explicit setting of “method = "radix"” is not a no-op as that avoids the following bit of code (from `print(order)`):

``` if (method == "auto") {
I am guessing the reason is `radix` sort doesn’t obey the collating sequence of the locale, so using it on `character` types without asking the user could be an issue. For an application where you are using sorting to move identically keyed rows together this isn’t going to be a problem. For our example the two orders were the same for the keys we had as we did confirm identical results.