R
tip: consider using radix
sort.
The “method = "radix"
” option can greatly speed up sorting and ordering tables in R
.
For a 1 million row table the speedup is already as much as 35 times (around 9.6 seconds versus 3 tenths of a second). Below is an excerpt from an experiment sorting showing default settings and showing radix sort (full code here).
timings <- microbenchmark(
order_default = d[order(d$col_a, d$col_b, d$col_c, d$col_x), ,
drop = FALSE],
order_radix = d[order(d$col_a, d$col_b, d$col_c, d$col_x,
method = "radix"), ,
drop = FALSE],
check = my_check,
times = 10L)
print(timings)
## Unit: milliseconds
## expr min lq mean median uq
## order_default 9531.2865 9653.6827 9759.8929 9690.6702 9833.2170
## order_radix 262.1377 263.3226 278.2547 265.1452 274.2476
## max neval
## 10329.3520 10
## 382.2544 10
Great men think alike! I gave the same tip not long ago, see https://www.linkedin.com/feed/update/urn:li:activity:6347046872770760706
Also check out my article comparing sorting performance in R vs Julia and Python here: https://www.codementor.io/zhuojiadai/julia-vs-r-vs-python-string-sort-performance-an-unfinished-journey-to-optimizing-julia-s-performance-f57tf9f8s
I think the expression you’re looking for is, “Great minds think alike” …
Thanks for that. I assume ZJ has good intent, but after I while I too started to feel uncomfortable with “men” (Win-Vector LLC itself is a 50% women owned company).
Thanks for this. I had a question, I thought that the default was “radix” at least for short vectors (less than 2^31 elements as “short”)
I myself had not read the help carefully enough at first (and did not think about this point until you raised it, thanks!).
It turns out: the default method selection (via
match.arg()
) is"auto"
and"auto"
will switch to"radix"
fornumerics
,factors
, andlogicals
; but not forcharacter
types. So an explicit setting of “method = "radix"” is not a no-op as that avoids the following bit of code (fromprint(order)
):I am guessing the reason is
radix
sort doesn’t obey the collating sequence of the locale, so using it oncharacter
types without asking the user could be an issue. For an application where you are using sorting to move identically keyed rows together this isn’t going to be a problem. For our example the two orders were the same for the keys we had as we did confirm identical results.