R tip: Use `wrapr::match_order()` to align data.

Suppose we have data in two data frames, and both data frames have a common row-identifying column called `idx`.

```r
library("wrapr")

d1 <- build_frame(
  "idx", "x" |
  3    , "a" |
  1    , "b" |
  2    , "c" )

d2 <- build_frame(
  "idx", "y" |
  2    , "D" |
  1    , "E" |
  3    , "F" )

print(d1)
#>   idx x
#> 1   3 a
#> 2   1 b
#> 3   2 c

print(d2)
#>   idx y
#> 1   2 D
#> 2   1 E
#> 3   3 F
```

(Please see R Tip: Think in Terms of Values for `build_frame()` and other value capturing tools.)

Often we wish to work with such data aligned so that each row of `d2` has the same `idx` value as the corresponding row (by row order) of `d1`. This is an important data wrangling task, so there are many ways to achieve it in R, such as `base::merge()`, `dplyr::left_join()`, or sorting both tables into the same order and then using `base::cbind()`.
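As a rough sketch of two of these options in base R: the data frames below are rebuilt with `data.frame()` so the example does not depend on `wrapr`, and the names `joined`, `d1_sorted`, `d2_sorted`, and `combined` are purely illustrative.

```r
# Rebuild the example data with base R only (an assumption made so the
# sketch is self-contained).
d1 <- data.frame(idx = c(3, 1, 2), x = c("a", "b", "c"))
d2 <- data.frame(idx = c(2, 1, 3), y = c("D", "E", "F"))

# Option 1: join on the shared key. By default merge() sorts the
# result by the key, so d1's original row order is not kept.
joined <- merge(d1, d2, by = "idx")
print(joined)

# Option 2: sort both tables by the key, then bind the columns.
d1_sorted <- d1[order(d1$idx), , drop = FALSE]
d2_sorted <- d2[order(d2$idx), , drop = FALSE]
combined <- cbind(d1_sorted, y = d2_sorted$y)
print(combined)
```

Both results come back ordered by `idx` rather than in the row order of `d1`.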

However, if you wish to preserve the order of the first table (which may not be sorted), you need one more trick.

You can add a row-id column, sort by the joining id, combine and then re-sort by the row-id column.
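The row-id steps just described can be sketched in base R roughly as follows. The helper column name `row_id` and the intermediate names are hypothetical, and the sketch assumes every `idx` value is unique and present in both tables.

```r
# Rebuild the example data with base R only (an assumption made so the
# sketch is self-contained).
d1 <- data.frame(idx = c(3, 1, 2), x = c("a", "b", "c"))
d2 <- data.frame(idx = c(2, 1, 3), y = c("D", "E", "F"))

# 1. Record d1's original row order in a helper column.
d1$row_id <- seq_len(nrow(d1))

# 2. Sort both tables by the joining id.
d1_sorted <- d1[order(d1$idx), , drop = FALSE]
d2_sorted <- d2[order(d2$idx), , drop = FALSE]

# 3. Combine column-wise now that the rows line up.
combined <- cbind(d1_sorted, y = d2_sorted$y)

# 4. Re-sort by the helper column to restore d1's order, then drop it.
restored <- combined[order(combined$row_id), , drop = FALSE]
restored$row_id <- NULL
print(restored)
```

The result has the rows of `d1` in their original order, with the matching `y` values from `d2` alongside.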

Or you can match the orders in one step using `wrapr::match_order()`.

```r
p <- match_order(d2$idx, d1$idx)
print(d2[p, , drop=FALSE])
#>   idx y
#> 3   3 F
#> 2   1 E
#> 1   2 D
```

`match_order()` merely wraps the sort and re-sort tricks mentioned above. The theory behind it, however, rests on the absolute magic of associative array indexing.
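To illustrate the associative-array idea, here is a small base R sketch that uses a named vector as a lookup table. It is not necessarily how `match_order()` is implemented; the names `pos_in_d2` and `d2_aligned` are hypothetical, and the sketch assumes the ids are unique and that every id in `d1` also appears in `d2`.

```r
# Rebuild the example data with base R only (an assumption made so the
# sketch is self-contained).
d1 <- data.frame(idx = c(3, 1, 2), x = c("a", "b", "c"))
d2 <- data.frame(idx = c(2, 1, 3), y = c("D", "E", "F"))

# Build a lookup table: names are d2's ids, values are d2's row positions.
pos_in_d2 <- seq_len(nrow(d2))
names(pos_in_d2) <- as.character(d2$idx)

# Index the lookup table by d1's ids (used as names) to get, for each
# row of d1, the position of the matching row in d2.
p <- unname(pos_in_d2[as.character(d1$idx)])
print(p)

# Reorder d2 so its idx column lines up with d1's.
d2_aligned <- d2[p, , drop = FALSE]
print(d2_aligned)
```

Indexing by name does the matching in a single step, with no explicit sort and re-sort.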

Please see R Tip: Use `drop = FALSE` with `data.frame`s for why one should get in the habit of writing `drop = FALSE`.

### jmount

Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.

Also, `base::match(d1$idx, d2$idx)` should give the correct permutation.
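A quick check of this suggestion on the example data, with the data frames rebuilt in base R (the name `d2_aligned` is illustrative):

```r
# Rebuild the example data with base R only (an assumption made so the
# sketch is self-contained).
d1 <- data.frame(idx = c(3, 1, 2), x = c("a", "b", "c"))
d2 <- data.frame(idx = c(2, 1, 3), y = c("D", "E", "F"))

# For each id in d1, find the position of its first match in d2.
p <- match(d1$idx, d2$idx)
print(p)

# Reorder d2 so its idx column lines up with d1's.
d2_aligned <- d2[p, , drop = FALSE]
print(d2_aligned)
```

Note that `match()` returns `NA` for any id in `d1` that is missing from `d2`, and only the first position when an id is repeated in `d2`, so this shortcut is safest when the ids are unique and present in both tables.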