
It Has Always Been Wrong to Call order on a data.frame

In R it has always been incorrect to call order() on a data.frame. Such a call doesn’t return a sort order of the rows, and previously it did not signal an error either. For example:

x y
2 6
2 5
3 4
3 3
1 2
1 1
##  [1]  5  6 12  1  2 11  3  4 10  9  8  7
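The listing that produced this output did not survive in this copy of the post. As a minimal sketch, assuming the data frame is the one tabulated above (the name d matches the later text; using unlist() is my choice for illustration), the same 12 indices can be reproduced in base R by ordering the flattened cells. Depending on the R version, calling order(d) directly may instead warn or stop, so the sketch avoids it.

```r
# Reconstructed from the table above; an assumption about the original listing.
d <- data.frame(x = c(2, 2, 3, 3, 1, 1), y = c(6, 5, 4, 3, 2, 1))

# Flatten the data.frame column by column into one vector of 12 cells.
cells <- unlist(d, use.names = FALSE)

# Ordering the cells gives one index per cell, not one per row.
cell_order <- order(cells)
print(cell_order)
# [1]  5  6 12  1  2 11  3  4 10  9  8  7

# There are more indices than rows, so this cannot be a row order.
length(cell_order) > nrow(d)
# [1] TRUE
```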

Notice the above result has more than 6 items (d has only 6 rows), so it is not a row order. It appears there is a desire to make this sort of misuse signal an error, and that is now available as an optional error check. In fact, we are starting to see packages kicked off CRAN for not fixing this issue.

Recent CRAN package removals (from CRANberries), triggered by failing to respond when contacted to fix the order() error (which surfaces as “cannot xtfrm data frames”), include:

The wrapr package has supplied, for some time, the function orderv(), which is suitable for ordering the rows of data.frames.

For example, we can calculate a row order as follows.

## [1] 6 5 2 1 4 3

And use such an order to sort data rows.

  x y
6 1 1
5 1 2
2 2 5
1 2 6
4 3 3
3 3 4
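The two listings for this example are also missing from this copy. A hedged sketch of what they might look like, assuming the wrapr package is installed and d is the data frame from the first table (the requireNamespace() guard and the drop = FALSE argument are my additions, not necessarily the original code):

```r
# Reconstructed from the first table; an assumption about the original listing.
d <- data.frame(x = c(2, 2, 3, 3, 1, 1), y = c(6, 5, 4, 3, 2, 1))

if (requireNamespace("wrapr", quietly = TRUE)) {
  # One index per row: ordered by x, with ties broken by y.
  row_order <- wrapr::orderv(d)
  print(row_order)

  # Use the row order to sort the rows; drop = FALSE keeps a data.frame
  # even if d had only one column.
  print(d[row_order, , drop = FALSE])
}
```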

Essentially orderv(d) is shorthand for do.call(base::order, as.list(d)), which places the columns of the data.frame as the ...-arguments of the order() call.
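For readers who do not want the extra dependency, the same idea can be checked in base R. This is a sketch of the idea only, not wrapr's actual implementation: the helper name order_rows is hypothetical, and the unname() call is my own precaution so that a column named, say, decreasing is not mistaken for one of order()'s own arguments.

```r
# Hypothetical helper: row order of a data.frame using its columns as sort keys.
order_rows <- function(df) {
  stopifnot(is.data.frame(df), ncol(df) >= 1)
  # Pass each column as a separate, unnamed ... argument of order().
  do.call(base::order, unname(as.list(df)))
}

# Reconstructed from the first table in the post.
d <- data.frame(x = c(2, 2, 3, 3, 1, 1), y = c(6, 5, 4, 3, 2, 1))

ord <- order_rows(d)
print(ord)
# [1] 6 5 2 1 4 3

# Exactly one index per row, and the rows come out sorted by x, then y.
length(ord) == nrow(d)
# [1] TRUE
print(d[ord, , drop = FALSE])
```

A quick sanity check such as length(ord) == nrow(d) is a cheap way to catch the cell-order mistake described at the top of the post.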

Edit: an earlier great fix can be found here.

Categories: Programming Tutorials


jmount

Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.

3 replies

  1. Can you explain what is “wrong” here? ?order makes it pretty clear that arrays and dataframes are coerced to a vector. So, I find:
    > dd dd
    [1] 2 2 3 3 1 1 6 5 4 3 2 1
    > order(dd)
    [1] 5 6 12 1 2 11 3 4 10 9 8 7
    > dd[order(dd)]
    [1] 1 1 1 2 2 2 3 3 3 4 5 6
    As one would expect. Also, the returned value you posted does have 12 values (not “more than” as you wrote), doing what you asked.
    If you want row orders, then you need to do order(d$x,d$y) .


      1. The “12” was a fumble; I meant to say “6”, the number of rows of the original data frame, not the number of cells. Thank you for the correction.

        As to what is wrong.

        R has a history of commands that can be “technically correct”, in that they match their documentation. However, there is a natural expectation that order() should calculate the row order of what it is given. This is not the case for data frames, and R is changing to throw an error if it is called this way in the future.

        That is the sense in which I mean it is not right.

