In R it has always been incorrect to call order() on a data.frame. Such a call doesn’t return a sort-order of the rows, and previously did not signal an error. For example:
x | y |
---|---|
2 | 6 |
2 | 5 |
3 | 4 |
3 | 3 |
1 | 2 |
1 | 1 |
## [1] 5 6 12 1 2 11 3 4 10 9 8 7
Notice the above result has more than 6 items (it has 12, one per cell of the data frame), so it is not a row order. It appears there is a desire to make this sort of misuse signal an error, and such signalling is now available as an optional error-check. In fact, we are starting to see packages removed from CRAN for not fixing this issue.
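To make the cell-versus-row distinction concrete, here is a minimal sketch in base R. It rebuilds the example data from the table above and assumes the data frame is flattened column by column, which reproduces the 12 indices printed earlier. It deliberately avoids calling order() on the data frame itself, since that call may warn or stop depending on the R version and settings. The names cells, cell_order, and row_order are illustrative only.

```r
# Rebuild the example data frame shown in the table above.
d <- data.frame(
  x = c(2, 2, 3, 3, 1, 1),
  y = c(6, 5, 4, 3, 2, 1)
)

# Assumption: flattening column by column mimics the coercion to a vector.
cells <- unlist(d, use.names = FALSE)

cell_order <- order(cells)    # one index per cell (12 of them)
row_order <- order(d$x, d$y)  # one index per row (6 of them)

print(cell_order)
print(row_order)
```

Only the second result can be used to index the rows of d.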
Recent CRAN package removals (as reported by CRANberries) triggered by failing to respond when contacted to fix the order() error (which surfaces as “cannot xtfrm data frames”) include:
- ACCLMA
- ahp
- aMNLFA
- astrochron
- EasyMARK
- forestSAS
- gee4
- goeveg
- jmdl
- LncMod
- LN0SCIs
- marindicators
- McSpatial
- mcglm
- mpr
- pompom
- promotionImpact
- rodham
- rysgran
- scan
- sentometrics
- subtee
- unga
The wrapr package has supplied, for some time, the function orderv(), which is suitable for ordering the rows of data.frames.
For example, we can calculate a row order as follows.
## [1] 6 5 2 1 4 3
And use such an order to sort data rows.
row | x | y |
---|---|---|
6 | 1 | 1 |
5 | 1 | 2 |
2 | 2 | 5 |
1 | 2 | 6 |
4 | 3 | 3 |
3 | 3 | 4 |
Essentially orderv(d) is shorthand for do.call(base::order, as.list(d)), which places the columns of the data.frame as the ... arguments of the order() call.
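The do.call() idiom can be sketched in base R alone, without wrapr installed. The helper order_rows() below is a hypothetical name for illustration, not part of wrapr, and the unname() step is an added precaution (an assumption of this sketch) so that a column named like one of order()'s own arguments, such as decreasing, is not mistaken for that argument.

```r
# Same example data as in the tables above.
d <- data.frame(
  x = c(2, 2, 3, 3, 1, 1),
  y = c(6, 5, 4, 3, 2, 1)
)

# Hypothetical helper: pass each column as a separate argument to order().
order_rows <- function(df) {
  stopifnot(is.data.frame(df), ncol(df) > 0)
  # unname() is a defensive choice of this sketch, not taken from the article.
  do.call(base::order, unname(as.list(df)))
}

ord <- order_rows(d)              # row order: sort by x, ties broken by y
sorted <- d[ord, , drop = FALSE]  # rows rearranged, original row names kept

print(ord)
print(sorted)
```

The row names of sorted record where each row sat in the original data frame, which is what the first column of the sorted table shows.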
Edit: an earlier great fix can be found here.
jmount
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.
Can you explain what is “wrong” here? ?order makes it pretty clear that arrays and dataframes are coerced to a vector. So, I find:
> dd dd
[1] 2 2 3 3 1 1 6 5 4 3 2 1
> order(dd)
[1] 5 6 12 1 2 11 3 4 10 9 8 7
> dd[order(dd)]
[1] 1 1 1 2 2 2 3 3 3 4 5 6
As one would expect. Also, the returned value you posted does have 12 values (not “more than” as you wrote), doing what you asked.
If you want row orders, then you need to do order(d$x, d$y).
Sorry, the first line was supposed to be “dd = c(d$x, d$y)”.
The “12” was a fumble; I meant to say “6”, the number of rows of the original data frame, not the number of cells. Thank you for the correction.
As to what is wrong:
R has a history of commands that can be “technically correct”, in that they match their documentation. However, there is a natural expectation that order() should calculate the row-order of what it is given. This is not the case for data frames, and R is changing to throw an error when order() is called this way in the future.
That is the sense in which I mean it is not right.