R is a powerful data science language because, like Matlab, numpy, and Pandas, it exposes vectorized operations. That is, a user can perform operations on hundreds (or even billions) of cells by merely specifying the operation on the column or vector of values.
Of course, sometimes it takes a while to figure out how to do this. Please read on for a great R matrix lookup problem and solution.
In R we can specify operations over vectors. For arithmetic this is easy, but for some more complex operations you “need to know the trick.”
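As a quick illustration of what "vectorized" means here, the following minimal sketch shows one arithmetic expression and one keyed lookup, each applied to a whole vector at once. The names `x`, `prices`, and `keys` are made up for this example and are not part of the original problem.

```r
# Vectorized arithmetic: one expression operates on every element.
x <- c(1, 2, 3, 4)
y <- x * 2 + 1   # no explicit loop needed
print(y)

# Vectorized lookup from a named vector: index by a vector of keys.
# (prices and keys are illustrative names, not from the original post.)
prices <- c(apple = 0.5, pear = 0.75, plum = 0.25)
keys <- c('pear', 'apple', 'pear')
looked_up <- unname(prices[keys])
print(looked_up)
```

The one-dimensional lookup is easy because a single key selects a single value. The matrix case below is harder because each lookup needs a pair of keys.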
Patrick Freeman (@PTFreeman) recently asked: what is the idiomatic way to look up a bunch of values from a matrix by row and column keys? This is actually easy to do if we first expand the data matrix into RDF-style triples, with one (row key, column key, value) record per matrix cell. If our data were in this format we could merge/join it against our desired row and column keys.
Let’s start with an example data matrix.
# example matrix data
m <- matrix(1:9, nrow = 3)
row.names(m) <- c('R1', 'R2', 'R3')
colnames(m) <- c('C1', 'C2', 'C3')
knitr::kable(m)
And our data-frame containing the indices we want to look-up.
# row/columns we want
w <- data.frame(
  i = c('R1', 'R2', 'R2'),
  j = c('C2', 'C3', 'C2'))
knitr::kable(w)
That is: we want to know the matrix values from [R1, C2], [R2, C3], and [R2, C2].
The trick is: how do we convert the matrix into triples? digEmAll has a great solution to that on Stack Overflow: https://stackoverflow.com/a/9913601
# unpack into 3-column format from:
# https://stackoverflow.com/a/9913601
triples <- data.frame(
  i = rep(row.names(m), ncol(m)),
  j = rep(colnames(m), each = nrow(m)),
  v = as.vector(m))
knitr::kable(triples)
What the above code has done is: write each entry of the original matrix as a separate row with the original row and column ids landed as new columns. This data format is very useful.
The above code is worth saving as a re-usable snippet, as getting it right is a clever step.
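One way to save the snippet is to wrap it in a small function. The sketch below assumes the matrix has both row and column names. The function name `matrix_to_triples` is hypothetical, chosen for this example. The idiom works because R stores matrices in column-major order: `as.vector(m)` walks down each column in turn, so the row names must cycle fastest and each column name must repeat `nrow(m)` times.

```r
# Hypothetical helper (the name is ours); wraps the rep()/as.vector() idiom.
matrix_to_triples <- function(m) {
  # Assumption: the matrix carries both row and column names.
  stopifnot(is.matrix(m),
            !is.null(rownames(m)),
            !is.null(colnames(m)))
  data.frame(
    i = rep(rownames(m), times = ncol(m)),  # row keys cycle fastest
    j = rep(colnames(m), each = nrow(m)),   # column keys change slowest
    v = as.vector(m),                       # values in column-major order
    stringsAsFactors = FALSE)
}

# Example use on the same 3x3 matrix as above.
m <- matrix(1:9, nrow = 3,
            dimnames = list(c('R1', 'R2', 'R3'), c('C1', 'C2', 'C3')))
triples <- matrix_to_triples(m)
print(head(triples, 4))
```

Setting `stringsAsFactors = FALSE` explicitly keeps the keys as character vectors on R versions before 4.0, where `data.frame()` converted strings to factors by default.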
Now we can vectorize our lookup using the merge command, which produces a new joined table where the desired values have been landed as a new column.
res <- merge(w, triples, by = c('i', 'j'), sort = FALSE)
knitr::kable(res)
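One caveat worth keeping in mind: with `sort = FALSE`, `merge()` documents the row order of its result as unspecified, so the rows of `res` are not guaranteed to line up with the rows of `w`. If the original request order matters, one possible approach (a sketch, not the only way) is to carry an explicit position column through the join and re-sort on it. The column name `idx` is our own choice, and `all.x = TRUE` is added on the assumption that requested keys missing from the matrix should come back as `NA` rather than silently disappear.

```r
# Rebuild the example inputs so this block runs on its own.
m <- matrix(1:9, nrow = 3,
            dimnames = list(c('R1', 'R2', 'R3'), c('C1', 'C2', 'C3')))
w <- data.frame(
  i = c('R1', 'R2', 'R2'),
  j = c('C2', 'C3', 'C2'),
  stringsAsFactors = FALSE)
triples <- data.frame(
  i = rep(rownames(m), ncol(m)),
  j = rep(colnames(m), each = nrow(m)),
  v = as.vector(m),
  stringsAsFactors = FALSE)

# Record the original request order (idx is an illustrative column name).
w$idx <- seq_len(nrow(w))

# Left join: keep every requested row, even if the key pair is absent.
res <- merge(w, triples, by = c('i', 'j'), all.x = TRUE, sort = FALSE)

# Restore the request order and tidy the row names.
res <- res[order(res$idx), , drop = FALSE]
rownames(res) <- NULL
print(res)
```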
And that is it: we have used vectorized and relational concepts to look up many values from a matrix very quickly.
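As a cross-check on the merge result, base R also supports indexing a matrix directly with a two-column character matrix, where each row is matched against the row and column names. This is a different technique from the triples-and-merge approach described in this post, shown here only as a sketch for comparison. The `as.character()` calls are a precaution for R versions where `data.frame()` stores strings as factors.

```r
# Rebuild the example inputs so this block runs on its own.
m <- matrix(1:9, nrow = 3,
            dimnames = list(c('R1', 'R2', 'R3'), c('C1', 'C2', 'C3')))
w <- data.frame(
  i = c('R1', 'R2', 'R2'),
  j = c('C2', 'C3', 'C2'),
  stringsAsFactors = FALSE)

# Each row of key_pairs is one (row name, column name) pair to look up.
key_pairs <- cbind(as.character(w$i), as.character(w$j))
v <- m[key_pairs]
print(v)
```

The merge approach has the advantage of generalizing to data that already lives in tables or databases, while the index-matrix form is compact when the data is already an in-memory matrix with names.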
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.