
On “Competition” in the R Ecosystem

I’ve been thinking a bit about “competition” in the R ecosystem.

I guess the closest I can come to a fair and coherent view on “competition” in the R ecosystem is some variation of the following.

  • I, of course, should not be treating things as a competition. We are all doing work and hoping for a bit of public mind share.
  • We all want our own work to do well. So we are a little sad if other work supplants our work, and a little happy if our work is adopted. However, we must respect that if our work is adopted, we are supplanting other work: the very thing we do not enjoy when it happens to us.

So, I’d definitely like to apologize for the times I have not thought clearly and have treated some aspects of the ecosystem as a competition. It is when we are thinking hardest about ourselves that we are most likely to offend others.

That being said, there is some context that I feel matters.

  • Please understand that any new technique is always going to be asked to compare itself to both base-R and the tidyverse. These are natural questions. One has to walk a fine line between not mentioning these (and perhaps unfairly slighting them) and adding the comparison (and seeming pushy).
  • Size, distribution, and transparency matter. A new package that is promoted by a large company and/or immediately included in popular packages or meta-packages controlled by the same authors can eliminate even the possibility of fair comparison to other work. Frankly, I think there is some responsibility to take additional care and concern in these cases. Winner-take-all popularity tracking systems have similar risks (encouraging new users to come to conclusions before looking at any alternatives).
  • Precedence in no way entitles one to priority. Sometimes our work is a later alternative to earlier work by others (and we do try to give credit in these situations), and sometimes others’ work is a later alternative to ours. And frankly, sometimes base-R already does a good job, and we just missed it (though we are not alone in that; one must also take care to respect that base-R itself is a collection of other people’s contributions).

For example: our own wrapr dot-arrow pipe comes long after the magrittr pipe. We try to keep the history clear, but frankly it takes some effort for work related to such a popular notation to be heard. I understand some are offended by our promotional effort, but we feel we have some valuable improvements to share (which can only be shown by comparison), and writing notes is the only platform we have.

As a contrary example: our own let() method predates the rlang package, but has been formally criticized as being too similar to rlang. We’ve tried to write down some of the context, but really that should not be our task alone.

Of course, there is a risk of a (hopefully breakable) negative cycle: what we do when frustrated in turn frustrates others.

Categories: Opinion

jmount

Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.

4 replies

  1. I admire the careful tone you have taken with this post. I believe that the proliferation of the tidyverse is generally a positive direction for the R community, but it is problematic. Its ubiquity drowns out any potential alternatives that might be superior (even if only under specific circumstances).

    I am appreciative of your work in developing and benchmarking novel computational strategies. Based on your posts, I decided to remove tidyverse from my default build, instead opting to incorporate only the specific packages required. Additionally, I am looking into replacing tibble with data.table as necessary.

    The main hindrance here is that I am much more proficient with tibble and dplyr, so I’m facing a tradeoff between the data.table learning curve and the dplyr running time. However, I feel that this is a common enough challenge in software engineering.

    So, this is not a competition, but you are an underdog in some regard. Please keep fighting the good fight :-)

    1. Thanks. I am not often smart enough to be careful (which is bad for me, often good for others).

      I was very late to data.table myself. What I have learned is that, as strange as the learning curve is, the package is very consistent and stable. So you don’t have to re-learn or un-learn as you move from task to task. Really, the world needs another good data.table book. I don’t have the time or experience to write it, but I think such a book would make a difference.

  2. Funny you mention data.table in the comments. WinVector packages are somewhat like data.table to dplyr – less obvious at first, takes time to learn but it pays off in the long-run.

    My only wish is that your documentation were more ‘idiot-friendly’ and held my hand a little more! I would say the RStudio guys do a great job with the ‘hello world’ intros to their packages, making them easier to pick up and play with. I’m still trying to get to grips with using vtreat in a prod environment, but I’ll get there!

    1. Thank you for the nice compliment. Most of our packages arise from needs we have seen in production in our consulting practice. So they are largely designed to have the right trade-offs one is going to need later; however, that may be why some things about them are not immediately obvious. Of course, we will be working to improve documentation (especially basic documentation), and vtreat will be used in Practical Data Science with R 2nd edition!
