Dirk Eddelbuettel just shared an important point on software and analyses: dependencies are hard to manage risks.
If your software or research depends on many complex and changing packages, you have no way to establish your work is correct. This is because to establish the correctness of your work, you would need to also establish the correctness of all of the dependencies. This is worse than having non-reproducible research, as your work may have in fact been wrong even the first time.
Low dependencies and low complexity dependencies can also be wrong, but in this case there at least exists the possibility of checking things or running down and fixing issues.
This one reason we at Win-Vector LLC have been working on low-dependency R packages for data analysis. We don’t intend on controlling the whole analysis stack (that would be unethical), but we do intend to be in good position to fix things for our partners and clients. The bulk of our system’s utility comes from external systems such as
R itself, the
data.table package, and
Rcpp. So we must (and hopefully do) give credit and thanks.
Also, not all dependencies are equal. So we have had to avoid some popular packages with unstable APIs (a history of breaking changes) and high historic error rates (a history of complexity and adding features over fixing things).
Again, dependencies are but one measure of quality and at best an approximation. But let’s take a look at some of our packages through this lens.
And almost to make the point our package where we relaxed the above discipline right now has CRAN-flagged issue (“significant warnings”), that we can not fix as the issue is in fact from one of the dependencies.
The issue is likely from
ggplot2, which itself is likely picking up issues and errors from
rlang (a few of
ggplot2‘s dependencies that currently have detected, yet unfixed issues on CRAN). And these packages are likely picking up issues from their direct and indirect dependencies.
Now these issues are probably not serious, as if they were there would be a great panic motivating teams to fix them (this is a neat example of survivorship bias, visible acute problems attract enough attention to be fixed quickly- but often subtle chronic issues can live a long time). But the point is: we have no lever to fix them on our end.
Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.