I’ve thought of Pandas as an in-memory column-oriented data structure with reasonable performance. If I need high performance or scale, I can move to a database. I like Pandas, and thank the authors and maintainers for their efforts. Now I kind of wonder what Pandas is, or what it wants […]
Continuing (and hopefully ending) our quick series on software pathologies, I would like to follow up “The Hyper Dance” with “Rule 42 Software.”
A lot of machine learning, statistical, plotting, and analytics algorithms over-sell a small evil trick I call “the hyper dance.”
Our group has done a lot of work with non-standard calling conventions in R. Our tools work hard to eliminate non-standard calling (as is the purpose of wrapr::let()), or at least make it cleaner and more controllable (as is done in the wrapr dot pipe). And even so, we still […]
Some days I see R as an eclectic programming language preferred by scientists. “Programming languages as people.” From Leftover Salad (David Marino). Other days I see it more like the following.
Kudos to Professor Andrew Gelman for telling a great joke at his own expense: Stupid-ass statisticians don’t know what a goddam confidence interval is. He brilliantly burlesqued a frustratingly common occurrence that many people say they “have never seen happen.” One of the pains of writing about data science is there […]
dplyr is one of the most popular R packages. It is powerful and important. But is it in fact easily comprehensible?
Here is an absolutely horrible way to confuse yourself and get an inflated reported R-squared on a simple linear regression model in R. We have written about this before, but we found a new twist on the problem (interactions with categorical variable encoding) which we would like to call out […]
There are a number of statistical principles that are perhaps more honored in the breach than in the observance. For fun I am going to name a few, and show why they are not always the “precision surgical knives of thought” one would hope for (working more like large hammers).