Back to teaching. For a few years we’ve been running a data science intensive at for a really neat FAAMG company. The idea is to give engineers some hands on live workbook time using methods varying from linear regression, xgboost, to deep neural networks. Learning how participants progress and internalize […]
Every programmer should have an opinion on what the outcomes of the expressions like “5” == 5 should be, and perhaps even a guess as to what the answer is in their most familiar programming language. In my opinion SQL gets it right. For example, we get the following in […]
I’d like to work an example of using SQL WITH Common Table Expressions to produce more legible SQL.
I have up what I think is a really neat tutorial on how to plot multiple curves on a graph in Python, using seaborn and data_algebra. It is great way to show some data shaping theory convenience functions we have developed. Please check it out.
In R it has always been incorrect to call order() on a data.frame. Such a call doesn’t return a sort-order of the rows, and previously did not return an error. For example. d <- data.frame( x = c(2, 2, 3, 3, 1, 1), y = 6:1) knitr::kable(d) x y 2 […]
The wrapr R package supplies a number of substantial programming tools, including the S3/S4 compatible dot-pipe, unpack/pack object tools, and many more. It also supplies a number of formatting and parsing convenience tools: qc() (“quoting concatenate”): quotes strings, giving value-oriented interfaces much of the incidental convenience of non-standard evaluation (NSE) […]
Continuing (and hopefully ending) our quick series on software pathologies I would like to follow-up The Hyper Dance with “Rule 42 Software.”
A lot of machine learning, statistical, plotting, and analytics algorithms over-sell a small evil trick I call “the hyper dance.”
It looks like R is getting an official pipe operator (ref). R doesn’t work under an RFC process, so we hear about these things and they are discussed on the R-devel mailing list. I’ve written on this topic before (ref), and I have taped some new comments. This sort of […]
R is a powerful data science language because, like Matlab, numpy, and Pandas, it exposes vectorized operations. That is, a user can perform operations on hundreds (or even billions) of cells by merely specifying the operation on the column or vector of values. Of course, sometimes it takes a while […]