
Data Algebra 0.9.0 Release

I am pleased to announce the 0.9.0 release of the data algebra. The data algebra is a realization of the Codd relational algebra for data, written in terms of Python method chaining. It allows the concise, clear specification of useful data transforms. Some examples can be found here. Benefits include […]

Back to Teaching

Back to teaching. For a few years we’ve been running a data science intensive for a really neat FAAMG company. The idea is to give engineers some hands-on, live workbook time using methods ranging from linear regression and xgboost to deep neural networks. Learning how participants progress and internalize […]

“Statistics to English Translation”

The core of our “statistics to English translation” series is Nina Zumel’s sequence of articles:

“I don’t think that means what you think it means;” Statistics to English Translation, Part 1: Accuracy Measures
Statistics to English Translation, Part 2a: ‘Significant’ Doesn’t Always Mean ‘Important’
Statistics to English Translation, Part 2b: […]

New WVPlot: ROCPlotPairList

We have a new R WVPlots plot: ROCPlotPairList. It is useful for comparing the ROC/AUC of multiple models on the same data set.

    library(WVPlots)
    set.seed(34903490)
    x1 <- rnorm(50)
    x2 <- rnorm(length(x1))
    x3 <- rnorm(length(x1))
    y <- 0.2*x2^2 + 0.5*x2 + x1 + rnorm(length(x1))
    frm <- data.frame( x1 = x1, x2 […]