Menu Home

How robust is logistic regression?

Logistic Regression is a popular and effective technique for modeling categorical outcomes as a function of both continuous and categorical variables. The question is: how robust is it? Or: how robust are the common implementations? (note: we are using robust in a more standard English sense of performs well for […]

Correlation and R-Squared

What is R2? In the context of predictive models (usually linear regression), where y is the true outcome, and f is the model’s prediction, the definition that I see most often is: In words, R2 is a measure of how much of the variance in y is explained by the […]

The Simpler Derivation of Logistic Regression

Logistic regression is one of the most popular ways to fit models for categorical data, especially for binary response data. It is the most important (and probably most used) member of a class of models called generalized linear models. Unlike linear regression, logistic regression can directly predict probabilities (values that […]

Your Data is Never the Right Shape

One of the recurring frustrations in data analytics is that your data is never in the right shape. Worst case: you are not aware of this and every step you attempt is more expensive, less reliable and less informative than you would want. Best case: you notice this and have […]

Living in A Lognormal World

Recently, we had a client come to us with (among other things) the following question: Who is more valuable, Customer Type A, or Customer Type B? This client already tracked the net profit and loss generated by every customer who used his services, and had begun to analyze his customers […]