One can summarize how a 0/1 variable x relates to a 0/1 variable y by writing down four counts:
- The true positives (tp), the number of times x = 1 and y = 1.
- The false positives (fp), the number of times x = 1 and y = 0.
- The true negatives (tn), the number of times x = 0 and y = 0.
- The false negatives (fn), the number of times x = 0 and y = 1.
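As a small illustration, the four counts can be tallied directly from two equal-length 0/1 sequences. This is a minimal sketch; the helper name `confusion_counts` is our own, hypothetical choice.

```python
from typing import Dict, Sequence


def confusion_counts(x: Sequence[int], y: Sequence[int]) -> Dict[str, int]:
    """Tally tp, fp, tn, fn for paired 0/1 sequences x and y (hypothetical helper)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    pairs = list(zip(x, y))
    return {
        "tp": sum(1 for xi, yi in pairs if xi == 1 and yi == 1),
        "fp": sum(1 for xi, yi in pairs if xi == 1 and yi == 0),
        "tn": sum(1 for xi, yi in pairs if xi == 0 and yi == 0),
        "fn": sum(1 for xi, yi in pairs if xi == 0 and yi == 1),
    }


print(confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
# {'tp': 2, 'fp': 1, 'tn': 1, 'fn': 1}
```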
These four numbers can be organized in a convenient table called “the confusion matrix” as follows.
|       | x = 0 | x = 1 |
| y = 1 | fn    | tp    |
| y = 0 | tn    | fp    |
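Because the table records how often each (x, y) combination occurs, it discards only the order of the items. The sketch below (the helper name `expand_confusion` is hypothetical) rebuilds one representative data set from the four cells, reading the matrix in the same layout as the table: rows y = 1 and y = 0, columns x = 0 and x = 1.

```python
from typing import List, Tuple


def expand_confusion(matrix: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Rebuild 0/1 lists x, y from [[fn, tp], [tn, fp]] (rows y = 1, y = 0; columns x = 0, x = 1)."""
    (fn, tp), (tn, fp) = matrix
    x: List[int] = []
    y: List[int] = []
    # (x value, y value, count) for each cell of the table
    for xv, yv, count in [(0, 1, fn), (1, 1, tp), (0, 0, tn), (1, 0, fp)]:
        x.extend([xv] * count)
        y.extend([yv] * count)
    return x, y


x, y = expand_confusion([[1, 2], [1, 1]])
print(x)  # [0, 1, 1, 0, 1]
print(y)  # [1, 1, 1, 0, 0]
```

Any reordering of these pairs yields the same confusion matrix.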
Because xicor breaks ties in x with a uniformly random ordering, on data like this the xicor coefficient is itself a random variable over permutations of the items that are “x ties” (items that share the same x value). Individual draws of the xicor estimate can vary wildly (and can even be negative). However, the expected value can be estimated with the following determinant formula (our, presumably new, result):
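As a numerical companion, here is a sketch in Python; all function names are our own, hypothetical choices. It follows Chatterjee's published tie-aware definition, ξ_n = 1 − n Σ|r_{i+1} − r_i| / (2 Σ l_i (n − l_i)), with ties in x broken uniformly at random, and estimates the expected value by averaging many draws. It also computes the expectation exactly via linearity of expectation: for 0/1 y the formula reduces to 1 − n·C / (2·n0·n1), where C counts adjacent y changes after sorting by x. That exact computation is our own derivation for illustration, not the determinant formula referred to above, and the O(n²) ranking is only meant for small tables.

```python
import random
from typing import List, Tuple


def xicor_draw(x: List[int], y: List[int], rng: random.Random) -> float:
    """One draw of Chatterjee's xi_n(x, y), breaking ties in x uniformly at random."""
    n = len(x)
    # Sorting on (x, uniform random key) puts tied x values in a uniformly random order.
    order = sorted(range(n), key=lambda i: (x[i], rng.random()))
    ys = [y[i] for i in order]
    r = [sum(1 for v in ys if v <= yi) for yi in ys]  # r_i = #{j : y_(j) <= y_(i)}
    l = [sum(1 for v in ys if v >= yi) for yi in ys]  # l_i = #{j : y_(j) >= y_(i)}
    denom = 2 * sum(li * (n - li) for li in l)
    if denom == 0:
        raise ValueError("xicor is undefined when y is constant")
    return 1 - n * sum(abs(r[i + 1] - r[i]) for i in range(n - 1)) / denom


def table_to_data(tn: int, fn: int, fp: int, tp: int) -> Tuple[List[int], List[int]]:
    """Expand confusion-matrix counts into paired 0/1 lists (order does not matter)."""
    x = [0] * (tn + fn) + [1] * (fp + tp)
    y = [0] * tn + [1] * fn + [0] * fp + [1] * tp
    return x, y


def expected_xicor_mc(tn: int, fn: int, fp: int, tp: int,
                      draws: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[xi] over random tie-breaking permutations."""
    x, y = table_to_data(tn, fn, fp, tp)
    rng = random.Random(seed)
    return sum(xicor_draw(x, y, rng) for _ in range(draws)) / draws


def expected_xicor_exact(tn: int, fn: int, fp: int, tp: int) -> float:
    """Exact E[xi] by linearity of expectation (our own derivation, a sketch)."""
    n0, n1 = tn + fp, fn + tp  # number of items with y = 0 and y = 1
    m0, m1 = tn + fn, fp + tp  # number of items with x = 0 and x = 1
    if n0 == 0 or n1 == 0:
        raise ValueError("xicor is undefined when y is constant")
    n = n0 + n1
    # A shuffled block of a zeros and b ones (m = a + b) has m - 1 adjacent pairs,
    # each differing with probability 2ab / (m(m - 1)), so E[changes] = 2ab / m.
    expected_changes = 0.0
    if m0 > 0:
        expected_changes += 2 * tn * fn / m0
    if m1 > 0:
        expected_changes += 2 * fp * tp / m1
    if m0 > 0 and m1 > 0:
        # The last x = 0 item and the first x = 1 item differ with this probability.
        expected_changes += (tn * tp + fn * fp) / (m0 * m1)
    return 1 - n * expected_changes / (2 * n0 * n1)


print(expected_xicor_exact(tn=3, fn=1, fp=1, tp=3))  # 0.09375
print(expected_xicor_mc(tn=3, fn=1, fp=1, tp=3))  # close to the exact value; varies with seed
```

The simulation makes the spread of individual draws easy to inspect, while the exact version gives a deterministic value to check any closed form against.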
The above is, in my opinion, quite beautiful. It lets a number of known properties of the expected value of xicor (in this case) be confirmed and read off quickly, such as its symmetries and its non-negativity.
The derivation of this, and some consequences, can be found here.
Is it possible that you will develop this for R as well?
There is already a great xicor package for R: https://CRAN.R-project.org/package=XICOR . The specialization to confusion matrices is simply a matter of performing arithmetic over the confusion matrix entries. Nina has some work in R on using xicor here: https://win-vector.com/2021/12/29/exploring-the-xi-correlation-coefficient/