Authors: John Mount and Nina Zumel

Nina and I were noodling with some variations of differentially private machine learning, and think we have found a variation of a standard practice that is actually fairly efficient in establishing ~~differential privacy~~ a privacy condition (but, as commenters pointed out- not differential privacy).

Read on for the idea and a rough analysis.

## The idea

A commonly discussed step in establishing differential privacy is to add some Laplace distributed noise to queries. It works (when used in conjunction with other steps), but it can seem mysterious. We think bootstrap resampling should be considered more seriously as a component in privacy preserving procedures (despite some references claiming it is not enough on its own).

We are going to simplify analysis and consider indistinguishability to be the good effect we are trying to establish, and variance to be the bad side-effect we are willing to put up with to achieve indistinguishability.

Throughout let `n`

be the number of examples in the set you are using “as a test.” The scheme we are analyzing is allowing count queries against this test set of the form “what fraction of rows match a given predicate.” Returning exact answers to such queries is not differentially private, so some form of noise addition is used (usually adding a Laplace random variable as noise).

## The method

Consider the following alternate technique to defend the test set in a privacy scheme.

Pick a positive real number `Z < n`

. Think of `Z`

as taking on a value like 10.

When it comes time to compute a noisy test-set score instead score a bootstrap re-sample of the test set of size `ceiling(n/Z)`

where `n`

is the size of your actual test set. Each time a score is asked for: re-do the Bootstrap. This re-sampling introduces empirical sampling noise (a standard trick) and by varying the size of `Z`

we can vary the amount of noise (similar to varying the Laplace distribution parameter in Laplace noise addition). The trick is we compute on these bootstrapped samples, but never share them (or the original set they are drawing from). We can use this for model scoring (as in stepwise regression) or for variable coding (as in effect/impact coding).

The question is: does this establish privacy, and does it do it while introducing only a reasonable amount of variance? It does not establish differential privacy, but it does establish a weak form of near indistinguishability (and one could also wonder about the delta-variant of differential privacy).

Recall for epsilon differential privacy for our measurement `A`

we must have for every pair of sets `S`

(chosen by an adversary) we must have for any data sets D1, D2 that differ by only one row:

```
``` log(P[A(D1) in S] / P[A(D2) in S]) ≤ epsilon

We don’t establish that, but wonder if establishing epsilon indistinguishability of the form:

```
``` P[ (A(D1) in S) ≠ (A(D2) in S) ] ≤ epsilon

might not be enough to drive data re-use proofs (but not to protect sensitive data).

We work only one example, but the calculation gives the idea.

### The argument

Suppose in a privacy proof the adversary submits two sets: one that is `n`

zeros, and one that is `n-1`

zeros and one one. The query is what fraction of the rows are non-zero. We claim a bootstrap re-sample of size `ceiling(n/Z)`

roughly establishes indistinguishability with `epsilon = 1/Z`

and a variance of `Z/(epsilon n^2)`

for `Z,epsilon,n`

in an appropriate range.

Consider what happens when sampling from the set with the 1-row. The number of times the unique 1-row is copied into the bootstrap sampling is Poisson distributed with expected value of `1/Z`

. This follows from the linearity of expectation: we have a sum of `n/Z`

rows in the bootstrap sample each of which has an expected value of `1/n`

.

#### Estimating single-query indistinguishability

By Markov’s inequality we know `P[count ≥ 1] ≤ E[count]`

. So it follows the 1-row shows up at all in the bootstrap set with probability no more than `1/Z`

. As the presence of this row is the only way to tell the sets apart we have `epsilon`

indistinguishability with `epsilon = 1/Z`

(pretty much by definition). Or: with `Z=1/epsilon`

we can hope for `epsilon`

indistinguishability.

#### Estimating the variance

Now consider the variance of the frequency estimate we are returning. Because the 1-row count is a Poisson process we know it has variance equal to its mean. So the 1-row count is a random variable with mean `1/Z`

and variance `1/Z`

. Frequency is `count/setSize`

which is `count/(n/Z)`

. So the frequency estimate is a random variable with mean `(1/Z)/(n/Z) = 1/n`

and variance `(1/Z)/(n/Z)^2 = Z/n^2`

. Substituting in `Z = 1/epsilon`

we get variance of the frequency estimate of `1/(epsilon n^2)`

.

### The result

So we should expect this re-sized bootstrap scheme with `Z=1/epsilon`

to achieve `epsilon`

indistinguishability at a cost of introducing `1/(epsilon n^2)`

units of variance. From reading references this seems like a favorable privacy/variance trade (calculation here).

### Why indistinguishability is enough for machine learning

The reason indistinguishability is enough to drive good machine learnings results (though it is not be enough to establish differential privacy) is given by the following argument. Suppose we split our data into three sets: Calibration, Training, and Test. And then we perform one of the two training procedures:

- Build our effect codes on bootstrapped Calibration, train the model on Train, and score our model on Test.
- Build our effect codes on bootstrapped Train, train the model on Train, and score our model on Test.

The first method is correct with or without the bootstrap (the same data isn’t use for both variable design and training, so we don’t have the nested model problem). If the second method is indistinguishable from the first, then the second method would also be correct, showing the possibility of working without the Calibration set.

## Conclusion

It may be possible to replace Laplace noise methods with re-sized Bootstrap resampling *in* various indistinguishability establishing algorithms. Obviously bootstrapping is a fairly standard technique and others have noted its relation to differential privacy. Examples include:

- “A Bootstrap Mechanism for Response Masking in Remote Analysis Systems”, Krishnamurty Muralidhar, Christine M. O’Keefe, and Rathindra Sarathy, Article first published online: 14 SEP 2015, Decision Sciences DOI: 10.1111/deci.12168 [link]
- “Differential privacy based on importance weighting.”, Ji Z, Elkan C.; Machine learning. 2013;93(1):163-183. doi:10.1007/s10994-013-5396-x [link]
- “Differential Privacy in a Bayesian setting through posterior sampling”, Christos Dimitrakakis, Blaine Nelson, and Zuhe Zhang, Aikaterini Mitrokotsa, Benjamin Rubinstein, 2013 [link]

But perhaps the bootstrapping method (especially with the change of set size) still deserves to be used more often.

Categories: Exciting Techniques Statistics

### jmount

Data Scientist and trainer at Win Vector LLC. One of the authors of Practical Data Science with R.

Excellent insight!

>> The frequency estimate is a random variable with variance (1/Z)/(n/Z)^2 = Z/n^2.

How is that?

The count random variable was mean (1/Z) variance (1/Z) (standard properties of the Poisson distribution). To convert counts (call them x) to frequencies we divide by (n/Z) (the size of the Bootstrap sets we are counting over). Variance is E[(x-mean(x))^2]. E[(x/(n/Z)-mean(x/(n/Z))^2] = E[(x-mean(x))^2 / (n/Z)^2] = (Z/n)^2 E[(x-mean(x))^2] = (Z/n)^2 (1/Z) = Z/n^2.

I’ve added some more calculation details here: https://github.com/WinVector/Examples/blob/master/DiffPriv/PrivStep/DiffPrivEsts.ipynb

If you look at the probability that your approach produces a count of one, given either an all-zeros dataset (prob = 0) or a all-but-one-zeros dataset (prob > 0) the ratios end up being not bounded by exp(epsilon), which is a requirement for epsilon-differential privacy. While you have a rare event (prob ~ 1/Z) you must take the ratio between this and the alternate probability (prob = 0) rather than their difference.

Ah, that is where we went wrong. We got so caught up in the calculation we ignored the definition. Thank you for the correction, I am going to update the note to talk about what we had been thinking of (but failed to write about): indistinguishability.