jax_privacy.matrix_factorization.toeplitz.optimize_coefs_for_amplifications

jax_privacy.matrix_factorization.toeplitz.optimize_coefs_for_amplifications(n, *, dataset_size, expected_batch_size, epsilon, delta, max_optimizer_steps=250, reduction_fn=<function mean>)

Select num_bands (and coefs) to minimize loss subject to a privacy target.

Following Theorem 4 of https://arxiv.org/abs/2306.08153, this function (approximately) minimizes the loss defined by reduction_fn, assuming privacy amplification under block-cyclic Poisson sampling (Algorithm 2 of the same paper). A smaller number of bands allows more benefit from amplification, while a larger number of bands allows more benefit from correlated noise.
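To make the amplification side of this trade-off concrete, here is a minimal sketch of how the per-step sampling probability could grow with the number of bands. It assumes that block-cyclic sampling splits the dataset into num_bands equal blocks and Poisson-samples each batch from a single block; this is one reading of Algorithm 2, not a statement about the library's internals. The helper block_sampling_probability and the dataset numbers are hypothetical.

```python
def block_sampling_probability(dataset_size, expected_batch_size, num_bands):
    """Poisson sampling probability within one block (hypothetical helper).

    Assumption: the dataset is partitioned into `num_bands` equal blocks and
    each training step samples only from one of those blocks.
    """
    block_size = dataset_size // num_bands
    # A probability cannot exceed 1, even if the block is smaller than the batch.
    return min(1.0, expected_batch_size / block_size)


# Made-up numbers, chosen only for illustration.
dataset_size = 64_000
expected_batch_size = 500

probs = {
    b: block_sampling_probability(dataset_size, expected_batch_size, b)
    for b in (1, 4, 16, 64)
}
for num_bands, p in probs.items():
    print(f"num_bands={num_bands:3d}  sampling_prob={p}")
```

Under this assumption, fewer bands mean larger blocks and a smaller sampling probability, which is the regime where amplification by sampling helps most.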

Notes

This function only optimizes over numbers of bands that evenly divide n, as this is generally preferable. Hence, it is recommended to choose n so that it has well-spaced factors; powers of 2 are particularly useful.
This function delegates to optimize_banded_toeplitz to actually optimize the coefficients at a given number of bands. Hence, column normalization is not directly supported, but the final returned strategy can always be used with column normalization.
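The effect of the choice of n on the search space can be seen by listing its divisors. The sketch below uses a hypothetical helper, candidate_band_counts, that enumerates every divisor of n; whether the library considers all of them or only a subset is not stated here, so treat the output as an upper bound on the candidates.

```python
def candidate_band_counts(n):
    """Return every positive divisor of n in increasing order (hypothetical helper)."""
    return [d for d in range(1, n + 1) if n % d == 0]


# A power of 2 gives evenly (geometrically) spaced candidates.
print(candidate_band_counts(1024))

# A prime number of iterations leaves only the two extreme choices.
print(candidate_band_counts(1021))
```

With n = 1024 the candidates are the powers of 2 up to 1024, whereas with n = 1021 (a prime) only 1 and 1021 divide n.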

Parameters:

n (int) – The number of iterations that defines the workload.
dataset_size (int) – The size of the dataset.
expected_batch_size (int) – The target batch size (for example, if we were Poisson sampling from the whole dataset, the sampling probability would be expected_batch_size / dataset_size).
epsilon (float) – The epsilon of the (epsilon, delta)-DP privacy target.
delta (float) – The delta of the (epsilon, delta)-DP privacy target.
max_optimizer_steps (int) – The maximum number of LBFGS iterations, passed to optimize_banded_toeplitz.
reduction_fn (Callable[[Array], Array]) – A function that converts per query squared errors to a scalar. Use jnp.mean to optimize mean-squared-error, jnp.max to optimize max squared error, or lambda v: v[-1] to optimize last iterate squared error.
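As a rough illustration of the reduction_fn contract, the sketch below applies the three reductions mentioned above to a made-up vector of per-query squared errors. Plain Python lists and functions stand in for JAX arrays and jnp.mean / jnp.max so that the snippet has no dependencies; the names mean_reduction, max_reduction, and last_reduction are hypothetical.

```python
# Made-up per-query squared errors, one entry per iteration.
per_query_squared_error = [4.0, 2.0, 1.0, 3.0]


def mean_reduction(v):
    """Stand-in for jnp.mean: mean squared error over all iterations."""
    return sum(v) / len(v)


def max_reduction(v):
    """Stand-in for jnp.max: worst-case squared error over all iterations."""
    return max(v)


def last_reduction(v):
    """Equivalent of `lambda v: v[-1]`: squared error of the last iterate."""
    return v[-1]


for fn in (mean_reduction, max_reduction, last_reduction):
    print(fn.__name__, fn(per_query_squared_error))
```

Each reduction maps the same error vector to a different scalar, so the choice of reduction_fn determines which notion of error the optimizer trades off against the privacy target.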

Returns:

A tuple (coefs, stddev) where
coefs are the coefficients of a banded Toeplitz strategy; the number of bands chosen is simply the length of the returned coefficients.
stddev is the standard deviation of the uncorrelated noise Z required to achieve the privacy target (that is, passing this stddev to streaming_matrix_to_single_machine_privatizer in distributed_noise_generation should achieve the (epsilon, delta)-DP guarantee).
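To show how the length of coefs relates to the number of bands, here is a minimal sketch that expands a coefficient vector into a dense lower-triangular banded Toeplitz matrix. It assumes the usual convention that entry (i, j) equals coefs[i - j] when 0 <= i - j < len(coefs) and is zero otherwise; the helper banded_toeplitz and the coefficient values are hypothetical and not taken from the library.

```python
def banded_toeplitz(coefs, n):
    """Return the n x n lower-triangular banded Toeplitz matrix as nested lists.

    Assumption: entry (i, j) is coefs[i - j] inside the band and 0 outside it.
    """
    bands = len(coefs)
    return [
        [coefs[i - j] if 0 <= i - j < bands else 0.0 for j in range(n)]
        for i in range(n)
    ]


coefs = [1.0, 0.5, 0.25]  # made-up coefficients, so the strategy has 3 bands
C = banded_toeplitz(coefs, n=5)
for row in C:
    print(row)
```

Every row carries the same coefficients shifted one column to the right, and at most len(coefs) entries per row are nonzero.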