jax_privacy.matrix_factorization.toeplitz.optimize_coefs_for_amplifications

jax_privacy.matrix_factorization.toeplitz.optimize_coefs_for_amplifications(n, *, dataset_size, expected_batch_size, epsilon, delta, max_optimizer_steps=250, reduction_fn=<function mean>)

Select num_bands (and coefs) to minimize loss subject to a privacy target.

Following Theorem 4 of https://arxiv.org/abs/2306.08153, this function (approximately) minimizes the loss, i.e. the scalar that reduction_fn produces from the per-query squared errors, assuming privacy amplification under block-cyclic Poisson sampling (Algorithm 2 of https://arxiv.org/abs/2306.08153). A smaller number of bands allows more benefit from amplification, while a larger number of bands allows more benefit from correlated noise.

Notes

  • This function only optimizes over numbers of bands that evenly divide n, as this is generally preferable. Hence, it is recommended to choose n so that it has well-spaced factors; powers of 2 are particularly useful.

  • This function delegates to optimize_banded_toeplitz to actually optimize for the coefficients at a given number of bands. Hence, column normalization is not directly supported, but the final returned strategy can always be used with column normalization.
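
To make the first note concrete, the sketch below lists the divisors of n, which are the only band counts the search can consider. The helper candidate_num_bands is hypothetical and not part of jax_privacy; whether the library evaluates every divisor (for example 1 or n itself) is not stated here, so treat this only as an illustration of why powers of 2 are convenient.

```python
def candidate_num_bands(n: int) -> list[int]:
    """Return all positive divisors of n in increasing order.

    Hypothetical helper for illustration; not a jax_privacy function.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    small, large = [], []
    d = 1
    while d * d <= n:
        if n % d == 0:
            small.append(d)
            if d != n // d:
                large.append(n // d)
        d += 1
    return small + large[::-1]


# A power of 2 gives candidates spread evenly on a log scale.
print(candidate_num_bands(1024))
# A prime n leaves only the two extremes.
print(candidate_num_bands(1009))
```

With a prime n the trade-off between amplification and correlated noise could only be explored at the two endpoints, which is why a value of n with many well-spaced factors is recommended.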

Parameters:
  • n (int) – The number of iterations that defines the workload.

  • dataset_size (int) – The size of the dataset.

  • expected_batch_size (int) – The target batch size (so for example if we were Poisson sampling from the whole dataset, the sampling probability would be expected_batch_size / dataset_size).

  • epsilon (float) – The privacy target is (epsilon, delta)-DP.

  • delta (float) – The privacy target is (epsilon, delta)-DP.

  • max_optimizer_steps (int) – The maximum number of LBFGS iterations, passed to optimize_banded_toeplitz.

  • reduction_fn (Callable[[Array], Array]) – A function that converts per-query squared errors to a scalar. Use jnp.mean to optimize mean squared error, jnp.max to optimize max squared error, or lambda v: v[-1] to optimize last-iterate squared error.
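
The two less obvious parameters can be illustrated with a small sketch. It computes the sampling probability described for expected_batch_size and shows what the three suggested reduction_fn choices return on a toy error vector. The helper name and all numeric values are made up for illustration, and plain Python lists stand in for JAX arrays so the snippet runs without any dependencies.

```python
def poisson_sampling_probability(dataset_size: int, expected_batch_size: int) -> float:
    """Sampling probability if Poisson sampling from the whole dataset.

    Hypothetical helper; it mirrors the ratio given for expected_batch_size.
    """
    if not 0 < expected_batch_size <= dataset_size:
        raise ValueError("need 0 < expected_batch_size <= dataset_size")
    return expected_batch_size / dataset_size


# Toy per-query squared errors, one entry per iteration (made-up values).
per_query_sq_err = [4.0, 1.0, 2.0, 3.0]

# Plain-Python stand-ins for jnp.mean, jnp.max and lambda v: v[-1].
mean_sq_err = sum(per_query_sq_err) / len(per_query_sq_err)
max_sq_err = max(per_query_sq_err)
last_sq_err = per_query_sq_err[-1]

p = poisson_sampling_probability(dataset_size=100_000, expected_batch_size=512)
print(p, mean_sq_err, max_sq_err, last_sq_err)
```

The three reductions generally rank candidate strategies differently, so the choice of reduction_fn can change which number of bands is selected.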

Returns:

A tuple (coefs, stddev) where

  • coefs are the coefficients of a banded Toeplitz strategy; the number of bands chosen is simply the length of the returned coefficients.

  • stddev is the stddev of the uncorrelated noise Z required to achieve the privacy target (that is, passing this stddev to streaming_matrix_to_single_machine_privatizer in distributed_noise_generation should achieve the (epsilon, delta)-DP guarantee).
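
As a rough picture of what the returned coefs describe, the sketch below expands a coefficient vector into a dense banded Toeplitz matrix. It assumes the strategy is lower triangular with entry (i, j) equal to coefs[i - j] inside the band and zero elsewhere; that layout, the helper banded_toeplitz, and the coefficient values are assumptions for illustration, not output of the library.

```python
def banded_toeplitz(coefs: list[float], n: int) -> list[list[float]]:
    """Build an n x n lower-triangular banded Toeplitz matrix from coefs.

    Assumption for illustration: entry (i, j) equals coefs[i - j] when
    0 <= i - j < len(coefs), and 0 otherwise.
    """
    num_bands = len(coefs)
    return [
        [coefs[i - j] if 0 <= i - j < num_bands else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Made-up coefficients; real ones would come from the optimizer.
coefs = [1.0, 0.5, 0.25]
n = 6
num_bands = len(coefs)  # the number of bands is the length of coefs
assert n % num_bands == 0  # consistent with the note on divisors of n

for row in banded_toeplitz(coefs, n):
    print(row)
```

Each row has at most len(coefs) nonzero entries, which is the sense in which the length of coefs is the number of bands.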