jax_privacy.batch_selection.CyclicPoissonSampling
- class jax_privacy.batch_selection.CyclicPoissonSampling(sampling_prob, iterations, truncated_batch_size=None, cycle_length=1, partition_type=PartitionType.EQUAL_SPLIT)[source]
Bases:
BatchSelectionStrategyImplements Poisson sampling, possibly with cyclic sampling and truncation.
This generalizes several common sampling strategies [1,2,3,4].
References: [1] https://arxiv.org/abs/2211.06530 [2] https://arxiv.org/abs/1607.00133 [3] https://arxiv.org/abs/2306.08153 [4] https://arxiv.org/abs/2411.04205
- Formal guarantees of the batch_iterator:
All batches consist of indices in the range [0, num_examples).
Each example only appears in batches with index i such that i % cycle_length == j for some fixed j per example.
Without truncation, every index independently appears in each batch (where it is eligible to participate subject to the previous guarantee) with probability sampling_prob.
With truncation, if > truncated_batch_size examples appear in a batch under the previous guarantee, then we select truncated_batch_size of them uniformly at random and discard the rest.
If even_partition = True, num_examples % cycle_length examples are discarded, i.e. never sampled.
- Variables:
sampling_prob – The probability of sampling an example in rounds when it is eligible to participate. To achieve an average batch size of expected_batch_size, one should ideally set sampling_prob = expected_batch_size / (num_examples // cycle_length).
iterations – The number of total iterations / batches to generate.
truncated_batch_size – If True, after Poisson sampling, if we have more than truncated_batch_size examples in a batch, we uniformly sample truncated_batch_size of them and discard the rest.
cycle_length – If > 1, we use cyclic Poisson sampling: we partition the examples into cycle_length groups, and do Poisson sampling from the groups in a round-robin fashion. cycle_length == 1 retrieves standard Poisson sampling.
partition_type – How to partition the examples into groups for before Poisson sampling. EQUAL_SPLIT is the default, and is only compatible with zero-out and replace-one adjacency notions, while INDEPENDENT is compatible with the add-remove adjacency notion.
Methods
__init__Yields 1D batches of data indices.
Attributes
-
sampling_prob:
float
-
iterations:
int
-
truncated_batch_size:
int|None= None
-
cycle_length:
int= 1
-
partition_type:
PartitionType= 2