torch_deterministic.BatchGenerator

class torch_deterministic.BatchGenerator(rngs)

Bases: object

Wrap a collection of NumPy pseudorandom number generators (PRNGs) such that samples can easily be drawn from all of them at once.

Instantiate this class with a list of NumPy PRNGs. Any method then invoked on the instance is automatically invoked on each of those PRNGs, and the results are collated into a PyTorch tensor. For example:

>>> import numpy as np
>>> from torch_deterministic import BatchGenerator
>>> bg = BatchGenerator([
...     np.random.default_rng(0),
...     np.random.default_rng(1),
... ])
>>> bg.uniform()
tensor([0.6370, 0.5118], dtype=torch.float64)
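
The forwarding behavior described above can be sketched with `__getattr__`. This is only an illustration of the general technique, not the library's actual implementation: the class name `ForwardingBatch` is hypothetical, and the results are stacked into a NumPy array rather than a PyTorch tensor to keep the sketch dependency-light.

```python
import numpy as np


class ForwardingBatch:
    """Hypothetical stand-in that forwards method calls to several PRNGs."""

    def __init__(self, rngs):
        self._rngs = list(rngs)

    def __len__(self):
        return len(self._rngs)

    def __getattr__(self, name):
        # __getattr__ only runs when normal attribute lookup fails, so
        # `_rngs` and `__len__` are resolved before reaching this point.
        # Refusing private names avoids infinite recursion if `_rngs`
        # has not been set yet (e.g. while copying or unpickling).
        if name.startswith("_"):
            raise AttributeError(name)

        def call_on_all(*args, **kwargs):
            # Call the same method on every generator, in order.
            results = [
                getattr(rng, name)(*args, **kwargs) for rng in self._rngs
            ]
            # Stack so the first axis indexes the generators.
            return np.stack([np.asarray(r) for r in results])

        return call_on_all


batch = ForwardingBatch([np.random.default_rng(0), np.random.default_rng(1)])
samples = batch.uniform()          # one draw per generator, shape (2,)
vectors = batch.normal(size=3)     # three draws per generator, shape (2, 3)
```

Because each generator is consumed independently, the first row of any result depends only on the first PRNG, the second row only on the second, and so on.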

This class is meant to facilitate the idea that all of the randomness in each training step should come from a PRNG seeded based on the index of the corresponding training example. This PRNG would be created by the dataset, used to build the training example, then returned in case the training loop itself requires any more randomness.
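
A minimal sketch of this pattern is shown below. The dataset class `NoisyPointDataset` and its contents are hypothetical, and seeding directly with the example index is just one possible way to derive a seed from it.

```python
import numpy as np


class NoisyPointDataset:
    """Hypothetical map-style dataset with one PRNG per example."""

    def __init__(self, num_examples):
        self.num_examples = num_examples

    def __len__(self):
        return self.num_examples

    def __getitem__(self, i):
        # Seed the PRNG from the example index (an assumption made for
        # this sketch), so example `i` is the same on every access.
        rng = np.random.default_rng(i)

        # Use the PRNG to build the training example.
        x = rng.normal(size=3)

        # Return the PRNG too, so the training loop can draw further
        # reproducible randomness for this example.
        return x, rng


dataset = NoisyPointDataset(10)
x_a, rng_a = dataset[7]
x_b, rng_b = dataset[7]
x_other, _ = dataset[8]
```

Here `x_a` and `x_b` are identical, and `rng_a` and `rng_b` continue with the same stream, regardless of which process loaded the example or in what order.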

The benefit of this approach is that it’s very robust. The randomness does not depend on the number of data loader processes, and every training example can be reproduced without having to replay the whole dataset or constantly log the PRNG state. However, from the point of view of getting the best possible distribution of random numbers, this approach is suboptimal. PRNGs are only designed to output high-quality randomness when seeded once, and there’s no guarantee that two PRNGs with different seeds won’t output correlated values. In practice, though, this doesn’t seem to be a significant issue.

The collate_rngs() function can be used to make PyTorch data loaders automatically wrap collections of NumPy PRNGs with this class.

Public Methods:

__init__(rngs)

__repr__()

Return repr(self).

__len__()

__getattr__(name)

pin_memory()

