Reproducibility#
Determined aims to support reproducible machine learning experiments: that is, the result of running a Determined experiment should be deterministic, so that rerunning a previous experiment should produce an identical model. For example, if the model produced from an experiment is ever lost, it can be recovered by rerunning the experiment that produced it.
Status#
The current version of Determined provides limited support for reproducibility; unfortunately, the hardware and software stack typically used for deep learning makes perfect reproducibility very challenging.
Determined can control and reproduce the following sources of randomness:
Hyperparameter sampling decisions.
The initial weights for a given hyperparameter configuration.
Shuffling of training data in a trial.
Dropout or other random layers.
Determined currently does not offer support for controlling non-determinism in floating-point operations. Modern deep learning frameworks typically implement training using floating point operations that result in non-deterministic results, particularly on GPUs. If only CPUs are used for training, reproducible results can be achieved, as described in the following sections.
Random Seeds#
Each Determined experiment is associated with an experiment seed: an integer ranging from 0 to
231–1. The experiment seed can be set using the reproducibility.experiment_seed
field
of the experiment configuration. If an experiment seed is not explicitly specified, the master will
assign one automatically.
The experiment seed is used as a source of randomness for any hyperparameter sampling procedures. The experiment seed is also used to generate a trial seed for every trial associated with the experiment.
In the Trial
interface, the trial seed is accessible within the trial class using
self.ctx.get_trial_seed()
.
Coding Guidelines#
To achieve reproducible initial conditions in an experiment, please follow these guidelines:
Use the np.random or random APIs for random procedures, such as shuffling of data. Both PRNGs will be initialized with the trial seed by Determined automatically.
Use the trial seed to seed any randomized operations (e.g., initializers, dropout) in your framework of choice. For example, Keras initializers accept an optional seed parameter. Again, it is not necessary to set any graph-level PRNGs (e.g., TensorFlow’s
tf.set_random_seed
), as Determined manages this for you.
Deterministic Floating Point on CPUs#
When doing CPU-only training with TensorFlow, it is possible to achieve floating-point
reproducibility throughout optimization. If using the TFKerasTrial
API,
implement the optional session_config()
method to override the
default session configuration:
def session_config(self) -> tf.ConfigProto:
return tf.ConfigProto(
intra_op_parallelism_threads=1, inter_op_parallelism_threads=1
)
Warning
Disabling thread parallelism may negatively affect performance. Only enable this feature if you understand and accept this trade-off.
Pausing Experiments#
TensorFlow does not fully support the extraction or restoration of a single, global RNG state. Consequently, pausing experiments that use a TensorFlow-based framework may introduce an additional source of entropy.