Determined aims to support reproducible machine learning experiments: that is, the result of running a Determined experiment should be deterministic, so that rerunning a previous experiment should produce an identical model. For example, this ensures that if the model produced from an experiment is ever lost, it can be recovered by rerunning the experiment that produced it.
The current version of Determined provides limited support for reproducibility; unfortunately, the current state of the hardware and software stack typically used for deep learning makes perfect reproducibility very challenging.
Determined can control and reproduce the following sources of randomness:
Hyperparameter sampling decisions.
The initial weights for a given hyperparameter configuration.
Shuffling of training data in a trial.
Dropout or other random layers.
Determined currently does not offer support for:
Controlling non-determinism in floating-point operations. Modern deep learning frameworks typically implement training using floating point operations that result in non-deterministic results, particularly on GPUs. If only CPUs are used for training, reproducible results can be achieved—see below.
Each Determined experiment is associated with an experiment seed: an
integer ranging from 0 to 231–1. The experiment seed can be
set using the
reproducibility.experiment_seed field of the
experiment configuration. If an experiment seed is not explicitly
specified, the master will assign one automatically.
The experiment seed is used as a source of randomness for any hyperparameter sampling procedures. The experiment seed is also used to generate a trial seed for every trial associated with the experiment.
Trial interface, the trial seed is accessible within the
trial class using
To achieve reproducible initial conditions in an experiment, please follow these guidelines:
Use the trial seed to seed any randomized operations (e.g., initializers, dropout) in your framework of choice. For example, Keras initializers accept an optional seed parameter. Again, it is not necessary to set any graph-level PRNGs (e.g., TensorFlow’s
tf.set_random_seed), as Determined manages this for you.
Deterministic Floating Point on CPUs¶
When doing CPU-only training with TensorFlow, it is possible to achieve
floating-point reproducibility throughout optimization. If using the
TFKerasTrial API, implement the optional
session_config() method to override
the default session configuration:
def session_config(self) -> tf.ConfigProto: return tf.ConfigProto( intra_op_parallelism_threads=1, inter_op_parallelism_threads=1 )
Disabling thread parallelism may negatively affect performance. Only enable this feature if you understand and accept this trade-off.
TensorFlow does not fully support the extraction or restoration of a single, global RNG state. Consequently, pausing experiments that use a TensorFlow-based framework may introduce an additional source of entropy.