Hyperparameter Search: Population-based training
Population-based training (PBT) is loosely based on genetic algorithms; see the original paper or blog post for details. The motivation is that it makes sense to explore hyperparameter configurations that are known to perform well, since the performance of a model as a function of the hyperparameters is likely to show some continuity. The algorithm works by repeatedly replacing low-performing hyperparameter configurations with modified versions of high-performing ones.
Quick start
A typical set of configuration values for PBT (a complete example follows the list):

- population_size: 40
- num_rounds, steps_per_round: The product of these values is the total number of steps that a trial that survives to the end of the experiment will be trained for; it should be chosen similarly to the value of target_trial_steps for Hyperparameter Search: Adaptive (Simple). For a given value of the product, decreasing steps_per_round creates more opportunities for evaluation and selection of good configurations, at the cost of higher variance and computational overhead.
- replace_function:
  - truncate_fraction: 0.2
- explore_function:
  - resample_probability: 0.2
  - perturb_factor: 0.2
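These values might be laid out in the experiment configuration file roughly as follows. The documented fields appear verbatim; the searcher name, the metric field, and the particular num_rounds and steps_per_round values are assumptions for illustration, so consult the configuration reference for your version.

```yaml
searcher:
  name: pbt                    # assumed searcher name
  metric: validation_error     # assumed: validation metric used to rank trials
  population_size: 40          # trials running at the same time
  num_rounds: 20               # illustrative; num_rounds * steps_per_round = total steps
  steps_per_round: 100         # illustrative; lower values mean more frequent selection
  replace_function:
    truncate_fraction: 0.2     # close the worst 20% of trials each round
  explore_function:
    resample_probability: 0.2  # chance each hyperparameter is re-drawn from its distribution
    perturb_factor: 0.2        # otherwise numeric hyperparameters are scaled by 1 ± 0.2
```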
Details
At any time, the searcher maintains a fixed number of active trials (the
population). Initially, each trial uses a randomly chosen
hyperparameter configuration, just as with the random
searcher. The
difference is that, periodically, every trial stops training and
evaluates the validation metric for the trial’s current state; some of
the worst-performing trials are closed, while an equal number of the
best-performing trials are cloned to replace them. Cloning a trial
involves checkpointing it and creating a new trial that continues
training from that checkpoint. The hyperparameters of the new trial are
not generally equal to those of the original trial, but are derived from
them in a particular way; see the description of available
parameters for details.
There is an important constraint on the hyperparameters that are allowed to vary when PBT is in use: it must always be possible to load a checkpoint from a model that was created with any potential hyperparameter configuration into a model using any other configuration; otherwise, the cloning process could fail. This means that, for instance, the number of hidden units in a neural network layer cannot be such a hyperparameter. If it were, the models for different configurations could have weight matrices of different dimensions, so their checkpoints would not be compatible.
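In practice, this means PBT should only vary hyperparameters that leave the shape of the model unchanged, such as the learning rate or other optimizer settings. Below is a hypothetical hyperparameters section that respects the constraint; the const and categorical types are mentioned in this document, but the remaining field names (type, val, minval, maxval) are assumptions based on common configuration conventions.

```yaml
hyperparameters:
  hidden_size:         # changes weight-matrix shapes, so it must stay fixed under PBT
    type: const
    val: 128
  learning_rate:       # affects only the optimizer, so checkpoints remain compatible
    type: double
    minval: 0.0001
    maxval: 0.1
```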
Parameters
One round consists of a period of training followed by a validate/close/clone phase. During each round, each running trial does a fixed amount of training, determined by the experiment configuration.
- population_size: The number of trials that should run at the same time.
- num_rounds: The total number of rounds to run.
- steps_per_round: The number of training steps for each trial to run during each round.
The parameters for the cloning process are also configurable, using two nested objects called replace_function and explore_function within the searcher fields of the experiment configuration file.
- replace_function: The configuration for deciding which trials to close.
  - truncate_fraction: The fraction of the population that is closed and replaced by clones at the end of each round.
- explore_function: The configuration for modifying hyperparameter configurations when cloning. Each hyperparameter is either resampled, meaning that it is replaced by a value drawn independently from the distribution specified in the original configuration, or perturbed, meaning that it is multiplied by a configurable factor. (The rule is summarized symbolically after this list.)
  - resample_probability: The probability that each hyperparameter is replaced with a new value sampled from the original distribution specified in the configuration.
  - perturb_factor: The amount by which hyperparameters that are not resampled are perturbed: each numerical hyperparameter is multiplied by either 1 + perturb_factor or 1 - perturb_factor with equal probability; categorical and const hyperparameters are left unchanged.
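In symbols, the effect of the two explore_function settings on a numerical hyperparameter with current value h during cloning is:

$$
h' =
\begin{cases}
\tilde{h} \sim D & \text{with probability } p, \\
h \, (1 + f) & \text{with probability } (1 - p)/2, \\
h \, (1 - f) & \text{with probability } (1 - p)/2,
\end{cases}
$$

where D is the distribution specified for that hyperparameter in the configuration, p is resample_probability, and f is perturb_factor. With the quick-start values above, for example, a learning rate of 0.01 is resampled with probability 0.2; otherwise it becomes either 0.012 or 0.008, each with probability 0.4.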