# Hyperparameter Search: Population-based training

Population-based training (PBT) is loosely based on genetic algorithms; see the original paper or blog post for details. The motivation is that the performance of a model is likely to vary fairly continuously with its hyperparameters, so it makes sense to explore configurations close to those already known to perform well. The algorithm works by repeatedly replacing low-performing hyperparameter configurations with modified versions of high-performing ones.

## Quick start

A typical set of configuration values for PBT:

- `population_size`: 40
- `num_rounds`, `steps_per_round`: The product of these values is the total number of steps that a trial that survives to the end of the experiment will be trained for; it should be chosen similarly to the value of `target_trial_steps` for adaptive search. For a given value of the product, decreasing `steps_per_round` creates more opportunity for evaluation and selection of good configurations at the cost of higher variance and computational overhead.
- `replace_function`:
  - `truncate_fraction`: 0.2
- `explore_function`:
  - `resample_probability`: 0.2
  - `perturb_factor`: 0.2
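As a rough illustration of how these values interact, the sketch below writes the settings listed above as a Python dictionary and derives two quantities from them. The values of `num_rounds` and `steps_per_round` are hypothetical (the text does not give them), the helper names are made up for this example, and rounding the number of closed trials down is an assumption.

```python
# Illustrative PBT settings; only population_size and the three fractions
# come from the text. num_rounds and steps_per_round are assumed values.
searcher = {
    "population_size": 40,
    "num_rounds": 25,        # assumption, not from the source
    "steps_per_round": 40,   # assumption, not from the source
    "replace_function": {"truncate_fraction": 0.2},
    "explore_function": {"resample_probability": 0.2, "perturb_factor": 0.2},
}


def surviving_trial_steps(cfg):
    """Total steps trained by a trial that survives every round."""
    return cfg["num_rounds"] * cfg["steps_per_round"]


def replaced_per_round(cfg):
    """Trials closed (and cloned) per round; rounding down is an assumption."""
    fraction = cfg["replace_function"]["truncate_fraction"]
    return int(cfg["population_size"] * fraction)


total_steps = surviving_trial_steps(searcher)
closed = replaced_per_round(searcher)
print(total_steps, closed)
```

Halving `steps_per_round` while doubling `num_rounds` keeps `total_steps` the same but doubles the number of validate/close/clone phases, which is the trade-off described above.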

## Details

At any time, the searcher maintains a fixed number of active trials (the *population*). Initially, each trial uses a randomly chosen hyperparameter configuration, just as with the `random` searcher. The difference is that, periodically, every trial stops training and evaluates the validation metric for the trial's current state; some of the worst-performing trials are closed, while an equal number of the best-performing trials are *cloned* to replace them. Cloning a trial involves checkpointing it and creating a new trial that continues training from that checkpoint. The hyperparameters of the new trial are not generally equal to those of the original trial, but are derived from them in a particular way; see the description of available parameters for details.
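The cloning step can be pictured with a small sketch. The `Trial` class and `clone` function below are hypothetical stand-ins, not part of any real API: the checkpoint is modeled as a plain dictionary, and the new hyperparameters are passed in rather than derived.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Trial:
    """Hypothetical stand-in for a trial: hyperparameters plus saved state."""
    hparams: dict
    checkpoint: dict = field(default_factory=dict)  # stand-in for model weights
    steps_trained: int = 0


def clone(parent, new_hparams):
    """Create a new trial that resumes from the parent's checkpoint."""
    return Trial(
        hparams=dict(new_hparams),
        # Deep copy so that further training of the child cannot alter the parent.
        checkpoint=copy.deepcopy(parent.checkpoint),
        steps_trained=parent.steps_trained,
    )


parent = Trial(hparams={"lr": 0.01}, checkpoint={"w": [0.5, -0.2]}, steps_trained=400)
child = clone(parent, {"lr": 0.012})
child.checkpoint["w"][0] = 0.7  # simulate the child training further
```

The child starts with the parent's training progress but its own hyperparameters, and the parent's state is unaffected by what the child does afterwards.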

There is an important constraint on the hyperparameters that are allowed to vary when PBT is in use: it must always be possible to load a checkpoint from a model that was created with any potential hyperparameter configuration into a model using any other configuration; otherwise, the cloning process could fail. This means that, for instance, the number of hidden units in a neural network layer cannot be such a hyperparameter. If it were, the models for different configurations could have weight matrices of different dimensions, so their checkpoints would not be compatible.
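To make the constraint concrete, here is a minimal sketch using a hypothetical two-layer model on 10 inputs. The functions `checkpoint_shapes` and `can_load` are illustrative only; they compare weight shapes to decide whether a checkpoint saved under one configuration could be loaded under another.

```python
def checkpoint_shapes(hparams):
    """Weight shapes of a hypothetical one-hidden-layer model on 10 inputs."""
    hidden = hparams["hidden_units"]
    return {"layer1.weight": (10, hidden), "layer2.weight": (hidden, 1)}


def can_load(src_hparams, dst_hparams):
    """A checkpoint is loadable only if every weight has the same shape."""
    return checkpoint_shapes(src_hparams) == checkpoint_shapes(dst_hparams)


a = {"hidden_units": 64, "lr": 0.01}
b = {"hidden_units": 64, "lr": 0.10}   # differs only in learning rate
c = {"hidden_units": 128, "lr": 0.01}  # differs in layer width

lr_ok = can_load(a, b)     # the learning rate does not affect weight shapes
width_ok = can_load(a, c)  # the layer width does
print(lr_ok, width_ok)
```

In this sketch the learning rate may safely vary under PBT, while the number of hidden units may not.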

## Parameters

One *round* consists of a period of training followed by a validate/close/clone phase. During each round, each running trial does a fixed amount of training, determined by the experiment configuration.

- `population_size`: The number of trials that should run at the same time.
- `num_rounds`: The total number of rounds to run.
- `steps_per_round`: The number of training steps for each trial to run during each round.

The parameters for the cloning process are also configurable using two nested objects, called `replace_function` and `explore_function`, within the searcher fields of the experiment configuration file.

- `replace_function`: The configuration for deciding which trials to close.
  - `truncate_fraction`: The fraction of the population that is closed and replaced by clones at the end of each round.
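A minimal sketch of truncation-based selection follows. The function `select_replacements` is hypothetical; rounding the count down and assuming a smaller-is-better metric by default are assumptions made for this example, not documented behavior.

```python
def select_replacements(metrics, truncate_fraction, smaller_is_better=True):
    """Return (closed, cloned) trial ids for one round.

    metrics maps trial id -> validation metric. The number of trials to
    replace is rounded down, which is an assumption of this sketch.
    """
    n = int(len(metrics) * truncate_fraction)
    if n == 0:
        return [], []
    # Rank trials from best to worst.
    ranked = sorted(metrics, key=metrics.get, reverse=not smaller_is_better)
    return ranked[-n:], ranked[:n]


metrics = {"t0": 0.30, "t1": 0.10, "t2": 0.50, "t3": 0.20, "t4": 0.40}
closed, cloned = select_replacements(metrics, truncate_fraction=0.4)
print(closed, cloned)
```

With five trials and a fraction of 0.4, the two trials with the highest loss are closed and the two with the lowest loss are cloned, so the population size stays constant.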

- `explore_function`: The configuration for modifying hyperparameter configurations when cloning. Each hyperparameter is either *resampled*, meaning that it is replaced by a new value drawn from the distribution originally specified for it in the experiment configuration, independently of its current value, or *perturbed*, meaning that it is multiplied by a configurable factor.
  - `resample_probability`: The probability that a hyperparameter is replaced with a new value sampled from the original distribution specified in the configuration.
  - `perturb_factor`: The amount by which hyperparameters that are not resampled are perturbed: each numerical hyperparameter is multiplied by either `1 + perturb_factor` or `1 - perturb_factor` with equal probability; `categorical` and `const` hyperparameters are left unchanged.
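The resample-or-perturb rule can be sketched as follows. The `explore` function and the tuple format used for the search space are hypothetical, numerical hyperparameters are assumed to be sampled uniformly, and the sketch does not clamp perturbed values back into their original range.

```python
import random


def explore(parent_hparams, space, resample_probability, perturb_factor, rng):
    """Derive a clone's hyperparameters from its parent's.

    space maps name -> ("const", value) | ("categorical", choices)
    | ("double", low, high). This format is an assumption of the sketch.
    """
    child = {}
    for name, spec in space.items():
        kind = spec[0]
        if kind == "const":
            child[name] = spec[1]  # constants never change
        elif rng.random() < resample_probability:
            # Resample: draw a fresh value from the original distribution.
            if kind == "categorical":
                child[name] = rng.choice(spec[1])
            else:
                child[name] = rng.uniform(spec[1], spec[2])
        elif kind == "categorical":
            child[name] = parent_hparams[name]  # not perturbed
        else:
            # Perturb: scale up or down with equal probability.
            up = rng.random() < 0.5
            factor = 1 + perturb_factor if up else 1 - perturb_factor
            child[name] = parent_hparams[name] * factor
    return child


space = {
    "lr": ("double", 0.001, 0.1),
    "optimizer": ("categorical", ["adam", "sgd"]),
    "layers": ("const", 3),
}
parent = {"lr": 0.01, "optimizer": "adam", "layers": 3}
rng = random.Random(0)
child = explore(parent, space, resample_probability=0.2, perturb_factor=0.2, rng=rng)
```

Passing an explicit `random.Random` instance keeps the sketch reproducible; a real searcher would manage its own source of randomness.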