Hyperparameter Search: Adaptive (Asynchronous)
The adaptive_asha search method employs the same
underlying algorithm as the Adaptive (Advanced) method, but it uses an
asynchronous version of successive halving (ASHA), which is more suitable for
large-scale experiments with hundreds or thousands of trials.
Here are some suggested initial settings for adaptive_asha that
typically work well.
mode: Set to
max_length: The maximum training length (see Training Units) of any trial that survives to the end of the experiment. This quantity is domain-specific and should roughly reflect the number of minibatches the model must be trained on for it to converge on the data set. For users who would like to determine this number experimentally, train a model with reasonable hyperparameters using the
max_trials: The total number of hyperparameter settings that will be evaluated in the experiment. Set max_trials to at least 500 to take advantage of speedups from early stopping. You can also set a large max_trials and stop the experiment once the desired performance is achieved.
max_concurrent_trials: This field controls the degree of parallelism of the experiment. At most this many trials will train simultaneously. The adaptive_asha searcher scales nearly perfectly with additional compute, so you should set this field based on compute environment constraints. If this value is less than the number of brackets produced by the adaptive algorithm, it will be rounded up.
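As a rough illustration, the settings above could be collected into a searcher section like the one sketched below. This is written as a Python dictionary purely for illustration. The metric name, the numeric values, and the batch-count form of max_length are assumptions chosen for the example, not recommendations from this page, and only fields whose values are discussed above are set.

```python
import json

# Hypothetical searcher section; every value here is illustrative.
searcher = {
    "name": "adaptive_asha",
    "metric": "validation_loss",       # assumed metric name
    "smaller_is_better": True,         # assumed: lower loss is better
    "max_length": {"batches": 1000},   # assumed unit form; see Training Units
    "max_trials": 500,                 # at least 500, per the guidance above
    "max_concurrent_trials": 16,       # assumed; size to your compute budget
}

# Print the section so it can be compared against an experiment config.
print(json.dumps({"searcher": searcher}, indent=2))
```

In practice max_length should come from a convergence estimate for your own model, and max_concurrent_trials from the number of slots you can actually schedule.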
Adaptive (ASHA) is an approximation to the resource allocation scheme used by Adaptive. While Adaptive promotes hyperparameter configurations synchronously, resulting in underutilized nodes waiting on completion of validation steps for other configurations, ASHA uses asynchronous promotions to maximize compute efficiency of the searcher.
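The asynchronous promotion rule can be sketched in a few lines. The code below is a simplified illustration of ASHA-style promotion, not the searcher's actual implementation: the function name, the data layout, and the reduction factor eta=4 are all assumptions made for the example. Whenever a worker frees up, it promotes any configuration that currently ranks in the top 1/eta of its rung, without waiting for the rest of the rung to finish.

```python
def next_promotion(rungs, promoted, eta=4):
    """Pick the next configuration to promote, or return None.

    rungs[k]    : list of (config_id, metric) results finished at rung k
    promoted[k] : set of config_ids already promoted out of rung k
    Smaller metric values are treated as better (an assumption).
    """
    # Scan from the second-highest rung down so that the most advanced
    # candidates are promoted first.
    for k in reversed(range(len(rungs) - 1)):
        ranked = sorted(rungs[k], key=lambda r: r[1])
        top = ranked[: len(ranked) // eta]  # current top 1/eta of this rung
        for config_id, _ in top:
            if config_id not in promoted[k]:
                promoted[k].add(config_id)
                return k + 1, config_id
    # Nothing is promotable: the caller starts a new configuration at rung 0.
    return None


# Four results have arrived at rung 0; rung 1 is still empty.
rungs = [[("a", 0.5), ("b", 0.3), ("c", 0.9), ("d", 0.7)], []]
promoted = [set(), set()]
first = next_promotion(rungs, promoted)
second = next_promotion(rungs, promoted)
print(first, second)
```

A synchronous scheme would instead hold every worker until all configurations in a rung had reported, and only then promote the top 1/eta, which is where the idle time comes from.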
See the difference in asynchronous vs. synchronous promotions in the two animated GIFs below: