Hyperparameter Tuning#

Hyperparameter tuning is the common machine learning process of selecting the data, features, model architecture, and learning algorithm to yield an effective model. Hyperparameter tuning is a challenging problem given the potentially large number of hyperparameters to consider.

Why Do Hyperparameters Matter?#

During the model development lifecycle, a machine learning engineer makes a wide range of decisions impacting model performance. For example, a computer vision model requires decisions on sample features, model architecture, and training algorithm parameters, e.g.:

Should we consider features aside from the raw images in the training set?
- Would synthetic data augmentation techniques like image rotation or horizontal flipping yield a better model?
- Should we populate additional features via advanced image processing techniques such as shape edge extraction?
What model architecture works best?
- How many layers?
- What kind of layers (e.g., dense, dropout, pooling)?
- How should we parameterize each layer (e.g., size, activation function)?
What learning algorithm hyperparameters should we use?
- What gradient descent batch size should we use?
- What optimizer should we use, and how should we parameterize it (e.g., learning rate)?

A machine learning engineer can manually guess and test hyperparameters, or they might narrow the search space by using a pretrained model. However, even if the engineer achieves seemingly good model performance, they’re left wondering how much better they might do with additional tuning.