Quick Start Chapter 0: Introduction
The PEDL quick start guide gives an overview of the core PEDL components and features. For more complete documentation, see the dedicated page for each topic; links are provided throughout this guide.
The first order of business is to check that PEDL is set up and to establish some PEDL-specific background.
Make sure the PEDL command line interface (CLI) is installed. If it is not, install it with the following command:
pip install pedl-*.whl
After installation, more information about using the PEDL CLI is available via pedl --help (or pedl -h).
Before issuing any commands, point the CLI at the IP address of the PEDL master by setting the PEDL_MASTER_ADDR environment variable:
export PEDL_MASTER_ADDR=<master IP>
A trial refers to one model with a fixed set of hyperparameter values; in particular, one trial trains a single model.
A training step is a fixed number of model updates. The number of updates in a step is determined by batches_per_step in the experiment configuration file. PEDL uses steps to split the work of training a model, which might take a very long time, into a collection of smaller operations.
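The sketch below shows how batches_per_step relates to the amount of data covered by one step. It assumes that each model update consumes exactly one batch. The batch size and dataset size are hypothetical values chosen for illustration and are not taken from PEDL.

```python
import math


def steps_per_epoch(num_examples: int, batch_size: int, batches_per_step: int) -> int:
    """Number of training steps needed to pass over the training set once.

    Assumes each step performs `batches_per_step` model updates and each
    update consumes one batch of `batch_size` examples.
    """
    examples_per_step = batch_size * batches_per_step
    # Round up: a partial final step is still needed to cover the remaining examples.
    return math.ceil(num_examples / examples_per_step)


# Hypothetical values: 50,000 training examples, batches of 32, 100 batches per step.
print(steps_per_epoch(50_000, 32, 100))
```

With these numbers each step covers 3,200 examples, so the script prints 16.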
PEDL is designed to support user workflows through experiments. An experiment typically embodies a hyperparameter search algorithm and handles the associated resource scheduling on the cluster. A hyperparameter search algorithm looks for the best set of hyperparameter values for a model on a dataset, so it must explore many candidate models. In PEDL, each candidate is represented by a trial with fixed hyperparameters. An experiment may contain many trials, and it can start, evaluate, continue training, or stop any of them.
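To make the relationship between an experiment's search and its trials concrete, here is a toy random search written in plain Python. It draws one fixed hyperparameter setting per trial. The search space, hyperparameter names, and function are hypothetical illustrations. This is not PEDL's searcher implementation or configuration format.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical search space: each hyperparameter maps to (low, high) bounds.
SEARCH_SPACE: Dict[str, Tuple[float, float]] = {
    "learning_rate": (1e-4, 1e-1),
    "dropout": (0.0, 0.5),
}


def sample_trials(space: Dict[str, Tuple[float, float]], num_trials: int,
                  seed: int = 0) -> List[Dict[str, float]]:
    """Random search: draw one fixed hyperparameter setting per trial."""
    rng = random.Random(seed)  # seeded so the sampled trials are reproducible
    trials = []
    for _ in range(num_trials):
        hparams = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        trials.append(hparams)
    return trials


for i, hparams in enumerate(sample_trials(SEARCH_SPACE, num_trials=3)):
    print(f"trial {i}: {hparams}")
```

Each dictionary corresponds to one trial, and its values stay fixed for that trial's lifetime. A real search algorithm would also use each trial's validation results to decide which trials to continue training and which to stop.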
In terms of deployment, the PEDL system is split into components. The master centrally controls experiments and the trials they spawn. A slot in PEDL refers to a processor, typically a CPU or GPU. An agent manages a set of slots: it receives workloads from the master and runs them in containers on its slots.
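The master/agent/slot split can be pictured with a toy first-fit placement. The master hands each workload to the first agent that still has a free slot. This is a simplified illustration that assumes every workload needs exactly one slot. The Agent class and assign function are hypothetical and do not describe PEDL's actual scheduler.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Agent:
    """Toy agent: manages a fixed number of slots (e.g. GPUs) on one machine."""
    name: str
    num_slots: int
    running: List[str] = field(default_factory=list)  # workloads occupying slots

    def free_slots(self) -> int:
        return self.num_slots - len(self.running)


def assign(workload: str, agents: List[Agent]) -> Optional[str]:
    """First-fit placement of a one-slot workload.

    Returns the chosen agent's name, or None if every slot is busy.
    """
    for agent in agents:
        if agent.free_slots() > 0:
            agent.running.append(workload)
            return agent.name
    return None


agents = [Agent("agent-0", num_slots=2), Agent("agent-1", num_slots=1)]
for w in ["trial-1", "trial-2", "trial-3", "trial-4"]:
    print(w, "->", assign(w, agents))
```

With three slots in total, the first three workloads are placed (two on agent-0, one on agent-1). The fourth prints None because no slot is free.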
To create an experiment, the user must define two things:
Experiment configuration file: A YAML file that specifies metadata for the experiment, including the searcher type and parallelism. See QS2: hyperparameter search or the experiment configuration docs.
Model definition: Either a single .py file or a directory of .py files that specifies the neural network and related functions such as training and validation metrics. See QS4: defining models or the model definition docs.
The model definition typically also contains code for loading the model's training and validation data sets. For most model definitions, this is done by implementing a make_data_loaders(experiment_config, hparams) function, which returns data loaders that are instances of a PEDL-supported DataLoader class. PEDL reads the values passed in hparams from the experiment configuration file.
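Below is a minimal, framework-free sketch of a make_data_loaders function. It assumes the function returns a (training, validation) pair of loaders and that hparams contains a batch_size entry. Both are assumptions made for illustration. SimpleDataLoader is a hypothetical stand-in, not PEDL's DataLoader API; a real model definition would return one of the PEDL-supported classes described in the model definition docs.

```python
import random
from typing import Any, Dict, Iterator, List, Sequence, Tuple


class SimpleDataLoader:
    """Hypothetical stand-in for a PEDL-supported DataLoader.

    Yields the dataset in consecutive batches of `batch_size` items.
    """

    def __init__(self, data: Sequence[Any], batch_size: int,
                 shuffle: bool = False, seed: int = 0):
        self.data = list(data)
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.seed = seed

    def __iter__(self) -> Iterator[List[Any]]:
        indices = list(range(len(self.data)))
        if self.shuffle:
            random.Random(self.seed).shuffle(indices)
        for start in range(0, len(indices), self.batch_size):
            yield [self.data[i] for i in indices[start:start + self.batch_size]]

    def __len__(self) -> int:
        # Number of batches, counting a final partial batch.
        return (len(self.data) + self.batch_size - 1) // self.batch_size


def make_data_loaders(experiment_config: Dict[str, Any],
                      hparams: Dict[str, Any]) -> Tuple[SimpleDataLoader, SimpleDataLoader]:
    # experiment_config is unused in this toy; real code might read data paths from it.
    # Hypothetical in-memory dataset of (input, label) pairs.
    examples = [(x, 2 * x) for x in range(100)]
    train, val = examples[:80], examples[80:]
    batch_size = hparams["batch_size"]  # hypothetical hyperparameter name
    return (
        SimpleDataLoader(train, batch_size, shuffle=True),
        SimpleDataLoader(val, batch_size),
    )


train_loader, val_loader = make_data_loaders({}, {"batch_size": 16})
print(len(train_loader), len(val_loader))
```

With 80 training and 20 validation examples and a batch size of 16, this prints 5 2. The last validation batch holds the remaining 4 examples.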
The rest of this quick start guide introduces the following topics:
Training a model via the PEDL CLI using a provided example.
Specifying a hyperparameter search algorithm for the experiment through the experiment configuration file.
Using PEDL to run arbitrary commands.
Taking a deep learning model written using a framework such as TensorFlow and adapting it to run as a PEDL experiment.