Model Definitions

The model definition is the interface between PEDL and the user's application framework (e.g., Keras, TensorFlow). It covers loading training data, describing a model architecture, and specifying the iterative optimization algorithm used for training. See the Defining Models chapter in the quick start guide for a brief introduction.

There are two kinds of model definitions:

  1. Standard Model Definition: Implement PEDL's provided Trial interface for your desired task. This option provides finer-grained control over PEDL model construction and computation.
  2. Simple Model Definition: Specify a directory of model code together with an entrypoint script that executes a training and validation procedure. This option requires very few code changes to set up and may be simplest if you're new to PEDL.

When the model definition is a directory, a .pedlignore file in the top level may optionally be used to specify file or directory patterns to ignore. The .pedlignore file is expected to use the same syntax and pattern formatting as a .gitignore file.
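
As a rough illustration of how such patterns select files, the sketch below previews which paths a handful of simple patterns would exclude. It is only an approximation of .gitignore matching built on Python's fnmatch module: it handles plain name patterns and directory patterns with a trailing slash, but not negation (!), anchored patterns, or **. The helper is_ignored, the patterns, and the file names are all hypothetical and are not part of PEDL.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_ignored(path, patterns):
    """Approximate .gitignore-style matching for simple patterns only."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # A trailing slash restricts the pattern to directories, so
            # compare it against the parent directory names only.
            candidates = parts[:-1]
            pattern = pattern.rstrip("/")
        else:
            # A pattern without a slash may match a name at any level.
            candidates = parts
        if any(fnmatchcase(part, pattern) for part in candidates):
            return True
    return False


# Hypothetical contents of a .pedlignore file, one pattern per line.
patterns = ["*.pyc", "__pycache__/", "data"]

# Hypothetical files in a model definition directory.
files = [
    "model_def.py",
    "model_def.pyc",
    "__pycache__/model_def.cpython-36.pyc",
    "data/train.csv",
    "utils/data.py",
]

kept = [f for f in files if not is_ignored(f, patterns)]
print(kept)
```

Here only model_def.py and utils/data.py survive; utils/data.py is kept because the pattern data matches a whole name, not a prefix.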

Standard Model Definition

A standard model definition defines the interface between PEDL and user model code by implementing a framework-specific Trial subclass. Users can provide this implementation either in a single file or in a directory containing a Python package, e.g., an importable directory with a top-level __init__.py that exposes the Trial implementation. Unless the TensorFlow Estimator interface is used, the single file or Python package should also expose a make_data_loaders() implementation. examples/mnist_tf provides an example of a directory model definition; examples/cifar10_cnn_keras provides an example of a single-file model definition.

PEDL currently supports five types of Trial interfaces encompassing three application frameworks:

Callbacks

Trial offers an optional interface to execute arbitrary Python functions before or after each training or validation step. This is useful for integrating with external systems, such as TensorBoard (see example below). To use callbacks in your experiment, implement the following optional interface in your Trial subclass:

  • callbacks(self, hparams): Returns a list of pedl.callback.Callback instances that will be used to run arbitrary Python functions during the lifetime of a PEDL trial. Callbacks are invoked in the order specified by this list.

The following predefined callbacks are provided by PEDL:

  • pedl.frameworks.tensorflow.TensorBoard(log_directory): log_directory specifies the container path where the trial runner containers write TensorBoard event logs. The event logs for each trial are saved in subdirectories of log_directory named after the trial ID: <trial_id>/training for training metrics and <trial_id>/validation for validation metrics. For a complete example, see TensorBoard Integration.
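
The directory layout described above can be sketched as a small path computation, which may help when pointing TensorBoard at the logs of a specific trial. The helper tensorboard_log_dirs and the example path and trial ID are hypothetical; the function only mirrors the documented layout and does not call PEDL.

```python
from pathlib import PurePosixPath


def tensorboard_log_dirs(log_directory, trial_id):
    """Return the documented event-log locations for one trial.

    Mirrors the layout <log_directory>/<trial_id>/training and
    <log_directory>/<trial_id>/validation described in the text.
    """
    base = PurePosixPath(log_directory) / str(trial_id)
    return {
        "training": str(base / "training"),
        "validation": str(base / "validation"),
    }


# Hypothetical container path and trial ID.
dirs = tensorboard_log_dirs("/tmp/tensorboard", 7)
print(dirs["training"])
print(dirs["validation"])
```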

Custom Callbacks

To define custom callbacks, users may subclass pedl.callback.Callback and implement one or more of its optional interface functions:

  • on_trial_begin(): Executed before the start of the first training step of a trial.
  • on_train_step_begin(step_id): Executed at the beginning of a training step.
  • on_train_step_end(step_id, metrics): Executed at the end of a training step. metrics is a list of Python dictionaries for this training step, where each dictionary contains the metrics of a single training batch.
  • on_validation_step_begin(step_id): Executed at the beginning of a validation step.
  • on_validation_step_end(step_id, metrics): Executed at the end of a validation step. metrics is a Python dictionary that contains the metrics for this validation step.
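
A minimal sketch of a custom callback follows, showing the different shapes of metrics in the training and validation hooks. So that it runs without PEDL installed, it defines a local stand-in Callback class with the hook names listed above; in a real model definition you would subclass pedl.callback.Callback instead. The LossHistory and ExampleTrial classes, the "loss" and "accuracy" metric names, and the direct hook calls at the end are assumptions for illustration only.

```python
class Callback:
    """Local stand-in for pedl.callback.Callback (assumption, not PEDL code)."""

    def on_trial_begin(self):
        pass

    def on_train_step_begin(self, step_id):
        pass

    def on_train_step_end(self, step_id, metrics):
        pass

    def on_validation_step_begin(self, step_id):
        pass

    def on_validation_step_end(self, step_id, metrics):
        pass


class LossHistory(Callback):
    """Records the mean training loss and the validation metrics per step."""

    def __init__(self):
        self.train_loss = {}
        self.validation = {}

    def on_train_step_end(self, step_id, metrics):
        # Training metrics arrive as a list with one dictionary per batch.
        losses = [batch["loss"] for batch in metrics]
        if losses:
            self.train_loss[step_id] = sum(losses) / len(losses)

    def on_validation_step_end(self, step_id, metrics):
        # Validation metrics arrive as a single dictionary for the step.
        self.validation[step_id] = dict(metrics)


class ExampleTrial:
    """Hypothetical trial; a real one subclasses a framework-specific Trial."""

    def callbacks(self, hparams):
        # Callbacks are invoked in the order given by this list.
        return [LossHistory()]


# Simulate the hook calls by hand to show the callback's behavior.
history = ExampleTrial().callbacks(hparams={})[0]
history.on_trial_begin()
history.on_train_step_end(1, [{"loss": 0.5}, {"loss": 0.25}])
history.on_validation_step_end(1, {"accuracy": 0.75})
print(history.train_loss, history.validation)
```

Only the hooks a callback cares about need to be overridden; the rest fall back to the base class.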

Simple Model Definition

Simple model definitions provide a mechanism for running models in PEDL without needing to implement a Trial API. Instead, features like automatic checkpointing and task migration are implemented by intercepting method calls from the model code into the deep learning framework (e.g., Keras).

To create an experiment using a simple model definition, the experiment configuration file should specify an entrypoint section. The entrypoint script is the Python script that creates and loads the training data, describes a model architecture, and runs the training and validation procedure using framework APIs (e.g., Keras's fit_generator()). PEDL runs the entrypoint script in a containerized trial runner environment and intercepts framework calls to control the execution of model training and validation. To access hyperparameters in model code, use the pedl.get_hyperparameter(name) function, where name is the string name of a hyperparameter as specified in the experiment configuration.
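
The hyperparameter lookup in an entrypoint script might look like the sketch below. The only PEDL call used is pedl.get_hyperparameter(name) from the text; the hyperparameter names learning_rate and hidden_units, the fallback values, and the load_config helper are hypothetical. The fallback exists solely so the sketch can run outside a PEDL trial, and the Keras model code that would consume these values is omitted.

```python
try:
    # Inside a PEDL trial, hyperparameters come from the experiment
    # configuration via pedl.get_hyperparameter(name).
    import pedl

    get_hyperparameter = pedl.get_hyperparameter
except (ImportError, AttributeError):
    # Stand-in for local runs without PEDL (assumption, not PEDL behavior).
    _LOCAL_HPARAMS = {"learning_rate": 0.01, "hidden_units": 64}

    def get_hyperparameter(name):
        return _LOCAL_HPARAMS[name]


def load_config():
    """Collect the hyperparameters this hypothetical script needs."""
    return {
        "learning_rate": float(get_hyperparameter("learning_rate")),
        "hidden_units": int(get_hyperparameter("hidden_units")),
    }


config = load_config()
# The rest of the entrypoint would build a Keras model from `config` and
# call fit_generator(); PEDL intercepts that call to drive training.
print(config)
```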

Currently, simple model definitions are only supported for Keras models.