Keras

This part of the documentation describes how to train a Keras model in PEDL. For Keras models, there are two categories of model definitions. The Standard Model Definition provides finer-grained control over PEDL model construction and computation. The Simple Model Definition requires very few code changes to set up and may be the easiest way to get started if you are new to PEDL.

Standard Model Definition

There are two steps needed to define a Keras model in PEDL using a Standard Model Definition:

  1. Define a make_data_loaders() function. See Data Loading for more information.
  2. Implement one of the Keras trial interfaces: KerasTrial or KerasFunctionalTrial. The KerasTrial interface supports models that use the Keras Sequential API. The KerasFunctionalTrial interface supports models that use the Keras Functional API.

Simple Model Definition

If you have existing Keras code that uses fit_generator(), you may be able to use the Keras Simple Model Definition. See the linked documentation for information about its requirements.

Data Loading

There are two supported data types for loading data into KerasTrial and KerasFunctionalTrial models: an object that implements the keras.utils.Sequence interface or a Python generator.

Sequences are recommended over generators for several reasons:

  • Sequences let you leverage multithreading and multiprocessing through a simple adapter (see Multithreading / Multiprocessing below).
  • With Python generators, PEDL needs to save the generator's state as part of each checkpoint. This takes up more disk space and can be a point of failure if you are memory constrained or your generator state contains non-pickleable objects. Sequences are stateless and avoid these problems.

Loading data into KerasTrial and KerasFunctionalTrial models is done by defining a make_data_loaders() function. This function should return a pair of objects (one for training and one for validation) which either implement the Sequence interface or are Python generators. The behavior of these data loaders should be familiar if you have used fit_generator(). Just like in Keras, these Sequences and generators should return batches of data (i.e., either (inputs, targets) or (inputs, targets, sample_weights)). Internally, these fields are passed to Keras' train_on_batch(). Examples can be found under KerasTrial and KerasFunctionalTrial.
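For example, make_data_loaders() might return a pair of Sequence objects. The following is a minimal sketch; the InMemorySequence helper is hypothetical, and the exact make_data_loaders() signature may differ in your version of PEDL:

    import numpy as np
    from keras.utils import Sequence

    class InMemorySequence(Sequence):
        """Serves (inputs, targets) batches from in-memory arrays."""

        def __init__(self, inputs, targets, batch_size):
            self.inputs = inputs
            self.targets = targets
            self.batch_size = batch_size

        def __len__(self):
            # Number of batches in the dataset.
            return int(np.ceil(len(self.inputs) / self.batch_size))

        def __getitem__(self, idx):
            start = idx * self.batch_size
            end = start + self.batch_size
            return self.inputs[start:end], self.targets[start:end]

    def make_data_loaders(experiment_config, hparams):
        # Real code would load a dataset from disk; random data keeps the
        # sketch self-contained.
        train = InMemorySequence(np.random.rand(800, 10), np.random.rand(800, 1), 32)
        val = InMemorySequence(np.random.rand(200, 10), np.random.rand(200, 1), 32)
        return train, val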

Note

If you are using a generator for training data, the generator is expected to loop indefinitely, just like in Keras. If you are using a generator for validation data, the generator is expected to loop over the validation set exactly once.
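For example (a schematic sketch; load_batches() is a hypothetical helper that yields batches from a dataset split):

    def make_train_generator():
        # The training generator loops over the data indefinitely.
        while True:
            for batch in load_batches("train"):
                yield batch

    def make_validation_generator():
        # The validation generator makes exactly one pass over the data.
        for batch in load_batches("validation"):
            yield batch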

Multithreading / Multiprocessing

We support multithreading and multiprocessing only for Sequences. This can be done by returning an instance of pedl.frameworks.keras.data.KerasDataAdapter as one of the data loaders in make_data_loaders(). KerasDataAdapter is a small abstraction that provides a way to define the parameters for multithreading / multiprocessing. Its behavior is similar to the data inputs to Keras' fit_generator().

Usage Examples

  • Use the main Python process with no multithreading and no multiprocessing
    KerasDataAdapter(sequence, workers=0, use_multiprocessing=False)
    
  • Use one background process
    KerasDataAdapter(sequence, workers=1, use_multiprocessing=True)
    
  • Use two background threads
    KerasDataAdapter(sequence, workers=2, use_multiprocessing=False)
    

Arguments

  • sequence: A Sequence that holds the data.
  • use_multiprocessing: If True, use multiprocessing; otherwise, use multithreading. If unspecified, use_multiprocessing defaults to False. Note that because multiprocessing relies on pickling, you should not pass non-pickleable arguments to the data loaders, as they cannot be passed to child processes.
  • workers: Maximum number of processes to create when using multiprocessing; otherwise, the maximum number of threads. If unspecified, workers defaults to 1. If 0, data loading executes on the main thread.
  • max_queue_size: Maximum size for the generator queue. If unspecified, max_queue_size will default to 10.
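Putting this together, make_data_loaders() might wrap the training Sequence in an adapter so that batches are produced by two background processes (a sketch reusing the hypothetical InMemorySequence and function signature from the Data Loading section):

    import numpy as np
    from pedl.frameworks.keras.data import KerasDataAdapter

    def make_data_loaders(experiment_config, hparams):
        train = InMemorySequence(np.random.rand(800, 10), np.random.rand(800, 1), 32)
        val = InMemorySequence(np.random.rand(200, 10), np.random.rand(200, 1), 32)
        # Produce training batches from two background worker processes.
        train_loader = KerasDataAdapter(train, workers=2, use_multiprocessing=True)
        return train_loader, val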

KerasTrial Interface

Keras trials are created by subclassing the abstract class KerasTrial. The KerasTrial interface supports models that use the Keras Sequential API; to use the Keras Functional API, see KerasFunctionalTrial below.

Users must define the following abstract methods to create the deep learning model associated with a specific trial, and to subsequently train and evaluate it:

  • build_model(self, hparams): Defines the deep learning architecture associated with a trial, and typically depends on the trial's specific hyperparameter settings which are stored in the hparams dictionary. This function returns a keras.models.Sequential object.
  • optimizer(self): Specifies the learning algorithm, e.g., keras.optimizers.RMSprop or keras.optimizers.Adam.
  • loss(self): Specifies the loss associated with the objective function to be optimized, e.g., keras.losses.mean_squared_error or keras.losses.categorical_crossentropy.
  • batch_size(self): Specifies the batch size to use for training.
  • validation_metrics(self): Specifies the performance metrics that will be evaluated on the validation data. This function should return a dictionary that maps user-specified metric names to metrics. The metrics can take one of two forms:
    • The first form is a valid Keras metric function, which is a Python function that takes two TensorFlow tensors containing the predictions and labels, respectively, and returns a tensor result. The element-wise mean of this tensor result across all validation batches is saved as the metric value for a given validation step.
    • The second form is a pair of a batch metric function and a reducer function. The batch metric function is a valid Keras metric function as described above, and the reducer is applied to the collected per-batch results. An example of a reducer function is provided in pedl.util.elementwise_mean; this is the default reducer used if a metric is specified without one. The second form is useful when you want to override the default reduction procedure.

Optional Methods

  • training_metrics(self): Specifies performance metrics that will be evaluated on each batch of training data. Training loss is always computed and reported as a metric named loss. If supplied, this function defines a set of metrics to be computed in addition to the training loss. This function should return a dictionary that maps user-specified metric names to metric functions. A training metric function is a Python function that takes two TensorFlow tensors and returns a JSON-serializable object (e.g., a floating point value). Users can supply custom metric functions or use one of the built-in Keras metrics. Since the training metrics are evaluated on every batch, we recommend only including metrics that are computed as part of the forward pass of training, e.g., keras.metrics.categorical_accuracy.
  • session_config(self): Specifies the tf.ConfigProto to be used by the TensorFlow session. By default, tf.ConfigProto(allow_soft_placement=True) is used.

Examples
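
Below is a minimal sketch of a KerasTrial implementation for the Sequential API. The import path for KerasTrial and the hyperparameter names are assumptions:

    import keras
    from keras.layers import Dense
    from keras.models import Sequential

    from pedl.frameworks.keras import KerasTrial  # import path is an assumption

    class MNISTTrial(KerasTrial):
        def build_model(self, hparams):
            model = Sequential()
            model.add(Dense(hparams["hidden_size"], activation="relu",
                            input_shape=(784,)))
            model.add(Dense(10, activation="softmax"))
            return model

        def optimizer(self):
            return keras.optimizers.Adam()

        def loss(self):
            return keras.losses.categorical_crossentropy

        def batch_size(self):
            return 32

        def validation_metrics(self):
            # Metric name -> Keras metric function; per-batch results are
            # reduced with pedl.util.elementwise_mean by default.
            return {"accuracy": keras.metrics.categorical_accuracy}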

KerasFunctionalTrial Interface

The KerasFunctionalTrial interface is designed to support the Keras Functional API. This interface is appropriate for complex models that may require multiple inputs, multiple loss functions, and/or multiple outputs. The interface is similar to the KerasTrial interface with a few significant differences:

  • build_model(self, hparams): Defines the deep learning architecture associated with a trial, and typically depends on the trial's specific hyperparameter settings which are stored in the hparams dictionary. This function returns a keras.models.Model object. All output layers and input layers should be explicitly named so they can be referenced in the losses, training_metrics, and validation_metrics methods.
  • optimizer(self): Specifies the learning algorithm, e.g., keras.optimizers.RMSprop or keras.optimizers.Adam.
  • losses(self): Specifies the loss(es) associated with the objective function to be optimized, e.g., keras.losses.mean_squared_error or keras.losses.categorical_crossentropy. This function should return a dict where the keys are output layer names and the values are Keras loss functions.
  • batch_size(self): Specifies the batch size to use for training.
  • validation_metrics(self): Specifies the performance metrics that will be evaluated on the validation data. This function should return a dictionary that maps user-specified metric names to tuples of length 2 or 3, e.g.:

    {
        "metric1_name": (output_layer_name,   # str: name of an output layer
                         metric1_operation),  # Keras metric function
        "metric2_name": (output_layer_name,
                         metric2_operation,
                         metric2_reducer),    # optional reducer function
        ...
    }
    
    The first element of the tuple is the string name of the Keras output layer the metric should be evaluated on. The second element of the tuple is a valid Keras metric function, which is a Python function that takes two TensorFlow tensors containing the predictions and labels, respectively, and returns a tensor result. The third and optional element of the tuple is a reducer function that defines how the per-batch values of each metric are reduced to a single value. An example of a reducer function is provided in pedl.util.elementwise_mean; this is the default reduction function used if a metric is specified without a reducer function.

    Note

    When a metric is specified on an output layer that doesn't have a loss function, PEDL will follow the behavior of Keras and ignore the metric function.

Optional Methods

  • training_metrics(self): Specifies performance metrics that will be evaluated on each batch of training data. Total training loss is always computed as the sum of all specified losses and reported as a metric named loss. If supplied, this function defines a set of metrics to be computed in addition to the training loss. This function should return a dictionary that maps user-specified metric names to 2-tuples of output layer name and metric function. A layer name is a string containing the name of an output layer in the model. A metric function is a Python function that takes two TensorFlow tensors and returns a JSON-serializable object (e.g., a floating point value). Users can supply custom metric functions or use one of the built-in Keras metrics. Since the training metrics are evaluated on every batch, we recommend only including metrics that are computed as part of the forward pass of training, e.g., keras.metrics.categorical_accuracy.
  • session_config(self): Specifies the tf.ConfigProto to be used by the TensorFlow session. By default, tf.ConfigProto(allow_soft_placement=True) is used.

Examples
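
Below is a minimal sketch of a KerasFunctionalTrial implementation with one named input layer and one named output layer. The import path for KerasFunctionalTrial and the hyperparameter names are assumptions:

    import keras
    from keras.layers import Dense, Input
    from keras.models import Model

    from pedl.frameworks.keras import KerasFunctionalTrial  # import path is an assumption

    class FunctionalMNISTTrial(KerasFunctionalTrial):
        def build_model(self, hparams):
            # Name the input and output layers explicitly so that losses and
            # metrics can reference them.
            inputs = Input(shape=(784,), name="image")
            hidden = Dense(hparams["hidden_size"], activation="relu")(inputs)
            outputs = Dense(10, activation="softmax", name="digit")(hidden)
            return Model(inputs=inputs, outputs=outputs)

        def optimizer(self):
            return keras.optimizers.Adam()

        def losses(self):
            # Output layer name -> loss function.
            return {"digit": keras.losses.categorical_crossentropy}

        def batch_size(self):
            return 32

        def validation_metrics(self):
            # Metric name -> (output layer name, metric function[, reducer]).
            return {"accuracy": ("digit", keras.metrics.categorical_accuracy)}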

Keras Simple Model Definition

To use a simple model definition with Keras, specify an entrypoint section in the experiment configuration, where script is set to the location of the entrypoint script relative to the model definition directory. Optionally, specify a list of arguments to be passed to the entrypoint script under args.
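For example, an entrypoint section might look like the following (the script name and arguments are illustrative; the experiment configuration is assumed to be YAML):

    entrypoint:
      script: train.py
      args:
        - --batch-size
        - "32"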

Please ensure that your model definition conforms to the following requirements:

  • The model is trained using the fit_generator() API during execution of the entrypoint script. The same argument requirements for fit_generator() apply in PEDL, with the following exceptions:
    • steps_per_epoch and epochs are ignored if provided. Instead, the searcher section in the experiment configuration defines how long the model will be trained for.
    • validation_data must be specified as a generator.
    • validation_steps must be specified unless the validation generator is of type keras.utils.Sequence.
      • In the case that validation_steps is unspecified and validation_data is of type keras.utils.Sequence, then len(validation_data) will be used as validation_steps. This mimics the behavior of the Keras fit_generator() API.
      • A PEDL validation step will use validation_steps batches to compute validation metrics.
    • Code cannot rely on the return value or side effects of fit_generator().
    • Certain types of callbacks may not be supported—see Callbacks below for more details.
  • Any training generator or validation generator used must not reference non-pickleable objects, including threading.Lock and file objects. One exception to this rule is Keras' ImageDataGenerator, which contains a threading.Lock instance that is specially handled by PEDL.

An example is provided at examples/mnist_keras_simple.
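
The following is a stripped-down sketch of an entrypoint script that satisfies the requirements above; the toy RandomSequence dataset and the model are illustrative:

    import numpy as np
    from keras.layers import Dense
    from keras.models import Sequential
    from keras.utils import Sequence

    class RandomSequence(Sequence):
        """Toy stand-in for a real dataset."""

        def __len__(self):
            return 10

        def __getitem__(self, idx):
            return np.random.rand(32, 10), np.random.rand(32, 1)

    model = Sequential([Dense(16, activation="relu", input_shape=(10,)), Dense(1)])
    model.compile(optimizer="adam", loss="mse")

    # steps_per_epoch and epochs would be ignored by PEDL; the searcher
    # controls how long training runs. Since validation_data is a Sequence,
    # validation_steps defaults to len(validation_data).
    model.fit_generator(RandomSequence(), validation_data=RandomSequence())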

Callbacks

The following is a non-exhaustive list of supported Keras callbacks:

  • LearningRateScheduler

    The first argument to the schedule function will be interpreted as a PEDL step ID instead of an epoch index. Note that a PEDL step ID is 1-based, as opposed to the 0-based epoch index used by Keras. The learning rate will be applied to the optimizer before the training step is executed. For example, the following code uses a learning rate of 0.01 for the first 10 training steps and a learning rate of 0.001 for the rest of training.

    def lr_schedule(step_id: int) -> float:
        if step_id <= 10:
            return 0.01
        else:
            return 0.001
    
    model.fit_generator(
        ...
        callbacks=[LearningRateScheduler(schedule=lr_schedule)],
        ...
    )
    
  • Validation Metric Callbacks

    The Keras metric API makes it difficult to compute unbatched metrics, such as mAP. One workaround is to pass a reference to the validation data into a Keras callback and compute the metric in on_epoch_end(), as demonstrated by this GitHub issue. To integrate this workaround into PEDL, make sure your callback inherits from pedl.frameworks.keras.KerasValidationCallback, and add the computed metric value to the logs argument of on_epoch_end(). This indicates to PEDL that the callback should run during a validation step instead of a training step. An example callback that computes the Mean Absolute Error (MAE) is provided below:

    from typing import Any, Dict

    import numpy as np

    from pedl.frameworks.keras import KerasValidationCallback

    class ComputeMAEMetricCallback(KerasValidationCallback):
        def __init__(self, validation_gen) -> None:
            super().__init__()
            self.validation_gen = validation_gen

        def on_epoch_end(self, epoch: int, logs: Dict[str, Any]) -> None:
            data, labels = next(self.validation_gen)
            predictions = self.model.predict(data)
            predictions = np.squeeze(predictions)
            # Mean (not sum) of the absolute errors.
            mae = np.mean(np.abs(predictions - labels))
            logs["mae"] = mae
    
    ...
    
    model.fit_generator(
        ...
        callbacks=[ComputeMAEMetricCallback(validation_gen)]
    )
    
  • TensorBoard

    If using the TensorBoard callback, the update_freq argument will be ignored and PEDL will serialize the metrics at the end of every training and validation step. All metrics will be serialized following the metric name conventions used by Keras ("val_" is prepended to the validation metric names).
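
    For example (a sketch; update_freq is shown only to illustrate that it has no effect under PEDL):

    from keras.callbacks import TensorBoard

    model.fit_generator(
        ...
        callbacks=[TensorBoard(log_dir="./logs", update_freq="batch")],  # update_freq is ignored
    )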

  • ReduceLROnPlateau

    When ReduceLROnPlateau is used as part of a Keras simple model definition, it adheres to the following semantics: if the monitor argument is set to a training metric, the patience and cooldown arguments refer to the number of training steps rather than epochs; if monitor is set to a validation metric (any metric prefixed with "val_"), patience and cooldown refer to the number of validation steps rather than epochs. When using ReduceLROnPlateau to track a validation metric, it is recommended to set min_validation_period so that validation steps occur at evenly spaced intervals.
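
    For example, the following callback halves the learning rate after three validation steps without improvement in val_loss (a sketch using standard Keras arguments):

    from keras.callbacks import ReduceLROnPlateau

    # "val_loss" is a validation metric, so patience and cooldown count
    # validation steps rather than epochs under PEDL.
    reduce_lr = ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3)

    model.fit_generator(
        ...
        callbacks=[reduce_lr],
    )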

Please reach out to the Determined AI team for more information on whether a Keras callback you are using is supported.