Shortcuts

determined.keras

determined.keras.TFKerasTrial

class determined.keras.TFKerasTrial(context: determined.keras._tf_keras_context.TFKerasTrialContext)

tf.keras trials are created by subclassing this abstract class.

Users must define all the abstract methods to create the deep learning model associated with a specific trial, and to subsequently train and evaluate it.

By default, experiments run with TensorFlow 1.x. To configure your trial to use TensorFlow 2.x, set a TF 2.x image in the experiment configuration (e.g. determinedai/environments:cuda-10.1-pytorch-1.4-tf-2.2-gpu-0.5.0).

By default, trials using TF 2.x use eager execution and trials using TF 1.x do not. If you want to override the default, you must call the appropriate function in the __init__. For example, if you want to disable eager execution while running a TF 2.x trial, call tf.compat.v1.disable_eager_execution at the top of your __init__ function.

trial_context_class

alias of determined.keras._tf_keras_context.TFKerasTrialContext

__init__(context: determined.keras._tf_keras_context.TFKerasTrialContext) → None

Initializes a trial using the provided context.

This method should typically be overridden by trial definitions: at minimum, it is important to store context as an instance variable so that it can be accessed by other methods of the trial class. This can also be a convenient place to initialize other state that is shared between methods.

abstract build_model() → tensorflow.python.keras.engine.training.Model

Defines the deep learning architecture associated with a trial. The architecture might depend on the current values of the model’s hyperparameters, which can be accessed via context.get_hparam(). This function returns a tf.keras.Model object. Users must compile this model by calling model.compile() on the tf.keras.Model instance before it is returned.

abstract build_training_data_loader() → Union[tensorflow.python.keras.utils.data_utils.Sequence, tensorflow.python.data.ops.dataset_ops.DatasetV1, determined.keras._data.SequenceAdapter, tuple]

Defines the data loader to use during training.

Should return one of the following:

1) A tuple (x_train, y_train), where x_train is a NumPy array (or array-like), a list of arrays (in case the model has multiple inputs), or a dict mapping input names to the corresponding array, if the model has named inputs. y_train should be a NumPy array.

2) A tuple (x_train, y_train, sample_weights) of NumPy arrays.

3) A tf.data.Dataset returning a tuple of either (inputs, targets) or (inputs, targets, sample_weights).

4) A keras.utils.Sequence returning a tuple of either (inputs, targets) or (inputs, targets, sample weights).

5) A determined.keras.SequenceAdapter returning a tuple of either (inputs, targets) or (inputs, targets, sample weights).

Warning

If you are using tf.data.Dataset, Determined’s support for automatically checkpointing the dataset does not currently work correctly. This means that resuming workloads will start from the beginning of the dataset if using tf.data.Dataset.

abstract build_validation_data_loader() → Union[tensorflow.python.keras.utils.data_utils.Sequence, tensorflow.python.data.ops.dataset_ops.DatasetV1, determined.keras._data.SequenceAdapter, tuple]

Defines the data loader to use during validation.

Should return one of the following:

1) A tuple (x_val, y_val), where x_val is a NumPy array (or array-like), a list of arrays (in case the model has multiple inputs), or a dict mapping input names to the corresponding array, if the model has named inputs. y_val should be a NumPy array.

2) A tuple (x_val, y_val, sample_weights) of NumPy arrays.

3) A tf.data.Dataset returning a tuple of either (inputs, targets) or (inputs, targets, sample_weights).

4) A keras.utils.Sequence returning a tuple of either (inputs, targets) or (inputs, targets, sample weights).

5) A determined.keras.SequenceAdapter returning a tuple of either (inputs, targets) or (inputs, targets, sample weights).

session_config() → tensorflow.core.protobuf.config_pb2.ConfigProto

Specifies the tf.ConfigProto to be used by the TensorFlow session. By default, tf.ConfigProto(allow_soft_placement=True) is used.

keras_callbacks() → List[tensorflow.python.keras.callbacks.Callback]

Specifies a list of tf.keras.callback.Callback objects to be used during the trial’s lifetime.

Callbacks should avoid calling model.predict(), as this will affect Determined training behavior.

Data Loading

There are five supported data types for loading data into tf.keras models:

  1. A tuple (x, y) of Numpy arrays. x must be a Numpy array (or array-like), a list of arrays (in case the model has multiple inputs), or a dict mapping input names to the corresponding array, if the model has named inputs. y should be a numpy array.

  2. A tuple (x, y, sample_weights) of Numpy arrays.

  3. A tf.data.dataset returning a tuple of either (inputs, targets) or (inputs, targets, sample_weights).

  4. A keras.utils.Sequence returning a tuple of either (inputs, targets) or (inputs, targets, sample weights).

  5. A det.keras.SequenceAdapter returning a tuple of either (inputs, targets) or (inputs, targets, sample weights).

Loading data is done by defining build_training_data_loader and build_validation_data_loader functions. Each should return one of the supported data types mentioned above.

Optimizing Keras Sequences

To optimize performance of tf.keras.Sequence which are created from generators, Determined provides determined.keras.SequenceAdapter.

class determined.keras.SequenceAdapter(data: tensorflow.python.keras.utils.data_utils.Sequence, use_multiprocessing: bool = False, workers: int = 1, max_queue_size: int = 10)

A class to assist to optimize performance of tf.keras.sequence and help with restoring and saving iterators for a dataset.

__init__(data: tensorflow.python.keras.utils.data_utils.Sequence, use_multiprocessing: bool = False, workers: int = 1, max_queue_size: int = 10)

Multiprocessing or multithreading for native Python generators is not supported. If you want these performance accelerations, please consider using a Sequence.

Parameters
  • sequence – A tf.keras.utils.Sequence that holds the data.

  • use_multiprocessing – If True, use process-based threading. If unspecified, use_multiprocessing will default to False. Note that because this implementation relies on multiprocessing, you should not pass non-picklable arguments for the data loaders as they can’t be passed easily to children processes.

  • workers – Maximum number of processes to spin up when using process-based threading. If unspecified, workers will default to 1. If 0, will execute the data loading on the main thread.

  • max_queue_size – Maximum size for the generator queue. If unspecified, max_queue_size will default to 10.

Usage Examples

  • Use main Python process with no multithreading and no multiprocessing

SequenceAdapter(sequence, workers=0, use_multiprocessing=False)
  • Use one background process

SequenceAdapter(sequence, workers=1, use_multiprocessing=True)
  • Use two background threads

SequenceAdapter(sequence, workers=2, use_multiprocessing=False)

Required Wrappers

Users are required wrap their model prior to compiling it using the self.context.wrap_model. This is typically done inside determined.keras.TFKerasTrial.build_model().

determined.keras.TFKerasTrialContext.wrap_model(self, model: Any) → Any

This should be used to wrap tf.keras.Model objects immediately after they have been created but before they have been compiled. This function takes a tf.keras.Model and returns a wrapped version of the model; the return value should be used in place of the original model.

Parameters

model – tf.keras.Model

If using tf.data.Dataset, users are required to wrap both their training and validation dataset in a Determined-provided wrapper. This wrapper is used to shard the dataset for Distributed Training. For optimal performance, users should wrap dataset immediately after creating it.

determined.keras.TFKerasContext.wrap_dataset(self, dataset: Any) → Any

This should be used to wrap tf.data.Dataset objects immediately after they have been created. Users should use the output of this wrapper as the new instance of their dataset. If users create multiple datasets (e.g., one for training and one for testing), users should wrap each dataset independently.

Parameters

dataset – tf.data.Dataset

Trial Context

determined.keras.TFKerasTrialContext subclasses determined.TrialContext. It provides useful methods for writing Trial subclasses. It also provides the model and dataset wrappers.

class determined.keras.TFKerasTrialContext(env: determined._env_context.EnvContext, hvd_config: determined.horovod.HorovodContext)

TFKerasTrialContext always has a DistributedContext accessible via context.distributed for information related to distributed training.

TFKerasTrialContext always has a TFKerasExperimentalContext accessible via context.experimental for information related to experimental features.

wrap_model(model: Any) → Any

This should be used to wrap tf.keras.Model objects immediately after they have been created but before they have been compiled. This function takes a tf.keras.Model and returns a wrapped version of the model; the return value should be used in place of the original model.

Parameters

model – tf.keras.Model

wrap_dataset(dataset: Any) → Any

This should be used to wrap tf.data.Dataset objects immediately after they have been created. Users should use the output of this wrapper as the new instance of their dataset. If users create multiple datasets (e.g., one for training and one for testing), users should wrap each dataset independently.

Parameters

dataset – tf.data.Dataset

class determined.keras.TFKerasExperimentalContext(env: determined._env_context.EnvContext, hvd_config: determined.horovod.HorovodContext)

Context class that contains experimental runtime information and features for any Determined workflow that uses the tf.keras API.

TFKerasExperimentalContext extends EstimatorTrialContext under the context.experimental namespace.

cache_train_dataset(dataset_id: str, dataset_version: str, shuffle: bool = False, skip_shuffle_at_epoch_end: bool = False) → Callable

cache_train_dataset is a decorator for creating your training dataset. It should decorate a function that outputs a tf.data.Dataset object. The dataset will be stored in a cache, keyed by dataset_id and dataset_version. The cache is re-used in subsequent calls.

Parameters
  • dataset_id – A string that will be used as part of the unique identifier for this dataset.

  • dataset_version – A string that will be used as part of the unique identifier for this dataset.

  • shuffle – A bool indicating if the dataset should be shuffled. Shuffling will be performed with the trial’s random seed which can be set in Experiment Configuration.

  • skip_shuffle_at_epoch_end – A bool indicating if shuffling should be skipped at the end of epochs.

Example Usage:

def make_train_dataset(self):
    @self.context.experimental.cache_train_dataset("range_dataset", "v1")
    def make_dataset():
        ds = tf.data.Dataset.range(10)
        return ds

    dataset = make_dataset()
    dataset = dataset.batch(self.context.get_per_slot_batch_size())
    dataset = dataset.map(...)
    return dataset

Note

dataset.batch() and runtime augmentation should be done after caching. Additionally, users should never need to call dataset.repeat().

cache_validation_dataset(dataset_id: str, dataset_version: str, shuffle: bool = False) → Callable

cache_validation_dataset is a decorator for creating your validation dataset. It should decorate a function that outputs a tf.data.Dataset object. The dataset will be stored in a cache, keyed by dataset_id and dataset_version. The cache is re-used in subsequent calls.

Parameters
  • dataset_id – A string that will be used as part of the unique identifier for this dataset.

  • dataset_version – A string that will be used as part of the unique identifier for this dataset.

  • shuffle – A bool indicating if the dataset should be shuffled. Shuffling will be performed with the trial’s random seed which can be set in Experiment Configuration.

Native

Disregard if using the trial API (subclassing determined.keras.TFKerasTrial).

determined.keras.TFKerasTensorBoard

class determined.keras.TFKerasTensorBoard(*args: Any, **kwargs: Any)

This is a thin wrapper over the TensorBoard callback that ships with tf.keras. For more information, see the TensorBoard Guide or the upstream docs for tf.keras.callbacks.TensorBoard.

Note that if a log_dir argument is passed to the constructor, it will be ignored.

determined.experimental.keras.init()

determined.experimental.keras.init(config: Optional[Dict[str, Any]] = None, local: bool = False, test: bool = False, context_dir: str = '', command: Optional[List[str]] = None, master_url: Optional[str] = None) → determined.keras._tf_keras_context.TFKerasNativeContext

Create a tf.keras experiment using the Native API.

Parameters
  • config – A dictionary representing the experiment configuration to be associated with the experiment.

  • local – A boolean indicating if training will happen locally. When False, the experiment will be submitted to the Determined cluster. Defaults to False.

  • test – A boolean indicating if the experiment should be shortened to a minimal loop of training, validation, and checkpointing. test=True is useful quick iterating during model porting or debugging. Defaults to False.

  • context_dir

    A string filepath that defines the context directory. All model code will be executed with this as the current working directory.

    When local=False, this argument is required. All files in this directory will be uploaded to the Determined cluster. The total size of this directory must be under 96 MB.

    When local=True, this argument is optional and assumed to be the current working directory by default.

  • command – A list of strings that is used as the entrypoint of the training script in the Determined task environment. When executing this function via a python script, this argument is inferred to be sys.argv by default. When executing this function via IPython or Jupyter notebook, this argument is required.

  • master_url – An optional string to use as the Determined master URL when local=False. If not specified, will be inferred from the environment variable DET_MASTER.

Returns

determined.keras.TFKerasNativeContext