determined.keras
determined.keras.TFKerasTrial
class determined.keras.TFKerasTrial(context: determined.keras._tf_keras_context.TFKerasTrialContext)
To implement a new tf.keras trial, subclass this class and implement the abstract methods described below (build_model(), build_training_data_loader(), and build_validation_data_loader()). In most cases you should provide a custom __init__() method as well.
By default, experiments use TensorFlow 1.x. To configure your trial to use TensorFlow 2.x, specify a TensorFlow 2.x image in the environment.image field of the experiment configuration (e.g., determinedai/environments:cuda-10.1-pytorch-1.4-tf-2.2-gpu-0.7.0).
Trials default to using eager execution with TensorFlow 2.x but not with TensorFlow 1.x. To override the default behavior, call the appropriate function in your __init__ method. For example, to disable eager execution while using TensorFlow 2.x, call tf.compat.v1.disable_eager_execution at the top of your __init__ method.
For more information on writing tf.keras trial classes, refer to the tutorial.
__init__(context: determined.keras._tf_keras_context.TFKerasTrialContext) → None
Initializes a trial using the provided context.
This method should typically be overridden by trial definitions: at minimum, it is important to store context as an instance variable so that it can be accessed by other methods of the trial class. This can also be a convenient place to initialize other state that is shared between methods.
abstract build_model() → tensorflow.python.keras.engine.training.Model
Returns the deep learning architecture associated with a trial. The architecture might depend on the current values of the model’s hyperparameters, which can be accessed via context.get_hparam(). This function returns a tf.keras.Model object.
After constructing the tf.keras.Model object, users must do two things before returning it:
1) Wrap the model using context.wrap_model().
2) Compile the model using model.compile().
abstract build_training_data_loader() → Union[tensorflow.python.keras.utils.data_utils.Sequence, tensorflow.python.data.ops.dataset_ops.DatasetV1, determined.keras._data.SequenceAdapter, tuple]
Defines the data loader to use during training.
Should return one of the following:
1) A tuple (x_train, y_train), where x_train is a NumPy array (or array-like), a list of arrays (in case the model has multiple inputs), or a dict mapping input names to the corresponding array, if the model has named inputs. y_train should be a NumPy array.
2) A tuple (x_train, y_train, sample_weights) of NumPy arrays.
3) A tf.data.Dataset returning a tuple of either (inputs, targets) or (inputs, targets, sample_weights).
4) A keras.utils.Sequence returning a tuple of either (inputs, targets) or (inputs, targets, sample_weights).
5) A determined.keras.SequenceAdapter returning a tuple of either (inputs, targets) or (inputs, targets, sample_weights).
When using tf.data.Dataset, you must wrap the dataset using determined.keras.TFKerasTrialContext.wrap_dataset(). This wrapper is used to shard the dataset for distributed training. For optimal performance, users should wrap a dataset immediately after creating it.
Warning
If you are using tf.data.Dataset, Determined’s support for automatically checkpointing the dataset does not currently work correctly. This means that resumed workloads will start from the beginning of the dataset when using tf.data.Dataset.
abstract build_validation_data_loader() → Union[tensorflow.python.keras.utils.data_utils.Sequence, tensorflow.python.data.ops.dataset_ops.DatasetV1, determined.keras._data.SequenceAdapter, tuple]
Defines the data loader to use during validation.
Should return one of the following:
1) A tuple (x_val, y_val), where x_val is a NumPy array (or array-like), a list of arrays (in case the model has multiple inputs), or a dict mapping input names to the corresponding array, if the model has named inputs. y_val should be a NumPy array.
2) A tuple (x_val, y_val, sample_weights) of NumPy arrays.
3) A tf.data.Dataset returning a tuple of either (inputs, targets) or (inputs, targets, sample_weights).
4) A keras.utils.Sequence returning a tuple of either (inputs, targets) or (inputs, targets, sample_weights).
5) A determined.keras.SequenceAdapter returning a tuple of either (inputs, targets) or (inputs, targets, sample_weights).
When using tf.data.Dataset, you must wrap the dataset using determined.keras.TFKerasTrialContext.wrap_dataset(). This wrapper is used to shard the dataset for distributed training. For optimal performance, users should wrap a dataset immediately after creating it.
session_config() → tensorflow.core.protobuf.config_pb2.ConfigProto
Specifies the tf.ConfigProto to be used by the TensorFlow session. By default, tf.ConfigProto(allow_soft_placement=True) is used.
keras_callbacks() → List[tensorflow.python.keras.callbacks.Callback]
Specifies a list of tf.keras.callbacks.Callback objects to be used during the trial’s lifetime.
Callbacks should avoid calling model.predict(), as this will affect Determined training behavior.
Note
If you specify a Keras callback that uses the on_epoch_begin or on_epoch_end interfaces, epoch boundaries are determined by the length of the training data set, not by the value of the Determined configuration setting records_per_epoch.
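The methods above fit together roughly as in the following minimal sketch of a trial class. It is an illustration under stated assumptions, not a canonical implementation: the class name `MNISTLikeTrial`, the hyperparameter name `hidden_units`, the layer sizes, and the random data are all hypothetical. The guarded import and the lazy `tensorflow`/`numpy` imports are only there so the sketch can be loaded on a machine without Determined or TensorFlow; a real trial would import them normally at the top of the file.

```python
try:
    from determined.keras import TFKerasTrial, TFKerasTrialContext
except ImportError:
    # Fallback so the sketch can be imported where Determined is not installed.
    TFKerasTrial, TFKerasTrialContext = object, object


class MNISTLikeTrial(TFKerasTrial):
    def __init__(self, context: TFKerasTrialContext) -> None:
        # Store the context so the other methods can reach hyperparameters
        # and the wrap_model() / wrap_dataset() helpers.
        self.context = context

    def build_model(self):
        import tensorflow as tf  # lazy import; needs TensorFlow at run time

        model = tf.keras.Sequential([
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            # "hidden_units" is a hypothetical hyperparameter name.
            tf.keras.layers.Dense(self.context.get_hparam("hidden_units"), activation="relu"),
            tf.keras.layers.Dense(10, activation="softmax"),
        ])
        # Step 1: wrap the model before compiling it.
        model = self.context.wrap_model(model)
        # Step 2: compile the wrapped model.
        model.compile(
            optimizer="adam",
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"],
        )
        return model

    def build_training_data_loader(self):
        # Option 1 from the list above: a tuple (x_train, y_train) of NumPy arrays.
        return self._random_data(num_records=256)

    def build_validation_data_loader(self):
        return self._random_data(num_records=64)

    @staticmethod
    def _random_data(num_records):
        import numpy as np  # lazy import; random data stands in for a real dataset

        x = np.random.rand(num_records, 28, 28).astype("float32")
        y = np.random.randint(0, 10, size=(num_records,))
        return x, y
```

Returning plain NumPy tuples keeps the sketch short; a trial that returns a tf.data.Dataset would additionally need to pass it through the context's wrap_dataset().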
Data Loading
There are five supported data types for loading data into tf.keras models:
1) A tuple (x, y) of NumPy arrays. x must be a NumPy array (or array-like), a list of arrays (in case the model has multiple inputs), or a dict mapping input names to the corresponding array, if the model has named inputs. y should be a NumPy array.
2) A tuple (x, y, sample_weights) of NumPy arrays.
3) A tf.data.Dataset returning a tuple of either (inputs, targets) or (inputs, targets, sample_weights).
4) A keras.utils.Sequence returning a tuple of either (inputs, targets) or (inputs, targets, sample_weights).
5) A determined.keras.SequenceAdapter returning a tuple of either (inputs, targets) or (inputs, targets, sample_weights).
Loading data is done by defining the build_training_data_loader() and build_validation_data_loader() methods. Each should return one of the supported data types listed above.
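To make the Sequence option more concrete, here is a minimal sketch of the indexing protocol that a keras.utils.Sequence follows: `__len__` returns the number of batches and `__getitem__` returns one complete batch as an (inputs, targets) tuple. The class `ListBatches` is a hypothetical pure-Python stand-in that uses lists so it runs without TensorFlow; in a real trial the class would subclass tf.keras.utils.Sequence and return NumPy arrays.

```python
import math


class ListBatches:
    """Pure-Python stand-in that follows the keras.utils.Sequence protocol."""

    def __init__(self, inputs, targets, batch_size):
        assert len(inputs) == len(targets)
        self.inputs = inputs
        self.targets = targets
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per pass over the data; the last batch may be smaller.
        return math.ceil(len(self.inputs) / self.batch_size)

    def __getitem__(self, index):
        # Return batch number `index` as an (inputs, targets) tuple.
        if not 0 <= index < len(self):
            raise IndexError(index)
        start = index * self.batch_size
        stop = start + self.batch_size
        return self.inputs[start:stop], self.targets[start:stop]


batches = ListBatches(
    inputs=list(range(10)),
    targets=[i % 2 for i in range(10)],
    batch_size=4,
)
first_inputs, first_targets = batches[0]
```

Because each batch is addressed by index rather than produced by a generator, the batches can be loaded out of order or in parallel, which is what makes this form suitable for background loading.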
Optimizing Keras Sequences
To optimize the performance of tf.keras.utils.Sequence objects that are created from generators, Determined provides determined.keras.SequenceAdapter.
class determined.keras.SequenceAdapter(data: tensorflow.python.keras.utils.data_utils.Sequence, use_multiprocessing: bool = False, workers: int = 1, max_queue_size: int = 10)
A class that helps optimize the performance of loading data with tf.keras.utils.Sequence and helps with saving and restoring iterators for a dataset.
__init__(data: tensorflow.python.keras.utils.data_utils.Sequence, use_multiprocessing: bool = False, workers: int = 1, max_queue_size: int = 10)
Multiprocessing or multithreading for native Python generators is not supported. If you want these performance accelerations, please consider using a Sequence.
Parameters
data – A tf.keras.utils.Sequence that holds the data.
use_multiprocessing – If True, use process-based threading. If unspecified, use_multiprocessing will default to False. Note that because this implementation relies on multiprocessing, you should not pass non-picklable arguments for the data loaders, as they cannot easily be passed to child processes.
workers – Maximum number of processes to spin up when using process-based threading (or of background threads when use_multiprocessing is False). If unspecified, workers will default to 1. If 0, data loading is executed on the main thread.
max_queue_size – Maximum size for the generator queue. If unspecified, max_queue_size will default to 10.
Usage Examples
Use main Python process with no multithreading and no multiprocessing
SequenceAdapter(sequence, workers=0, use_multiprocessing=False)
Use one background process
SequenceAdapter(sequence, workers=1, use_multiprocessing=True)
Use two background threads
SequenceAdapter(sequence, workers=2, use_multiprocessing=False)
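The role of a background worker and of max_queue_size can be pictured with the following sketch. It is not SequenceAdapter's implementation; it is a hypothetical single-thread prefetcher built on the standard library, assuming only that the data object supports `len()` and integer indexing. The function name `prefetch` is made up for this illustration.

```python
import queue
import threading


def prefetch(sequence, max_queue_size=10):
    """Yield sequence[0], sequence[1], ... while one background thread loads ahead."""
    buffer = queue.Queue(maxsize=max_queue_size)
    done = object()  # sentinel marking the end of the data

    def producer():
        for index in range(len(sequence)):
            # put() blocks while the queue already holds max_queue_size items,
            # so the worker never runs more than that far ahead of the consumer.
            buffer.put(sequence[index])
        buffer.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()
        if item is done:
            return
        yield item


batches = ["batch-0", "batch-1", "batch-2"]
loaded = list(prefetch(batches, max_queue_size=2))
```

A larger max_queue_size lets loading run further ahead of training at the cost of holding more batches in memory.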
Required Wrappers
Users are required to wrap their model prior to compiling it using self.context.wrap_model. This is typically done inside build_model().
If using tf.data.Dataset, users are required to wrap both their training and validation datasets using self.context.wrap_dataset. This wrapper is used to shard the dataset for distributed training. For optimal performance, users should wrap a dataset immediately after creating it.
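As an illustration of what sharding means here, the sketch below splits a list of records so that each training process sees a disjoint slice. It uses the strided scheme of tf.data.Dataset.shard (worker `index` keeps every `num_shards`-th record); whether the dataset wrapper uses exactly this scheme internally is an assumption, and the helper `shard` is hypothetical.

```python
def shard(records, num_shards, index):
    """Return the records that worker `index` out of `num_shards` would see."""
    if not 0 <= index < num_shards:
        raise ValueError("index must be in [0, num_shards)")
    # Worker `index` keeps records index, index + num_shards, index + 2 * num_shards, ...
    return records[index::num_shards]


records = list(range(10))
shards = [shard(records, num_shards=4, index=i) for i in range(4)]
```

Sharding before any expensive per-record work means each process only loads and transforms its own slice, which is consistent with the advice to wrap a dataset immediately after creating it.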
Trial Context
determined.keras.TFKerasTrialContext is a subclass of determined.TrialContext that provides useful methods for writing tf.keras trial definitions, as well as functions to wrap the model and dataset.
class determined.keras.TFKerasTrialContext(env: determined._env_context.EnvContext, hvd_config: determined.horovod.HorovodContext)
TFKerasTrialContext always has a DistributedContext accessible via context.distributed for information related to distributed training.
TFKerasTrialContext always has a TFKerasExperimentalContext accessible via context.experimental for information related to experimental features.
wrap_model(model: Any) → Any
This should be used to wrap tf.keras.Model objects immediately after they have been created but before they have been compiled. This function takes a tf.keras.Model and returns a wrapped version of the model; the return value should be used in place of the original model.
Parameters
model – tf.keras.Model
wrap_dataset(dataset: Any, shard_dataset: bool = True) → Any
This should be used to wrap tf.data.Dataset objects immediately after they have been created. Users should use the output of this wrapper as the new instance of their dataset. If users create multiple datasets (e.g., one for training and one for validation), users should wrap each dataset independently.
Parameters
dataset – tf.data.Dataset
shard_dataset – When performing multi-slot (distributed) training, this controls whether the dataset is sharded so that each training process (one per slot) sees unique data. If set to False, users must manually configure each process to use unique data.
class determined.keras.TFKerasExperimentalContext(env: determined._env_context.EnvContext, hvd_config: determined.horovod.HorovodContext)
Context class that contains experimental runtime information and features for any Determined workflow that uses the tf.keras API.
TFKerasExperimentalContext extends TFKerasTrialContext under the context.experimental namespace.
cache_train_dataset(dataset_id: str, dataset_version: str, shuffle: bool = False, skip_shuffle_at_epoch_end: bool = False) → Callable
cache_train_dataset is a decorator for creating your training dataset. It should decorate a function that outputs a tf.data.Dataset object. The dataset will be stored in a cache, keyed by dataset_id and dataset_version. The cache is re-used in subsequent calls.
Parameters
dataset_id – A string that will be used as part of the unique identifier for this dataset.
dataset_version – A string that will be used as part of the unique identifier for this dataset.
shuffle – A bool indicating if the dataset should be shuffled. Shuffling will be performed with the trial’s random seed, which can be set in the Experiment Configuration.
skip_shuffle_at_epoch_end – A bool indicating if shuffling should be skipped at the end of epochs.
Example Usage:
def make_train_dataset(self):
    @self.context.experimental.cache_train_dataset("range_dataset", "v1")
    def make_dataset():
        ds = tf.data.Dataset.range(10)
        return ds
    dataset = make_dataset()
    dataset = dataset.batch(self.context.get_per_slot_batch_size())
    dataset = dataset.map(...)
    return dataset
Note
dataset.batch() and runtime augmentation should be done after caching. Additionally, users should never need to call dataset.repeat().
cache_validation_dataset(dataset_id: str, dataset_version: str, shuffle: bool = False) → Callable
cache_validation_dataset is a decorator for creating your validation dataset. It should decorate a function that outputs a tf.data.Dataset object. The dataset will be stored in a cache, keyed by dataset_id and dataset_version. The cache is re-used in subsequent calls.
Parameters
dataset_id – A string that will be used as part of the unique identifier for this dataset.
dataset_version – A string that will be used as part of the unique identifier for this dataset.
shuffle – A bool indicating if the dataset should be shuffled. Shuffling will be performed with the trial’s random seed, which can be set in the Experiment Configuration.
Callbacks
To execute arbitrary Python code during the lifecycle of a TFKerasTrial, implement the standard Keras callback interface tf.keras.callbacks.Callback and supply the callbacks to the TFKerasTrial by implementing keras_callbacks().
determined.keras.TFKerasTrial.keras_callbacks(self) → List[tensorflow.python.keras.callbacks.Callback]
Specifies a list of tf.keras.callbacks.Callback objects to be used during the trial’s lifetime.
Callbacks should avoid calling model.predict(), as this will affect Determined training behavior.
Note
If you specify a Keras callback that uses the on_epoch_begin or on_epoch_end interfaces, epoch boundaries are determined by the length of the training data set, not by the value of the Determined configuration setting records_per_epoch.
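A minimal sketch of a callback and of a keras_callbacks() body is shown below. The class name `EpochLogRecorder` is hypothetical, and the guarded import exists only so the sketch can be loaded without TensorFlow installed; in a real trial the class would subclass tf.keras.callbacks.Callback directly and keras_callbacks() would be a method of the trial class.

```python
try:
    import tensorflow as tf

    CallbackBase = tf.keras.callbacks.Callback
except ImportError:
    # Fallback so the sketch can be imported where TensorFlow is not installed.
    CallbackBase = object


class EpochLogRecorder(CallbackBase):
    """Hypothetical callback that records the logs Keras passes at each epoch end."""

    def __init__(self):
        super().__init__()
        self.records = []

    def on_epoch_end(self, epoch, logs=None):
        # Keras calls this hook with the epoch index and a dict of metric values.
        self.records.append((epoch, dict(logs or {})))


def keras_callbacks(self):
    # Body of the trial's keras_callbacks() method, shown as a plain function for brevity.
    return [EpochLogRecorder()]
```

Note that the callback only reads the logs it is given and does not call model.predict(), in line with the guidance above.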