determined.keras
determined.keras.TFKerasTrial
class determined.keras.TFKerasTrial(trial_context: determined.keras._tf_keras_context.TFKerasTrialContext)
tf.keras trials are created by subclassing the abstract class TFKerasTrial. Users must define all the abstract methods to create the deep learning model associated with a specific trial, and to subsequently train and evaluate it.
By default, experiments run with TensorFlow 1.x. To configure your trial to use TensorFlow 2.x, set a TF 2.x image in the experiment configuration (e.g. determinedai/environments:cuda-10.1-pytorch-1.4-tf-2.1-gpu-0.3.0).
By default, trials using TF 2.x execute eagerly, and trials using TF 1.x do not. To override the default, call the appropriate function in __init__. For example, to disable eager execution while running a TF 2.x trial, call tf.compat.v1.disable_eager_execution at the top of your __init__.
trial_context_class
alias of determined.keras._tf_keras_context.TFKerasTrialContext
__init__(trial_context: determined.keras._tf_keras_context.TFKerasTrialContext) → None
Initializes a trial using the provided trial_context.
Override this function to initialize any shared state between the estimator, train spec, and/or validation spec.
abstract build_model() → tensorflow.python.keras.engine.training.Model
Defines the deep learning architecture associated with a trial, which may depend on the trial’s specific hyperparameter settings that are stored in the hparams dictionary. This function returns a tf.keras.Model object. Users must compile this model by calling model.compile() on the tf.keras.Model instance before it is returned.
abstract build_training_data_loader() → Union[tensorflow.python.keras.utils.data_utils.Sequence, tensorflow.python.data.ops.dataset_ops.DatasetV1, determined.keras._data.SequenceAdapter, tuple]
Defines the data loader to use during training.
Should return one of the following:
1) A tuple (x_train, y_train) of NumPy arrays. x_train must be a NumPy array (or array-like), a list of arrays (in case the model has multiple inputs), or a dict mapping input names to the corresponding array, if the model has named inputs. y_train should be a NumPy array.
2) A tuple (x_train, y_train, sample_weights) of NumPy arrays.
3) A tf.data.Dataset returning a tuple of either (inputs, targets) or (inputs, targets, sample_weights).
4) A keras.utils.Sequence returning a tuple of either (inputs, targets) or (inputs, targets, sample_weights).
5) A det.keras.SequenceAdapter returning a tuple of either (inputs, targets) or (inputs, targets, sample_weights).
Warning
If you are using tf.data.Dataset, Determined’s support for automatically checkpointing the dataset does not currently work correctly. This means that resuming workloads will start from the beginning of the dataset if using tf.data.Dataset.
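As an illustration of the two tuple forms listed above, the following sketch builds them from in-memory NumPy arrays. The function name make_training_data, the input names "image" and "meta", and all array shapes are hypothetical and chosen only for this example; inside a trial, one of these tuples would be the value returned from build_training_data_loader.

```python
import numpy as np


def make_training_data(num_samples=8, seed=0):
    """Build example training data in two of the supported tuple forms."""
    rng = np.random.default_rng(seed)

    # Single-input model: x is one array with one row per sample.
    x_single = rng.normal(size=(num_samples, 4)).astype("float32")

    # Model with named inputs: x is a dict mapping input names to arrays.
    # The names "image" and "meta" are made up for this sketch.
    x_named = {
        "image": rng.normal(size=(num_samples, 28, 28)).astype("float32"),
        "meta": rng.normal(size=(num_samples, 3)).astype("float32"),
    }

    y = rng.integers(0, 2, size=(num_samples,))
    sample_weights = np.ones(num_samples, dtype="float32")

    plain = (x_single, y)                    # (x_train, y_train)
    weighted = (x_named, y, sample_weights)  # (x_train, y_train, sample_weights)
    return plain, weighted
```

Every array in a tuple shares the same first dimension (the number of samples), including each array inside the dict of named inputs.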
abstract build_validation_data_loader() → Union[tensorflow.python.keras.utils.data_utils.Sequence, tensorflow.python.data.ops.dataset_ops.DatasetV1, determined.keras._data.SequenceAdapter, tuple]
Defines the data loader to use during validation.
Should return one of the following:
1) A tuple (x_val, y_val) of NumPy arrays. x_val must be a NumPy array (or array-like), a list of arrays (in case the model has multiple inputs), or a dict mapping input names to the corresponding array, if the model has named inputs. y_val should be a NumPy array.
2) A tuple (x_val, y_val, sample_weights) of NumPy arrays.
3) A tf.data.Dataset returning a tuple of either (inputs, targets) or (inputs, targets, sample_weights).
4) A keras.utils.Sequence returning a tuple of either (inputs, targets) or (inputs, targets, sample_weights).
5) A det.keras.SequenceAdapter returning a tuple of either (inputs, targets) or (inputs, targets, sample_weights).
session_config() → tensorflow.core.protobuf.config_pb2.ConfigProto
Specifies the tf.ConfigProto to be used by the TensorFlow session. By default, tf.ConfigProto(allow_soft_placement=True) is used.
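A minimal sketch of a custom session configuration, assuming you want to keep the default soft placement and additionally let GPU memory grow on demand. It uses tf.compat.v1.ConfigProto, the name under which TF 2.x exposes tf.ConfigProto. In a trial this would be a method taking self; it is written as a plain function here so that it can be run on its own.

```python
import tensorflow as tf


def session_config():
    """Return a ConfigProto that keeps soft placement and enables GPU memory growth."""
    # Same setting as the documented default.
    config = tf.compat.v1.ConfigProto(allow_soft_placement=True)
    # Assumption for this sketch: allocate GPU memory incrementally
    # instead of reserving it all up front.
    config.gpu_options.allow_growth = True
    return config
```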
keras_callbacks() → List[tensorflow.python.keras.callbacks.Callback]
Specifies a list of tf.keras.callbacks.Callback objects to be used during the trial’s lifetime.
Callbacks should avoid calling model.predict() or changing model.stop_training, as doing so will affect Determined training behavior.
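The sketch below shows a callback that stays within these limits: it only reads the logs it is given and never calls model.predict() or touches model.stop_training. The class name LossHistory is hypothetical, and the example assumes the standard Keras on_epoch_end hook is invoked; which hooks Determined triggers, and when, is not stated here. In a trial, keras_callbacks would be a method taking self.

```python
import tensorflow as tf


class LossHistory(tf.keras.callbacks.Callback):
    """Read-only callback that records the loss reported at the end of each epoch."""

    def __init__(self):
        super().__init__()
        self.losses = []

    def on_epoch_end(self, epoch, logs=None):
        # Only inspect the logs; do not modify the model or its training state.
        logs = logs or {}
        if "loss" in logs:
            self.losses.append(float(logs["loss"]))


def keras_callbacks():
    """Return the list of callbacks to use for the trial."""
    return [LossHistory()]
```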
Data Loading
There are five supported data types for loading data into tf.keras models:
1) A tuple (x, y) of NumPy arrays. x must be a NumPy array (or array-like), a list of arrays (in case the model has multiple inputs), or a dict mapping input names to the corresponding array, if the model has named inputs. y should be a NumPy array.
2) A tuple (x, y, sample_weights) of NumPy arrays.
3) A tf.data.Dataset returning a tuple of either (inputs, targets) or (inputs, targets, sample_weights).
4) A keras.utils.Sequence returning a tuple of either (inputs, targets) or (inputs, targets, sample_weights).
5) A det.keras.SequenceAdapter returning a tuple of either (inputs, targets) or (inputs, targets, sample_weights).
Loading data is done by defining the build_training_data_loader and build_validation_data_loader functions. Each should return one of the supported data types listed above.
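The keras.utils.Sequence option can be sketched as follows. The class name ArraySequence and its constructor arguments are hypothetical; the example only assumes the standard Sequence contract, in which __len__ returns the number of batches and __getitem__ returns one (inputs, targets) batch.

```python
import math

import numpy as np
import tensorflow as tf


class ArraySequence(tf.keras.utils.Sequence):
    """Serve (inputs, targets) batches from two in-memory NumPy arrays."""

    def __init__(self, x, y, batch_size):
        super().__init__()
        self.x = x
        self.y = y
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches; the last batch may be smaller than batch_size.
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, index):
        start = index * self.batch_size
        end = start + self.batch_size
        return self.x[start:end], self.y[start:end]
```

An instance such as ArraySequence(x_train, y_train, batch_size=32) could then be returned from build_training_data_loader, either directly or wrapped in a det.keras.SequenceAdapter.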
Optimizing Keras Sequences
To optimize the performance of tf.keras.utils.Sequence objects that are created from generators, Determined provides determined.keras.SequenceAdapter.
class determined.keras.SequenceAdapter(data: tensorflow.python.keras.utils.data_utils.Sequence, use_multiprocessing: bool = False, workers: int = 1, max_queue_size: int = 10)
A class that helps optimize the performance of tf.keras.utils.Sequence and helps with saving and restoring iterators for a dataset.
__init__(data: tensorflow.python.keras.utils.data_utils.Sequence, use_multiprocessing: bool = False, workers: int = 1, max_queue_size: int = 10)
Multiprocessing or multithreading for native Python generators is not supported. If you want these performance accelerations, please consider using a Sequence.
Parameters
data – A tf.keras.utils.Sequence that holds the data.
use_multiprocessing – If True, use process-based threading. If unspecified, use_multiprocessing will default to False. Note that because this implementation relies on multiprocessing, you should not pass non-picklable arguments for the data loaders, as they cannot easily be passed to child processes.
workers – Maximum number of processes to spin up when using process-based threading. If unspecified, workers will default to 1. If 0, data loading will execute on the main thread.
max_queue_size – Maximum size for the generator queue. If unspecified, max_queue_size will default to 10.
Usage Examples
Use main Python process with no multithreading and no multiprocessing
SequenceAdapter(sequence, workers=0, use_multiprocessing=False)
Use one background process
SequenceAdapter(sequence, workers=1, use_multiprocessing=True)
Use two background threads
SequenceAdapter(sequence, workers=2, use_multiprocessing=False)
Required Wrappers
Users are required to wrap their model prior to compiling it, using self.context.wrap_model. This is typically done inside determined.keras.TFKerasTrial.build_model().
determined.keras.TFKerasTrialContext.wrap_model(self, model: Any) → Any
This should be used to wrap tf.keras.Model objects immediately after they have been created but before they have been compiled. This function takes a tf.keras.Model and returns a wrapped version of the model; the return value should be used in place of the original model.
Parameters
model – tf.keras.Model
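The required ordering (create, wrap, compile, return) can be sketched as below. The function name build_wrapped_model, the "hidden_units" hyperparameter, and the layer sizes are hypothetical. The context argument stands in for self.context and only needs to provide the wrap_model method described above; in a trial this logic would live in build_model.

```python
import tensorflow as tf


def build_wrapped_model(context, hparams):
    """Create a model, wrap it with the trial context, then compile it."""
    # 1. Create the model from the trial's hyperparameters.
    model = tf.keras.Sequential(
        [
            tf.keras.layers.Dense(hparams["hidden_units"], activation="relu"),
            tf.keras.layers.Dense(1),
        ]
    )
    # 2. Wrap it before compiling, and keep using the returned object.
    model = context.wrap_model(model)
    # 3. Compile the wrapped model before returning it.
    model.compile(optimizer="adam", loss="mse")
    return model
```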
If using tf.data.Dataset, users are required to wrap both their training and validation datasets in a Determined-provided wrapper. This wrapper is used to shard the dataset for Distributed and Parallel Training. For optimal performance, users should wrap each dataset immediately after creating it.
determined.keras.TFKerasContext.wrap_dataset(self, dataset: Any) → Any
This should be used to wrap tf.data.Dataset objects immediately after they have been created. Users should use the output of this wrapper as the new instance of their dataset. If users create multiple datasets (e.g., one for training and one for testing), users should wrap each dataset independently.
Parameters
dataset – tf.data.Dataset
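A minimal sketch of wrapping a dataset right after it is created and before any further transformations. The function name make_wrapped_dataset and its arguments are hypothetical. The context argument stands in for self.context and only needs to provide the wrap_dataset method described above.

```python
import numpy as np
import tensorflow as tf


def make_wrapped_dataset(context, features, labels, batch_size):
    """Create a tf.data.Dataset, wrap it immediately, then batch it."""
    dataset = tf.data.Dataset.from_tensor_slices((features, labels))
    # Wrap right after creation and use the returned object from here on.
    dataset = context.wrap_dataset(dataset)
    # Transformations such as batching are applied to the wrapped dataset.
    return dataset.batch(batch_size)
```

Calling this function once for the training data and once for the validation data wraps each dataset independently, as required.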
Trial Context
determined.keras.TFKerasTrialContext subclasses determined.TrialContext. It provides useful methods for writing Trial subclasses. It also provides the model and dataset wrappers.
class determined.keras.TFKerasTrialContext(env: determined._env_context.EnvContext, hvd_config: determined.horovod.HorovodContext)
TFKerasTrialContext always has a DistributedContext accessible via context.distributed for information related to distributed training.
TFKerasTrialContext always has a TFKerasExperimentalContext accessible via context.experimental for information related to experimental features.
wrap_model(model: Any) → Any
This should be used to wrap tf.keras.Model objects immediately after they have been created but before they have been compiled. This function takes a tf.keras.Model and returns a wrapped version of the model; the return value should be used in place of the original model.
Parameters
model – tf.keras.Model
wrap_dataset(dataset: Any) → Any
This should be used to wrap tf.data.Dataset objects immediately after they have been created. Users should use the output of this wrapper as the new instance of their dataset. If users create multiple datasets (e.g., one for training and one for testing), users should wrap each dataset independently.
Parameters
dataset – tf.data.Dataset
class determined.keras.TFKerasExperimentalContext(env: determined._env_context.EnvContext, hvd_config: determined.horovod.HorovodContext)
Context class that contains experimental runtime information and features for any Determined workflow that uses the tf.keras API.
TFKerasExperimentalContext extends TFKerasTrialContext under the context.experimental namespace.
cache_train_dataset(dataset_id: str, dataset_version: str, shuffle: bool = False, skip_shuffle_at_epoch_end: bool = False) → Callable
cache_train_dataset is a decorator for creating your training dataset. It should decorate a function that outputs a tf.data.Dataset object. The dataset will be stored in a cache, keyed by dataset_id and dataset_version. The cache is re-used in subsequent calls.
Parameters
dataset_id – A string that will be used as part of the unique identifier for this dataset.
dataset_version – A string that will be used as part of the unique identifier for this dataset.
shuffle – A bool indicating if the dataset should be shuffled. Shuffling will be performed with the trial’s random seed, which can be set in Experiment Configuration.
skip_shuffle_at_epoch_end – A bool indicating if shuffling should be skipped at the end of epochs.
Example Usage:
def make_train_dataset(self):
    @self.context.experimental.cache_train_dataset("range_dataset", "v1")
    def make_dataset():
        ds = tf.data.Dataset.range(10)
        return ds

    dataset = make_dataset()
    dataset = dataset.batch(self.context.get_per_slot_batch_size())
    dataset = dataset.map(...)
    return dataset
Note
dataset.batch() and runtime augmentation should be done after caching. Additionally, users should never need to call dataset.repeat().
cache_validation_dataset(dataset_id: str, dataset_version: str, shuffle: bool = False) → Callable
cache_validation_dataset is a decorator for creating your validation dataset. It should decorate a function that outputs a tf.data.Dataset object. The dataset will be stored in a cache, keyed by dataset_id and dataset_version. The cache is re-used in subsequent calls.
Parameters
dataset_id – A string that will be used as part of the unique identifier for this dataset.
dataset_version – A string that will be used as part of the unique identifier for this dataset.
shuffle – A bool indicating if the dataset should be shuffled. Shuffling will be performed with the trial’s random seed, which can be set in Experiment Configuration.
Native
Disregard if using the trial API (subclassing determined.keras.TFKerasTrial).
determined.experimental.keras.init()
determined.experimental.keras.init(config: Optional[Dict[str, Any]] = None, mode: determined.experimental._native.Mode = <Mode.CLUSTER: 'cluster'>, context_dir: str = '', command: Optional[List[str]] = None, master_url: Optional[str] = None) → determined.keras._tf_keras_context.TFKerasNativeContext
Create a tf.keras experiment using the Native API.
Parameters
config – A dictionary representing the experiment configuration to be associated with the experiment.
mode – The determined.experimental.Mode used when creating an experiment:
1) Mode.CLUSTER (default): Submit the experiment to a remote Determined cluster.
2) Mode.LOCAL: Test the experiment in the calling Python process for development / debugging purposes. Run through a minimal loop of training, validation, and checkpointing steps.
context_dir – A string filepath that defines the context directory. All model code will be executed with this as the current working directory.
In CLUSTER mode, this argument is required. All files in this directory will be uploaded to the Determined cluster. The total size of this directory must be under 96 MB.
In LOCAL mode, this argument is optional and assumed to be the current working directory by default.
command – A list of strings that is used as the entrypoint of the training script in the Determined task environment. When executing this function via a Python script, this argument is inferred to be sys.argv by default. When executing this function via IPython or a Jupyter notebook, this argument is required.
master_url – An optional string to use as the Determined master URL in submit mode. Will default to the value of the environment variable DET_MASTER if not provided.
- Returns