determined.NativeContext

The NativeContext provides useful methods for writing tf.keras and tf.estimator experiments using the Native API. Every init() function supported by the Native API returns a subclass of NativeContext:

- determined.keras.init() returns determined.keras.TFKerasNativeContext.
- determined.estimator.init() returns determined.estimator.EstimatorNativeContext.
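For example, a Native API script obtains a context from init() and reads experiment information from it. The snippet below is a minimal sketch: the exact arguments accepted by init() are not covered on this page and are omitted here, and only accessors documented below are used.

    import determined.keras

    # init() returns a TFKerasNativeContext (a subclass of NativeContext).
    context = determined.keras.init()

    experiment_id = context.get_experiment_id()
    batch_size = context.get_per_slot_batch_size()
    hparams = context.get_hparams()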
determined.NativeContext

class determined.NativeContext(env: determined._env_context.EnvContext, hvd_config: determined.horovod.HorovodContext)

    A base class that all NativeContexts inherit when using the Native API. The context returned by the init() function must inherit from this class.

    NativeContext always has a DistributedContext accessible via context.distributed for information related to distributed training.
    get_data_config() → Dict[str, Any]
        Return the data configuration.

    get_experiment_config() → Dict[str, Any]
        Return the experiment configuration.

    get_experiment_id() → int
        Return the experiment ID of the current trial.

    get_global_batch_size() → int
        Return the global batch size.

    get_hparam(name: str) → Any
        Return the current value of the hyperparameter with the given name.

    get_hparams() → Dict[str, Any]
        Return a dictionary of hyperparameter names to values.

    get_per_slot_batch_size() → int
        Return the per-slot batch size. When a model is trained with a single GPU, this is equal to the global batch size. When multi-GPU training is used, this is equal to the global batch size divided by the number of GPUs used to train the model.

    get_stop_requested() → bool
        Return whether a trial stoppage has been requested.

    get_trial_id() → int
        Return the trial ID of the current trial.

    set_stop_requested(stop_requested: bool) → None
        Set a flag to request a trial stoppage. When this flag is set to True, we finish the step, checkpoint, then exit.
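Continuing the sketch above, these accessors compose naturally; for instance, a hyperparameter can gate an early stop request. The "max_loss" hyperparameter name and the loss value are illustrative assumptions, not part of the API.

    current_loss = 0.87  # placeholder; in practice this comes from your training loop
    max_loss = context.get_hparam("max_loss")  # "max_loss" is an assumed hyperparameter name

    if current_loss > max_loss and not context.get_stop_requested():
        # Once the flag is set, the trial finishes the step, checkpoints, then exits.
        context.set_stop_requested(True)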
determined.TrialContext.distributed

class determined._train_context.DistributedContext(env: determined._env_context.EnvContext, hvd_config: determined.horovod.HorovodContext)

    DistributedContext extends all TrialContexts and NativeContexts under the context.distributed namespace. It provides useful methods for effective distributed training.

    get_rank() → int
        Return the rank of the process in the trial. The rank of a process is a unique ID within the trial; that is, no two processes in the same trial will be assigned the same rank.

    get_local_rank() → int
        Return the rank of the process on the agent. The local rank of a process is a unique ID within a given agent and trial; that is, no two processes in the same trial that are executing on the same agent will be assigned the same rank.

    get_size() → int
        Return the number of slots this trial is running on.

    get_num_agents() → int
        Return the number of agents this trial is running on.
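A common pattern is to restrict work such as logging to a single process using these rank accessors. The sketch below assumes a context obtained from an init() call as shown earlier; only the documented methods are used.

    if context.distributed.get_rank() == 0:
        # Only the chief process (rank 0 across the whole trial) logs here.
        print("slots:", context.distributed.get_size(),
              "agents:", context.distributed.get_num_agents())

    if context.distributed.get_local_rank() == 0:
        # One process per agent, e.g. for per-machine setup work.
        print("local chief on this agent")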
determined.keras.TFKerasNativeContext

class determined.keras.TFKerasNativeContext(env: determined._env_context.EnvContext, hvd_config: determined.horovod.HorovodContext)

    TFKerasNativeContext always has a DistributedContext accessible via context.distributed for information related to distributed training.

    get_data_config() → Dict[str, Any]
        Return the data configuration.

    get_experiment_config() → Dict[str, Any]
        Return the experiment configuration.

    get_experiment_id() → int
        Return the experiment ID of the current trial.

    get_global_batch_size() → int
        Return the global batch size.

    get_hparam(name: str) → Any
        Return the current value of the hyperparameter with the given name.

    get_hparams() → Dict[str, Any]
        Return a dictionary of hyperparameter names to values.

    get_per_slot_batch_size() → int
        Return the per-slot batch size. When a model is trained with a single GPU, this is equal to the global batch size. When multi-GPU training is used, this is equal to the global batch size divided by the number of GPUs used to train the model.

    get_stop_requested() → bool
        Return whether a trial stoppage has been requested.

    get_trial_id() → int
        Return the trial ID of the current trial.

    set_stop_requested(stop_requested: bool) → None
        Set a flag to request a trial stoppage. When this flag is set to True, we finish the step, checkpoint, then exit.

    wrap_dataset(dataset: Any) → Any
        This should be used to wrap tf.data.Dataset objects immediately after they have been created. Users should use the output of this wrapper as the new instance of their dataset. If users create multiple datasets (e.g., one for training and one for testing), users should wrap each dataset independently.

        Parameters:
            dataset – tf.data.Dataset
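A sketch of wrapping a training dataset immediately after it is built. The toy tensors and the batch call are illustrative; only wrap_dataset() and get_per_slot_batch_size() come from this page, and context is assumed to be the TFKerasNativeContext returned by determined.keras.init().

    import tensorflow as tf

    features = tf.random.uniform((128, 8))
    labels = tf.random.uniform((128,), maxval=2, dtype=tf.int32)

    train_ds = tf.data.Dataset.from_tensor_slices((features, labels))
    train_ds = train_ds.batch(context.get_per_slot_batch_size())

    # Wrap immediately after creation and use the returned object from here on.
    train_ds = context.wrap_dataset(train_ds)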
determined.estimator.EstimatorNativeContext

class determined.estimator.EstimatorNativeContext(env: determined._env_context.EnvContext, hvd_config: determined.horovod.HorovodContext)

    EstimatorNativeContext always has a DistributedContext accessible via context.distributed for information related to distributed training.

    get_data_config() → Dict[str, Any]
        Return the data configuration.

    get_experiment_config() → Dict[str, Any]
        Return the experiment configuration.

    get_experiment_id() → int
        Return the experiment ID of the current trial.

    get_global_batch_size() → int
        Return the global batch size.

    get_hparam(name: str) → Any
        Return the current value of the hyperparameter with the given name.

    get_hparams() → Dict[str, Any]
        Return a dictionary of hyperparameter names to values.

    get_per_slot_batch_size() → int
        Return the per-slot batch size. When a model is trained with a single GPU, this is equal to the global batch size. When multi-GPU training is used, this is equal to the global batch size divided by the number of GPUs used to train the model.

    get_stop_requested() → bool
        Return whether a trial stoppage has been requested.

    get_trial_id() → int
        Return the trial ID of the current trial.

    set_stop_requested(stop_requested: bool) → None
        Set a flag to request a trial stoppage. When this flag is set to True, we finish the step, checkpoint, then exit.

    wrap_dataset(dataset: Any) → Any
        This should be used to wrap tf.data.Dataset objects immediately after they have been created. Users should use the output of this wrapper as the new instance of their dataset. If users create multiple datasets (e.g., one for training and one for testing), users should wrap each dataset independently. For example, if users instantiate their training dataset within build_train_spec(), they should call dataset = wrap_dataset(dataset) prior to passing it into tf.estimator.TrainSpec.
    wrap_optimizer(optimizer: Any) → Any
        This should be used to wrap optimizer objects immediately after they have been created. Users should use the output of this wrapper as the new instance of their optimizer. For example, if users create their optimizer within build_estimator(), they should call optimizer = wrap_optimizer(optimizer) prior to passing the optimizer into their Estimator.
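A sketch of how the two wrappers fit into an Estimator workflow. The model_fn, input_fn, toy data, and the "learning_rate" hyperparameter are illustrative assumptions; only wrap_dataset(), wrap_optimizer(), and the other context accessors are taken from this page.

    import tensorflow as tf
    import determined.estimator

    context = determined.estimator.init()  # returns an EstimatorNativeContext

    def input_fn():
        features = tf.random.uniform((128, 8))
        labels = tf.random.uniform((128,), maxval=2, dtype=tf.int32)
        dataset = tf.data.Dataset.from_tensor_slices((features, labels))
        dataset = dataset.batch(context.get_per_slot_batch_size())
        # Wrap the dataset immediately after it is created.
        return context.wrap_dataset(dataset)

    def model_fn(features, labels, mode):
        logits = tf.compat.v1.layers.dense(features, 2)
        loss = tf.compat.v1.losses.sparse_softmax_cross_entropy(labels, logits)
        optimizer = tf.compat.v1.train.AdamOptimizer(context.get_hparam("learning_rate"))
        # Wrap the optimizer immediately after it is created.
        optimizer = context.wrap_optimizer(optimizer)
        train_op = optimizer.minimize(
            loss, global_step=tf.compat.v1.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

    estimator = tf.estimator.Estimator(model_fn=model_fn)
    tf.estimator.train_and_evaluate(
        estimator,
        tf.estimator.TrainSpec(input_fn),
        tf.estimator.EvalSpec(input_fn))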