determined.estimator¶
determined.estimator.EstimatorTrial
¶
-
class determined.estimator.EstimatorTrial(context: determined.estimator._estimator_context.EstimatorTrialContext)¶
By default, experiments run with TensorFlow 1.x. To configure your trial to use TensorFlow 2.x, set a TF 2.x image in the experiment configuration (e.g. determinedai/environments:cuda-10.1-pytorch-1.4-tf-2.1-gpu-0.3.0).
EstimatorTrial supports TF 2.x; however, it uses TensorFlow V1 behavior. We have disabled TensorFlow V2 behavior for EstimatorTrial, so there is no need for you to disable it.
-
trial_context_class¶
alias of determined.estimator._estimator_context.EstimatorTrialContext
-
__init__(context: determined.estimator._estimator_context.EstimatorTrialContext)¶
Initializes a trial using the provided context. Override this function to initialize any shared state between the estimator, train spec, and/or validation spec.
-
abstract build_estimator() → tensorflow_estimator.python.estimator.estimator.Estimator¶
Specifies the tf.estimator.Estimator instance to be used during training and validation. This may be an instance of a Premade Estimator provided by the TensorFlow team, or a Custom Estimator created by the user.
-
abstract build_train_spec() → tensorflow_estimator.python.estimator.training.TrainSpec¶
Specifies the tf.estimator.TrainSpec to be used for training steps. This training specification will contain a TensorFlow input_fn which constructs the input data for a training step. Unlike the standard TensorFlow input_fn interface, EstimatorTrial only supports an input_fn that returns a tf.data.Dataset object. A function that returns a tuple of features and labels is currently not supported by EstimatorTrial. Additionally, the max_steps attribute of the training specification will be ignored; instead, the batches_per_step option in the experiment configuration is used to determine how many batches each training step uses.
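Because batches_per_step, not max_steps, sets the size of a training step, the number of steps needed to consume a given number of batches is simple arithmetic. The sketch below is only an illustration: the helper name training_steps_needed and both numbers are hypothetical and are not part of the Determined API.

```python
import math


def training_steps_needed(total_batches: int, batches_per_step: int) -> int:
    """Return how many training steps are needed to consume total_batches."""
    if batches_per_step <= 0:
        raise ValueError("batches_per_step must be positive")
    # A final, partially filled step still counts as a step, hence the ceiling.
    return math.ceil(total_batches / batches_per_step)


# Hypothetical values: 1000 batches of training data, 100 batches per step.
print(training_steps_needed(total_batches=1000, batches_per_step=100))
print(training_steps_needed(total_batches=1050, batches_per_step=100))
```

Any max_steps value set on the TrainSpec plays no part in this calculation.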
-
abstract build_validation_spec() → tensorflow_estimator.python.estimator.training.EvalSpec¶
Specifies the tf.estimator.EvalSpec to be used for validation steps. This evaluation spec will contain a TensorFlow input_fn which constructs the input data for a validation step. The validation step will evaluate steps batches, or evaluate until the input_fn raises an end-of-input exception if steps is None.
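The effect of steps on a validation step can be modeled in plain Python. This is a conceptual sketch, not Determined's or TensorFlow's implementation: the function count_validation_batches is hypothetical, and exhausting an ordinary Python iterable stands in for the end-of-input exception raised by a real input_fn.

```python
from itertools import islice
from typing import Iterable, Optional


def count_validation_batches(batches: Iterable, steps: Optional[int] = None) -> int:
    """Count the batches a validation step would consume under this model."""
    # steps=None: read until the input is exhausted.
    # steps=N: stop after at most N batches.
    limited = batches if steps is None else islice(batches, steps)
    return sum(1 for _ in limited)


validation_batches = range(10)  # stand-in for ten batches of validation data
print(count_validation_batches(validation_batches, steps=4))
print(count_validation_batches(validation_batches, steps=None))
```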
-
build_serving_input_receiver_fns() → Dict[str, Callable[..., Union[tensorflow_estimator.python.estimator.export.export.ServingInputReceiver, tensorflow_estimator.python.estimator.export.export.TensorServingInputReceiver]]]¶
Optionally returns a Python dictionary mapping string names to serving_input_receiver_fns. If specified, each serving input receiver function will be used to export a distinct SavedModel inference graph when a Determined checkpoint is saved, using Estimator.export_saved_model. The exported models are saved under subdirectories named by the keys of the respective serving input receiver functions. For example, returning
{
    "raw": tf.estimator.export.build_raw_serving_input_receiver_fn(...),
    "parsing": tf.estimator.export.build_parsing_serving_input_receiver_fn(...),
}
from this function would configure Determined to export two SavedModel inference graphs in every checkpoint under raw and parsing subdirectories, respectively. By default, this function returns an empty dictionary and the Determined checkpoint directory only contains metadata associated with the training graph.
-
determined.experimental.estimator.init()
¶
-
determined.experimental.estimator.init(config: Optional[Dict[str, Any]] = None, mode: determined.experimental._native.Mode = <Mode.CLUSTER: 'cluster'>, context_dir: str = '', command: Optional[List[str]] = None, master_url: Optional[str] = None) → determined.estimator._estimator_context.EstimatorNativeContext¶
Create a tf.estimator experiment using the Native API.
- Parameters
config – A dictionary representing the experiment configuration to be associated with the experiment.
mode – The determined.experimental.Mode used when creating an experiment:
1. Mode.CLUSTER (default): Submit the experiment to a remote Determined cluster.
2. Mode.LOCAL: Test the experiment in the calling Python process for development / debugging purposes. Run through a minimal loop of training, validation, and checkpointing steps.
context_dir – A string filepath that defines the context directory. All model code will be executed with this as the current working directory.
In CLUSTER mode, this argument is required. All files in this directory will be uploaded to the Determined cluster. The total size of this directory must be under 96 MB.
In LOCAL mode, this argument is optional and assumed to be the current working directory by default.
command – A list of strings that is used as the entrypoint of the training script in the Determined task environment. When executing this function via a Python script, this argument is inferred to be sys.argv by default. When executing this function via IPython or Jupyter notebook, this argument is required.
master_url – An optional string to use as the Determined master URL in submit mode. Will default to the value of environment variable DET_MASTER if not provided.
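As an illustration of the config argument, the fragment below writes part of an experiment configuration as a Python dictionary. It is a sketch under assumptions: placing the image under an environment / image key and the value 100 for batches_per_step are not taken from this page, and a real experiment configuration needs further fields that are omitted here.

```python
# Fragment of an experiment configuration expressed as a Python dictionary.
config = {
    "environment": {
        # Assumed key layout; the image name is the TF 2.x example image
        # mentioned in the EstimatorTrial description.
        "image": "determinedai/environments:cuda-10.1-pytorch-1.4-tf-2.1-gpu-0.3.0",
    },
    # Hypothetical value: number of batches in each training step.
    "batches_per_step": 100,
}

# The dictionary would then be handed to init(), for example:
#   context = determined.experimental.estimator.init(
#       config=config,
#       context_dir=".",
#   )
```

The call is left as a comment so the fragment can be read and checked without a Determined installation or a running master.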
- Returns
determined.estimator.EstimatorContext
¶
To use tf.estimator models with Determined, users need to wrap their optimizer and datasets using the following functions inherited from determined.estimator.EstimatorContext. Note that the concrete context object on which these functions are found will be either determined.estimator.EstimatorTrialContext or determined.estimator.EstimatorNativeContext, depending on whether the Trial API or the Native API is used.
-
class determined.estimator.EstimatorContext(env: determined._env_context.EnvContext, hvd_config: determined.horovod.HorovodContext)¶
Base context class that contains runtime information for any Determined workflow that uses the tf.estimator API.
EstimatorTrialContext always has a DistributedContext accessible via context.distributed for information related to distributed training.
EstimatorTrialContext always has an EstimatorExperimentalContext accessible via context.experimental for information related to experimental features.
-
wrap_optimizer(optimizer: Any) → Any¶
This should be used to wrap optimizer objects immediately after they have been created. Users should use the output of this wrapper as the new instance of their optimizer. For example, if users create their optimizer within build_estimator(), they should call optimizer = wrap_optimizer(optimizer) prior to passing the optimizer into their Estimator.
-
wrap_dataset(dataset: Any) → Any¶
This should be used to wrap tf.data.Dataset objects immediately after they have been created. Users should use the output of this wrapper as the new instance of their dataset. If users create multiple datasets (e.g., one for training and one for testing), users should wrap each dataset independently. For example, if users instantiate their training dataset within build_train_spec(), they should call dataset = wrap_dataset(dataset) prior to passing it into tf.estimator.TrainSpec.
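The calling pattern for both wrappers can be sketched without TensorFlow or Determined installed. Everything in the block below is a stand-in: StandInContext and SketchTrial are hypothetical classes, and plain dictionaries and lists take the place of the optimizer, Estimator, tf.data.Dataset, and TrainSpec objects. Only the order of operations is meant to carry over: create the object, wrap it at once, and use the wrapped result from then on.

```python
from typing import Any, Callable, Dict, List


class StandInContext:
    """Hypothetical stand-in for the Determined context; it only records calls."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def wrap_optimizer(self, optimizer: Any) -> Any:
        self.calls.append("wrap_optimizer")
        return optimizer

    def wrap_dataset(self, dataset: Any) -> Any:
        self.calls.append("wrap_dataset")
        return dataset


class SketchTrial:
    """Mirrors the shape of an EstimatorTrial without subclassing it."""

    def __init__(self, context: StandInContext) -> None:
        self.context = context

    def build_estimator(self) -> Dict[str, Any]:
        optimizer = {"name": "adam"}  # placeholder for a TensorFlow optimizer
        # Wrap immediately after creation and keep only the wrapped object.
        optimizer = self.context.wrap_optimizer(optimizer)
        return {"optimizer": optimizer}  # placeholder for an Estimator

    def build_train_spec(self) -> Dict[str, Callable[[], Any]]:
        def input_fn() -> Any:
            dataset = list(range(8))  # placeholder for a tf.data.Dataset
            # Each dataset is wrapped on its own, right after it is built.
            dataset = self.context.wrap_dataset(dataset)
            return dataset

        return {"input_fn": input_fn}  # placeholder for tf.estimator.TrainSpec


context = StandInContext()
trial = SketchTrial(context)
estimator = trial.build_estimator()
train_spec = trial.build_train_spec()
train_spec["input_fn"]()
print(context.calls)
```

A validation dataset built in build_validation_spec() would be wrapped by a separate wrap_dataset call in the same way.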
-
-
class determined.estimator.EstimatorExperimentalContext(env: determined._env_context.EnvContext, hvd_config: determined.horovod.HorovodContext)¶
Context class that contains experimental runtime information and features for any Determined workflow that uses the tf.estimator API.
EstimatorExperimentalContext extends EstimatorTrialContext under the context.experimental namespace.
-
cache_train_dataset(dataset_id: str, dataset_version: str, shuffle: bool = False, skip_shuffle_at_epoch_end: bool = False) → Callable¶
cache_train_dataset is a decorator for creating your training dataset. It should decorate a function that outputs a tf.data.Dataset object. The dataset will be stored in a cache, keyed by dataset_id and dataset_version. The cache is re-used in subsequent calls.
- Parameters
dataset_id – A string that will be used as part of the unique identifier for this dataset.
dataset_version – A string that will be used as part of the unique identifier for this dataset.
shuffle – A bool indicating if the dataset should be shuffled. Shuffling will be performed with the trial’s random seed which can be set in Experiment Configuration.
skip_shuffle_at_epoch_end – A bool indicating if shuffling should be skipped at the end of epochs.
Example Usage:
def make_train_dataset(self):
    # Cache the raw dataset, keyed by dataset_id and dataset_version.
    @self.context.experimental.cache_train_dataset("range_dataset", "v1")
    def make_dataset():
        ds = tf.data.Dataset.range(10)
        return ds

    dataset = make_dataset()
    # Batching and runtime augmentation are applied after caching.
    dataset = dataset.batch(self.context.get_per_slot_batch_size())
    dataset = dataset.map(...)
    return dataset
Note
dataset.batch() and runtime augmentation should be done after caching. Additionally, users should never need to call dataset.repeat().
-
cache_validation_dataset(dataset_id: str, dataset_version: str, shuffle: bool = False) → Callable¶
cache_validation_dataset is a decorator for creating your validation dataset. It should decorate a function that outputs a tf.data.Dataset object. The dataset will be stored in a cache, keyed by dataset_id and dataset_version. The cache is re-used in subsequent calls.
- Parameters
dataset_id – A string that will be used as part of the unique identifier for this dataset.
dataset_version – A string that will be used as part of the unique identifier for this dataset.
shuffle – A bool indicating if the dataset should be shuffled. Shuffling will be performed with the trial’s random seed which can be set in Experiment Configuration.
-