determined.estimator

determined.estimator.EstimatorTrial
class determined.estimator.EstimatorTrial(context: determined.estimator._estimator_context.EstimatorTrialContext)

    By default, experiments run with TensorFlow 1.x. To configure your trial to use TensorFlow 2.x, set a TF 2.x image in the experiment configuration (e.g., determinedai/environments:cuda-10.1-pytorch-1.4-tf-2.1-gpu-0.3.0).

    EstimatorTrial supports TF 2.x; however, it uses TensorFlow V1 behavior. We have disabled TensorFlow V2 behavior for EstimatorTrial, so there is no need for you to disable it.
trial_context_class
    alias of determined.estimator._estimator_context.EstimatorTrialContext
__init__(context: determined.estimator._estimator_context.EstimatorTrialContext)

    Initializes a trial using the provided trial_context.

    Override this function to initialize any shared state between the estimator, train spec, and/or validation spec.
abstract build_estimator() → tensorflow_estimator.python.estimator.estimator.Estimator

    Specifies the tf.estimator.Estimator instance to be used during training and validation. This may be an instance of a Premade Estimator provided by the TensorFlow team, or a Custom Estimator created by the user.
abstract build_train_spec() → tensorflow_estimator.python.estimator.training.TrainSpec

    Specifies the tf.estimator.TrainSpec to be used for training steps. This training specification will contain a TensorFlow input_fn which constructs the input data for a training step. Unlike the standard TensorFlow input_fn interface, EstimatorTrial only supports an input_fn that returns a tf.data.Dataset object. A function that returns a tuple of features and labels is currently not supported by EstimatorTrial. Additionally, the max_steps attribute of the training specification will be ignored; instead, the batches_per_step option in the experiment configuration is used to determine how many batches each training step uses.
abstract build_validation_spec() → tensorflow_estimator.python.estimator.training.EvalSpec

    Specifies the tf.estimator.EvalSpec to be used for validation steps. This evaluation spec will contain a TensorFlow input_fn which constructs the input data for a validation step. The validation step will evaluate steps batches, or evaluate until the input_fn raises an end-of-input exception if steps is None.
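    Putting these pieces together, the following is a minimal sketch of a complete EstimatorTrial subclass. The toy linear-regression model, the synthetic dataset, and the learning_rate hyperparameter name are assumptions made for illustration; the wrap_optimizer() and wrap_dataset() calls are documented under determined.estimator.EstimatorContext below.

        import numpy as np
        import tensorflow as tf

        from determined.estimator import EstimatorTrial, EstimatorTrialContext

        class LinearTrial(EstimatorTrial):
            def __init__(self, context: EstimatorTrialContext) -> None:
                self.context = context

            def build_estimator(self) -> tf.estimator.Estimator:
                optimizer = tf.compat.v1.train.AdamOptimizer(
                    learning_rate=self.context.get_hparam("learning_rate")
                )
                # Optimizers must be wrapped before being passed to the
                # Estimator (see wrap_optimizer() below).
                optimizer = self.context.wrap_optimizer(optimizer)
                return tf.estimator.LinearRegressor(
                    feature_columns=[tf.feature_column.numeric_column("x")],
                    optimizer=optimizer,
                )

            def _input_fn(self) -> tf.data.Dataset:
                # Synthetic data purely for illustration.
                xs = np.random.rand(1000).astype(np.float32)
                ds = tf.data.Dataset.from_tensor_slices(({"x": xs}, 2.0 * xs))
                # Datasets must be wrapped as well (see wrap_dataset() below).
                ds = self.context.wrap_dataset(ds)
                return ds.batch(self.context.get_per_slot_batch_size())

            def build_train_spec(self) -> tf.estimator.TrainSpec:
                # max_steps is ignored; batches_per_step in the experiment
                # configuration controls how long a training step runs.
                return tf.estimator.TrainSpec(self._input_fn)

            def build_validation_spec(self) -> tf.estimator.EvalSpec:
                return tf.estimator.EvalSpec(self._input_fn, steps=None)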
build_serving_input_receiver_fns() → Dict[str, Callable[..., Union[tensorflow_estimator.python.estimator.export.export.ServingInputReceiver, tensorflow_estimator.python.estimator.export.export.TensorServingInputReceiver]]]

    Optionally returns a Python dictionary mapping string names to serving_input_receiver_fn functions. If specified, each serving input receiver function will be used to export a distinct SavedModel inference graph when a Determined checkpoint is saved, using Estimator.export_saved_model. The exported models are saved under subdirectories named by the keys of the respective serving input receiver functions. For example, returning

        {
            "raw": tf.estimator.export.build_raw_serving_input_receiver_fn(...),
            "parsing": tf.estimator.export.build_parsing_serving_input_receiver_fn(...),
        }

    from this function would configure Determined to export two SavedModel inference graphs in every checkpoint under raw and parsing subdirectories, respectively. By default, this function returns an empty dictionary and the Determined checkpoint directory only contains metadata associated with the training graph.
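    A concrete implementation might look like the following sketch, where the single float feature named "x" and its shape are assumptions for illustration:

        def build_serving_input_receiver_fns(self):
            # Placeholder describing the (assumed) raw feature that will be
            # fed to the exported SavedModel at inference time.
            input_x = tf.compat.v1.placeholder(
                dtype=tf.float32, shape=(None, 1), name="x"
            )
            return {
                "raw": tf.estimator.export.build_raw_serving_input_receiver_fn(
                    {"x": input_x}
                ),
            }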
determined.experimental.estimator.init()

determined.experimental.estimator.init(config: Optional[Dict[str, Any]] = None, mode: determined.experimental._native.Mode = <Mode.CLUSTER: 'cluster'>, context_dir: str = '', command: Optional[List[str]] = None, master_url: Optional[str] = None) → determined.estimator._estimator_context.EstimatorNativeContext

    Create a tf.estimator experiment using the Native API.
    Parameters

        config – A dictionary representing the experiment configuration to be associated with the experiment.

        mode – The determined.experimental.Mode used when creating an experiment.

            1. Mode.CLUSTER (default): Submit the experiment to a remote Determined cluster.

            2. Mode.LOCAL: Test the experiment in the calling Python process for development / debugging purposes. Run through a minimal loop of training, validation, and checkpointing steps.

        context_dir – A string filepath that defines the context directory. All model code will be executed with this as the current working directory.

            In CLUSTER mode, this argument is required. All files in this directory will be uploaded to the Determined cluster. The total size of this directory must be under 96 MB.

            In LOCAL mode, this argument is optional and defaults to the current working directory.

        command – A list of strings that is used as the entrypoint of the training script in the Determined task environment. When executing this function via a Python script, this argument is inferred to be sys.argv by default. When executing this function via IPython or a Jupyter notebook, this argument is required.

        master_url – An optional string to use as the Determined master URL in submit mode. Defaults to the value of the environment variable DET_MASTER if not provided.

    Returns

        determined.estimator.EstimatorNativeContext
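    As a sketch of typical usage from a Python script (the configuration values below are placeholders, not required settings):

        from determined import experimental
        from determined.experimental import estimator

        # Placeholder experiment configuration; a real experiment would set
        # the searcher, hyperparameters, etc. to match its model.
        config = {
            "description": "native-api-example",
            "hyperparameters": {"global_batch_size": 32},
            "searcher": {"name": "single", "metric": "loss", "max_steps": 100},
        }

        context = estimator.init(
            config=config,
            mode=experimental.Mode.CLUSTER,
            context_dir=".",
        )

        # The returned EstimatorNativeContext is then used to wrap the
        # optimizer and datasets and to hand the estimator, train spec, and
        # eval spec to Determined.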
determined.estimator.EstimatorContext
To use tf.estimator models with Determined, users need to wrap their optimizer and datasets using the following functions inherited from determined.estimator.EstimatorContext. Note that the concrete context object in which these functions are found will be either determined.estimator.EstimatorTrialContext or determined.estimator.EstimatorNativeContext, depending on whether the Trial API or the Native API is used.
class determined.estimator.EstimatorContext(env: determined._env_context.EnvContext, hvd_config: determined.horovod.HorovodContext)

    Base context class that contains runtime information for any Determined workflow that uses the tf.estimator API.

    EstimatorTrialContext always has a DistributedContext accessible via context.distributed for information related to distributed training.

    EstimatorTrialContext always has an EstimatorExperimentalContext accessible via context.experimental for information related to experimental features.
wrap_optimizer(optimizer: Any) → Any

    This should be used to wrap optimizer objects immediately after they have been created. Users should use the output of this wrapper as the new instance of their optimizer. For example, if users create their optimizer within build_estimator(), they should call optimizer = wrap_optimizer(optimizer) prior to passing the optimizer into their Estimator.
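    In code, following that pattern (the optimizer class and the learning_rate hyperparameter name are illustrative choices, not API requirements):

        def build_estimator(self) -> tf.estimator.Estimator:
            optimizer = tf.compat.v1.train.AdamOptimizer(
                learning_rate=self.context.get_hparam("learning_rate")
            )
            # Use the wrapped optimizer in place of the original from here on.
            optimizer = self.context.wrap_optimizer(optimizer)
            return tf.estimator.LinearClassifier(
                feature_columns=[tf.feature_column.numeric_column("x")],
                optimizer=optimizer,
            )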
wrap_dataset(dataset: Any) → Any

    This should be used to wrap tf.data.Dataset objects immediately after they have been created. Users should use the output of this wrapper as the new instance of their dataset. If users create multiple datasets (e.g., one for training and one for testing), they should wrap each dataset independently. For example, if users instantiate their training dataset within build_train_spec(), they should call dataset = wrap_dataset(dataset) prior to passing it into tf.estimator.TrainSpec.
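    For example (the dataset contents here are illustrative):

        def build_train_spec(self) -> tf.estimator.TrainSpec:
            def input_fn() -> tf.data.Dataset:
                ds = tf.data.Dataset.from_tensor_slices(
                    ({"x": [[0.0], [1.0]]}, [0, 1])
                )
                # Wrap the dataset immediately after creating it.
                ds = self.context.wrap_dataset(ds)
                return ds.batch(self.context.get_per_slot_batch_size())

            return tf.estimator.TrainSpec(input_fn)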
class determined.estimator.EstimatorExperimentalContext(env: determined._env_context.EnvContext, hvd_config: determined.horovod.HorovodContext)

    Context class that contains experimental runtime information and features for any Determined workflow that uses the tf.estimator API.

    EstimatorExperimentalContext extends EstimatorTrialContext under the context.experimental namespace.
cache_train_dataset(dataset_id: str, dataset_version: str, shuffle: bool = False, skip_shuffle_at_epoch_end: bool = False) → Callable

    cache_train_dataset is a decorator for creating your training dataset. It should decorate a function that outputs a tf.data.Dataset object. The dataset will be stored in a cache, keyed by dataset_id and dataset_version. The cache is re-used in subsequent calls.

    Parameters
        dataset_id – A string that will be used as part of the unique identifier for this dataset.

        dataset_version – A string that will be used as part of the unique identifier for this dataset.

        shuffle – A bool indicating if the dataset should be shuffled. Shuffling will be performed with the trial’s random seed, which can be set in the Experiment Configuration.

        skip_shuffle_at_epoch_end – A bool indicating if shuffling should be skipped at the end of epochs.
    Example usage:

        def make_train_dataset(self):
            @self.context.experimental.cache_train_dataset("range_dataset", "v1")
            def make_dataset():
                ds = tf.data.Dataset.range(10)
                return ds

            dataset = make_dataset()
            dataset = dataset.batch(self.context.get_per_slot_batch_size())
            dataset = dataset.map(...)
            return dataset
    Note

        dataset.batch() and runtime augmentation should be done after caching. Additionally, users should never need to call dataset.repeat().
cache_validation_dataset(dataset_id: str, dataset_version: str, shuffle: bool = False) → Callable

    cache_validation_dataset is a decorator for creating your validation dataset. It should decorate a function that outputs a tf.data.Dataset object. The dataset will be stored in a cache, keyed by dataset_id and dataset_version. The cache is re-used in subsequent calls.

    Parameters
        dataset_id – A string that will be used as part of the unique identifier for this dataset.

        dataset_version – A string that will be used as part of the unique identifier for this dataset.

        shuffle – A bool indicating if the dataset should be shuffled. Shuffling will be performed with the trial’s random seed, which can be set in the Experiment Configuration.
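    Mirroring the training example above, a sketch of usage:

        def make_validation_dataset(self):
            @self.context.experimental.cache_validation_dataset("range_dataset", "v1")
            def make_dataset():
                return tf.data.Dataset.range(10)

            dataset = make_dataset()
            return dataset.batch(self.context.get_per_slot_batch_size())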