Shortcuts

determined.estimator

determined.estimator.EstimatorTrial

class determined.estimator.EstimatorTrial(context: determined.estimator._estimator_context.EstimatorTrialContext)

By default, experiments run with TensorFlow 1.x. To configure your trial to use TensorFlow 2.x, set a TF 2.x image in the experiment configuration (e.g. determinedai/environments:cuda-10.1-pytorch-1.4-tf-2.1-gpu-0.3.0).

EstimatorTrial supports TF 2.x; however it uses TensorFlow V1 behavior. We have disabled TensorFlow V2 behavior for EstimatorTrial, so there is no need for you to disable it.

trial_context_class

alias of determined.estimator._estimator_context.EstimatorTrialContext

__init__(context: determined.estimator._estimator_context.EstimatorTrialContext)

Initializes a trial using the provided trial_context.

Override this function to initialize any shared state between the estimator, train spec, and/or validation spec.

abstract build_estimator() → tensorflow_estimator.python.estimator.estimator.Estimator

Specifies the tf.estimator.Estimator instance to be used during training and validation. This may be an instance of a Premade Estimator provided by the TensorFlow team, or a Custom Estimator created by the user.

abstract build_train_spec() → tensorflow_estimator.python.estimator.training.TrainSpec

Specifies the tf.estimator.TrainSpec to be used for training steps. This training specification will contain a TensorFlow input_fn which constructs the input data for a training step. Unlike the standard Tensorflow input_fn interface, EstimatorTrial only supports an input_fn that returns a tf.data.Dataset object. A function that returns a tuple of features and labels is currently not supported by EstimatorTrial. Additionally, the max_steps attribute of the training specification will be ignored; instead, the batches_per_step option in the experiment configuration is used to determine how many batches each training step uses.

abstract build_validation_spec() → tensorflow_estimator.python.estimator.training.EvalSpec

Specifies the tf.estimator.EvalSpec to be used for validation steps. This evaluation spec will contain a TensorFlow input_fn which constructs the input data for a validation step. The validation step will evaluate steps batches, or evaluate until the input_fn raises an end-of-input exception if steps is None.

build_serving_input_receiver_fns() → Dict[str, Callable[..., Union[tensorflow_estimator.python.estimator.export.export.ServingInputReceiver, tensorflow_estimator.python.estimator.export.export.TensorServingInputReceiver]]]

Optionally returns a Python dictionary mapping string names to serving_input_receiver_fn s. If specified, each serving input receiver function will be used to export a distinct SavedModel inference graph when a Determined checkpoint is saved, using Estimator.export_saved_model. The exported models are saved under subdirectories named by the keys of the respective serving input receiver functions. For example, returning

{
    "raw": tf.estimator.export.build_raw_serving_input_receiver_fn(...),
    "parsing": tf.estimator.export.build_parsing_serving_input_receiver_fn(...)
}

from this function would configure Determined to export two SavedModel inference graphs in every checkpoint under raw and parsing subdirectories, respectively. By default, this function returns an empty dictionary and the Determined checkpoint directory only contains metadata associated with the training graph.

determined.experimental.estimator.init()

determined.experimental.estimator.init(config: Optional[Dict[str, Any]] = None, mode: determined.experimental._native.Mode = <Mode.CLUSTER: 'cluster'>, context_dir: str = '', command: Optional[List[str]] = None, master_url: Optional[str] = None) → determined.estimator._estimator_context.EstimatorNativeContext

Create a tf.estimator experiment using the Native API.

Parameters
  • config – A dictionary representing the experiment configuration to be associated with the experiment.

  • mode

    The determined.experimental.Mode used when creating an experiment

    1. Mode.CLUSTER (default): Submit the experiment to a remote Determined cluster.

    2. Mode.LOCAL: Test the experiment in the calling Python process for development / debugging purposes. Run through a minimal loop of training, validation, and checkpointing steps.

  • context_dir

    A string filepath that defines the context directory. All model code will be executed with this as the current working directory.

    In CLUSTER mode, this argument is required. All files in this directory will be uploaded to the Determined cluster. The total size of this directory must be under 96 MB.

    In LOCAL mode, this argument is optional and assumed to be the current working directory by default.

  • command – A list of strings that is used as the entrypoint of the training script in the Determined task environment. When executing this function via a python script, this argument is inferred to be sys.argv by default. When executing this function via IPython or Jupyter notebook, this argument is required.

  • master_url – An optional string to use as the Determined master URL in submit mode. Will default to the value of environment variable DET_MASTER if not provided.

Returns

determined.estimator.EstimatorNativeContext

determined.estimator.EstimatorContext

To use tf.estimator models with Determined, users need to wrap their optimizer and datasets using the following functions inherited from determined.estimator.EstimatorContext. Note that the concrete context object where these functions will be found will be either determined.estimator.EstimatorTrialContext or determined.estimator.EstimatorNativeContext, depending on use of Trial API or Native API.

class determined.estimator.EstimatorContext(env: determined._env_context.EnvContext, hvd_config: determined.horovod.HorovodContext)

Base context class that contains runtime information for any Determined workflow that uses the tf.estimator API.

EstimatorTrialContext always has a DistributedContext accessible via context.distributed for information related to distributed training.

EstimatorTrialContext always has a EstimatorExperimentalContext accessible via context.experimental for information related to experimental features.

wrap_optimizer(optimizer: Any) → Any

This should be used to wrap optimizer objects immediately after they have been created. Users should use the output of this wrapper as the new instance of their optimizer. For example, if users create their optimizer within build_estimator(), they should call optimizer = wrap_optimizer(optimzer) prior to passing the optimizer into their Estimator.

wrap_dataset(dataset: Any) → Any

This should be used to wrap tf.data.Dataset objects immediately after they have been created. Users should use the output of this wrapper as the new instance of their dataset. If users create multiple datasets (e.g., one for training and one for testing), users should wrap each dataset independently. E.g., If users instantiate their training dataset within build_train_spec(), they should call dataset = wrap_dataset(dataset) prior to passing it into tf.estimator.TrainSpec.

class determined.estimator.EstimatorExperimentalContext(env: determined._env_context.EnvContext, hvd_config: determined.horovod.HorovodContext)

Context class that contains experimental runtime information and features for any Determined workflow that uses the tf.estimator API.

EstimatorExperimentalContext extends EstimatorTrialContext under the context.experimental namespace.

cache_train_dataset(dataset_id: str, dataset_version: str, shuffle: bool = False, skip_shuffle_at_epoch_end: bool = False) → Callable

cache_train_dataset is a decorator for creating your training dataset. It should decorate a function that outputs a tf.data.Dataset object. The dataset will be stored in a cache, keyed by dataset_id and dataset_version. The cache is re-used in subsequent calls.

Parameters
  • dataset_id – A string that will be used as part of the unique identifier for this dataset.

  • dataset_version – A string that will be used as part of the unique identifier for this dataset.

  • shuffle – A bool indicating if the dataset should be shuffled. Shuffling will be performed with the trial’s random seed which can be set in Experiment Configuration.

  • skip_shuffle_at_epoch_end – A bool indicating if shuffling should be skipped at the end of epochs.

Example Usage:

def make_train_dataset(self):
    @self.context.experimental.cache_train_dataset("range_dataset", "v1")
    def make_dataset():
        ds = tf.data.Dataset.range(10)
        return ds

    dataset = make_dataset()
    dataset = dataset.batch(self.context.get_per_slot_batch_size())
    dataset = dataset.map(...)
    return dataset

Note

dataset.batch() and runtime augmentation should be done after caching. Additionally, users should never need to call dataset.repeat().

cache_validation_dataset(dataset_id: str, dataset_version: str, shuffle: bool = False) → Callable

cache_validation_dataset is a decorator for creating your validation dataset. It should decorate a function that outputs a tf.data.Dataset object. The dataset will be stored in a cache, keyed by dataset_id and dataset_version. The cache is re-used in subsequent calls.

Parameters
  • dataset_id – A string that will be used as part of the unique identifier for this dataset.

  • dataset_version – A string that will be used as part of the unique identifier for this dataset.

  • shuffle – A bool indicating if the dataset should be shuffled. Shuffling will be performed with the trial’s random seed which can be set in Experiment Configuration.

Examples