determined.experimental¶

determined.experimental.create(trial_def: Type[determined._trial.Trial], config: Optional[Dict[str, Any]] = None, local: bool = False, test: bool = False, context_dir: str = '', command: Optional[List[str]] = None, master_url: Optional[str] = None) → Any¶

Create an experiment.

Parameters

trial_def – A class definition implementing the determined.Trial interface.
config – A dictionary representing the experiment configuration to be associated with the experiment.
local – A boolean indicating if training should be done locally. When False, the experiment will be submitted to the Determined cluster. Defaults to False.
test – A boolean indicating if the experiment should be shortened to a minimal loop of training on a small amount of data, performing validation, and checkpointing. test=True is useful for quick iteration during model porting or debugging because common errors will surface more quickly. Defaults to False.
context_dir –
A string filepath that defines the context directory. All model code will be executed with this as the current working directory.

When local=False, this argument is required. All files in this directory will be uploaded to the Determined cluster. The total size of this directory must be under 96 MB.

When local=True, this argument is optional and defaults to the current working directory.
command –
A list of strings that is used as the entrypoint of the training script in the Determined task environment. When executing this function via a Python script, this argument is inferred to be sys.argv by default. When executing this function via IPython or a Jupyter notebook, this argument is required.

Example: When creating an experiment by running python train.py --flag value, the default command is inferred as ["train.py", "--flag", "value"].
master_url – An optional string to use as the Determined master URL when local=False. If not specified, it is inferred from the environment variable DET_MASTER.
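The context_dir rules above can be checked before an experiment is submitted. The following is a minimal sketch, not part of the Determined API: directory_size, resolve_context_dir, and MAX_CONTEXT_BYTES are hypothetical names, and reading "96 MB" as 96 * 1024 * 1024 bytes is an assumption.

```python
import os

# Assumption: "96 MB" is interpreted as 96 MiB; the text above does not define the unit.
MAX_CONTEXT_BYTES = 96 * 1024 * 1024


def directory_size(path: str) -> int:
    """Sum the sizes, in bytes, of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def resolve_context_dir(context_dir: str = "", local: bool = False) -> str:
    """Hypothetical pre-flight check mirroring the documented context_dir rules."""
    if not context_dir:
        if not local:
            raise ValueError("context_dir is required when local=False")
        # Documented default for local=True: the current working directory.
        return os.getcwd()
    if not local and directory_size(context_dir) >= MAX_CONTEXT_BYTES:
        raise ValueError("context directory must be under 96 MB")
    return context_dir
```

The returned path could then be passed as the context_dir argument of determined.experimental.create.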

determined.experimental.create_trial_instance(trial_def: Type[determined._trial.Trial], checkpoint_dir: str, config: Optional[Dict[str, Any]] = None, hparams: Optional[Dict[str, Any]] = None) → determined._trial.Trial¶

Create a trial instance from a Trial class definition. This can be a useful utility for debugging your trial logic in any development environment.

Parameters

trial_def – A class definition that inherits from the determined.Trial interface.
checkpoint_dir – The checkpoint directory that the trial will use for loading and saving checkpoints.
config – An optional experiment configuration that is used to initialize the determined.TrialContext. If not specified, a minimal default is used.

`Determined`¶

class determined.experimental.Determined(master: Optional[str] = None, user: Optional[str] = None)¶

Determined gives access to Determined API objects.

Parameters

master (string, optional) – The URL of the Determined master. If this argument is not specified, the environment variables DET_MASTER and DET_MASTER_ADDR will be checked for the master URL in that order.
user (string, optional) – The Determined username used for authentication. (default: determined)
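The lookup order for the master URL can be illustrated with a short sketch. This is not the library's code: resolve_master is a hypothetical helper, and returning None when nothing is set is an assumption, since the text above does not say what happens in that case.

```python
import os
from typing import Mapping, Optional


def resolve_master(
    master: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the explicit master URL, else DET_MASTER, else DET_MASTER_ADDR."""
    if master:
        return master
    # Environment variables are checked in the documented order.
    for name in ("DET_MASTER", "DET_MASTER_ADDR"):
        value = env.get(name)
        if value:
            return value
    # Assumption: no fallback is applied when neither variable is set.
    return None
```

Passing env explicitly keeps the helper easy to exercise without modifying the real process environment.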

get_experiment(experiment_id: int) → determined_common.experimental.experiment.ExperimentReference¶: Get the ExperimentReference representing the experiment with the provided experiment ID.

get_trial(trial_id: int) → determined_common.experimental.trial.TrialReference¶: Get the TrialReference representing the trial with the provided trial ID.

get_checkpoint(uuid: str) → determined_common.experimental.checkpoint._checkpoint.Checkpoint¶: Get the Checkpoint representing the checkpoint with the provided UUID.

create_model(name: str, description: Optional[str] = '', metadata: Optional[Dict[str, Any]] = None) → determined_common.experimental.model.Model¶

Add a model to the registry.

Parameters

name (string) – The name of the model. This name must be unique.
description (string) – A description of the model.
metadata (dict) – Dictionary of metadata to add to the model.

get_model(name: str) → determined_common.experimental.model.Model¶: Get the Model representing the model with the provided name.

get_models(sort_by: determined_common.experimental.model.ModelSortBy = <ModelSortBy.NAME: 1>, order_by: determined_common.experimental.model.ModelOrderBy = <ModelOrderBy.ASCENDING: 1>, name: str = '', description: str = '') → List[determined_common.experimental.model.Model]¶

Get a list of all models in the model registry.

Parameters

sort_by – Which field to sort by. See ModelSortBy.
order_by – Whether to sort in ascending or descending order. See ModelOrderBy.
name – If this parameter is set, models will be filtered to only include models with names matching this parameter.
description – If this parameter is set, models will be filtered to only include models with descriptions matching this parameter.

`Model`¶

class determined.experimental.Model(name: str, description: str = '', creation_time: Optional[datetime.datetime] = None, last_updated_time: Optional[datetime.datetime] = None, metadata: Optional[Dict[str, Any]] = None, master: str = '')¶

Class representing a model. Contains methods for managing metadata and model versions.

Parameters

name (string) – The name of the model.
description (string, optional) – The description of the model.
creation_time (datetime) – The time the model was created.
last_updated_time (datetime) – The time the model was most recently updated.
metadata (dict, optional) – User-defined metadata associated with the model.
master (string, optional) – The address of the Determined master instance.

get_version(version: int = 0) → determined_common.experimental.checkpoint._checkpoint.Checkpoint¶

Retrieve the checkpoint corresponding to the specified version of the model. If no version is specified, the latest model version is returned.

Parameters: version (int, optional) – The model version number requested.

get_versions(order_by: determined_common.experimental.model.ModelOrderBy = <ModelOrderBy.DESCENDING: 2>) → List[determined_common.experimental.checkpoint._checkpoint.Checkpoint]¶

Get a list of checkpoints corresponding to versions of this model. The versions are sorted by version number and are returned in descending order by default.

Parameters: order_by (enum) – A member of the ModelOrderBy enum.

register_version(checkpoint_uuid: str) → determined_common.experimental.checkpoint._checkpoint.Checkpoint¶

Creates a new model version and returns the Checkpoint corresponding to the version.

Parameters: checkpoint_uuid – The UUID of the checkpoint to associate with the new model version.
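A typical registry workflow chains several of the methods documented on this page. The sketch below is written against a duck-typed client so that it does not require a running cluster. register_best_checkpoint is a hypothetical helper, and the uuid attribute on the returned checkpoint is an assumption based on the Checkpoint constructor argument of the same name.

```python
from typing import Any


def register_best_checkpoint(client: Any, experiment_id: int, model_name: str) -> Any:
    """Register an experiment's best checkpoint as a version of a new model.

    client is expected to behave like determined.experimental.Determined.
    """
    checkpoint = client.get_experiment(experiment_id).top_checkpoint()
    model = client.create_model(model_name)
    # Assumption: the Checkpoint object exposes its UUID as a `uuid` attribute.
    return model.register_version(checkpoint.uuid)
```

Against a real cluster, client would be an instance of determined.experimental.Determined. Because model names must be unique, create_model would need to be swapped for get_model when the model already exists.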

add_metadata(metadata: Dict[str, Any]) → None¶

Adds user-defined metadata to the model. The metadata argument must be a JSON-serializable dictionary. If any keys from this dictionary already appear in the model metadata, the corresponding dictionary entries in the model are replaced by the passed-in dictionary values.

Parameters: metadata (dict) – Dictionary of metadata to add to the model.

remove_metadata(keys: List[str]) → None¶

Removes user-defined metadata from the model. Any top-level keys that appear in the keys list are removed from the model.

Parameters: keys (List[string]) – Top-level keys to remove from the model metadata.
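The replace-on-conflict and top-level removal semantics described for add_metadata and remove_metadata can be modeled with plain dictionaries. This is an illustration of the documented behavior only, not the library's implementation. The two functions below are standalone stand-ins, and the sample metadata values are made up.

```python
from typing import Any, Dict, List


def add_metadata(existing: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of existing where entries from new replace matching top-level keys."""
    merged = dict(existing)
    merged.update(new)
    return merged


def remove_metadata(existing: Dict[str, Any], keys: List[str]) -> Dict[str, Any]:
    """Return a copy of existing without the listed top-level keys."""
    return {k: v for k, v in existing.items() if k not in keys}


metadata = {"team": "vision", "eval": {"accuracy": 0.91, "dataset": "val"}}
# The whole "eval" entry is replaced, so its nested "dataset" key is dropped.
metadata = add_metadata(metadata, {"eval": {"accuracy": 0.93}, "owner": "sam"})
metadata = remove_metadata(metadata, ["team"])
```

Replacement happens at the top level only: nested dictionaries are not merged key by key.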

`ExperimentReference`¶

class determined.experimental.ExperimentReference(experiment_id: int, master: str)¶

Experiment reference class used for querying relevant Checkpoint instances.

Parameters

experiment_id (int) – The experiment ID.
master (string, optional) – The URL of the Determined master. If this class is obtained via determined.experimental.Determined, the master URL is automatically passed into this constructor.

top_checkpoint(sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined_common.experimental.checkpoint._checkpoint.Checkpoint¶

Return the Checkpoint instance with the best validation metric as defined by the sort_by and smaller_is_better arguments.

Parameters

sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the experiment configuration searcher field will be used.
smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default, the smaller_is_better value in the experiment configuration is used.

top_n_checkpoints(limit: int, sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → List[determined_common.experimental.checkpoint._checkpoint.Checkpoint]¶

Return the N Checkpoint instances with the best validation metric values as defined by the sort_by and smaller_is_better arguments. This method will return the best checkpoint from the top N performing distinct trials of the experiment.

Parameters

limit (int) – The maximum number of checkpoints to return.
sort_by (string, optional) – The name of the validation metric to use for sorting checkpoints. If this parameter is unset, the metric defined in the experiment configuration searcher field will be used.
smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default, the value of smaller_is_better from the experiment’s configuration is used.
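The "best checkpoint from the top N performing distinct trials" rule can be sketched in plain Python. CheckpointRecord and top_n_by_trial are hypothetical names used only to illustrate the selection logic described above. The real method queries the Determined master, and its implementation may differ.

```python
from typing import Dict, Iterable, List, NamedTuple


class CheckpointRecord(NamedTuple):
    trial_id: int
    uuid: str
    metric: float


def top_n_by_trial(
    records: Iterable[CheckpointRecord],
    limit: int,
    smaller_is_better: bool = True,
) -> List[CheckpointRecord]:
    """Keep the best checkpoint of each trial, then return the best `limit` of those."""
    best_per_trial: Dict[int, CheckpointRecord] = {}
    for rec in records:
        current = best_per_trial.get(rec.trial_id)
        if current is None:
            best_per_trial[rec.trial_id] = rec
        elif smaller_is_better and rec.metric < current.metric:
            best_per_trial[rec.trial_id] = rec
        elif not smaller_is_better and rec.metric > current.metric:
            best_per_trial[rec.trial_id] = rec
    ranked = sorted(
        best_per_trial.values(),
        key=lambda r: r.metric,
        reverse=not smaller_is_better,
    )
    return ranked[:limit]
```

Each trial contributes at most one checkpoint, so two strong checkpoints from the same trial never both appear in the result.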

`TrialReference`¶

class determined.experimental.TrialReference(trial_id: int, master: str)¶

Trial reference class used for querying relevant Checkpoint instances.

Parameters

trial_id (int) – The trial ID.
master (string, optional) – The URL of the Determined master. If this class is obtained via determined.experimental.Determined, the master URL is automatically passed into this constructor.

top_checkpoint(sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined_common.experimental.checkpoint._checkpoint.Checkpoint¶

Return the Checkpoint instance with the best validation metric as defined by the sort_by and smaller_is_better arguments.

Parameters

sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the related experiment configuration searcher field will be used.
smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default, the value of smaller_is_better from the experiment’s configuration is used.

select_checkpoint(latest: bool = False, best: bool = False, uuid: Optional[str] = None, sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined_common.experimental.checkpoint._checkpoint.Checkpoint¶

Return a Checkpoint instance for this trial: the most recent checkpoint, the checkpoint with the best validation metric as defined by the sort_by and smaller_is_better arguments, or the checkpoint with a given UUID.

Exactly one of the best, latest, or uuid parameters must be set.

Parameters

latest (bool, optional) – Return the most recent checkpoint.
best (bool, optional) – Return the checkpoint with the best validation metric as defined by the sort_by and smaller_is_better arguments. If sort_by and smaller_is_better are not specified, the values from the associated experiment configuration will be used.
uuid (string, optional) – Return the checkpoint for the specified UUID.
sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the related experiment configuration searcher field will be used.
smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default, the value of smaller_is_better from the experiment’s configuration is used.
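The "exactly one of best, latest, or uuid" requirement can be expressed as a small validation step. check_selector is a hypothetical helper written for illustration. The source does not say how the library itself reports a violation, so raising ValueError is an assumption.

```python
from typing import Optional


def check_selector(
    latest: bool = False,
    best: bool = False,
    uuid: Optional[str] = None,
) -> str:
    """Return the name of the single selector that is set, or raise ValueError."""
    candidates = (("latest", latest), ("best", best), ("uuid", uuid is not None))
    chosen = [name for name, is_set in candidates if is_set]
    if len(chosen) != 1:
        raise ValueError(
            "exactly one of latest, best, or uuid must be set; got: "
            + (", ".join(chosen) or "none")
        )
    return chosen[0]
```

For example, check_selector(best=True) is accepted, while check_selector(latest=True, best=True) and check_selector() are both rejected.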

`Checkpoint`¶

class determined.experimental.Checkpoint(uuid: str, experiment_config: Dict[str, Any], experiment_id: int, trial_id: int, hparams: Dict[str, Any], batch_number: int, start_time: str, end_time: str, resources: Dict[str, Any], validation: Dict[str, Any], determined_version: Optional[str] = None, framework: Optional[str] = None, format: Optional[str] = None, version: Optional[int] = None, model_name: Optional[str] = None, metadata: Optional[Dict[str, Any]] = None, master: Optional[str] = None)¶

Class representing a checkpoint. Contains methods for downloading checkpoints to local storage and loading checkpoints into memory.

The TrialReference class contains methods that return instances of this class.

download(path: Optional[str] = None) → str¶

Download checkpoint to local storage.

Parameters: path (string, optional) – Top level directory to place the checkpoint under. If this parameter is not set, the checkpoint will be downloaded to checkpoints/<checkpoint_uuid> relative to the current working directory.

load(path: Optional[str] = None, tags: Optional[List[str]] = None, **kwargs: Any) → Any¶

Loads a Determined checkpoint into memory. If the checkpoint is not present on disk, it will be downloaded from persistent storage.

Parameters

path (string, optional) – Top level directory to load the checkpoint from. (default: checkpoints/<UUID>)
tags (list of strings, optional) – Only relevant for TensorFlow SavedModel checkpoints. Specifies which tags are loaded from the TensorFlow SavedModel. See documentation for tf.compat.v1.saved_model.load_v2.
kwargs – Only relevant for PyTorch checkpoints. The keyword arguments will be applied to torch.load. See documentation for torch.load.
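Because extra keyword arguments are applied to torch.load, a PyTorch checkpoint trained on GPUs can be loaded on a CPU-only machine by passing torch.load's map_location argument. The sketch below uses a duck-typed checkpoint object so that it does not depend on a cluster. load_on_cpu is a hypothetical helper name.

```python
from typing import Any


def load_on_cpu(checkpoint: Any, path: str = "") -> Any:
    """Load a PyTorch checkpoint with its tensors mapped to the CPU.

    checkpoint is expected to behave like determined.experimental.Checkpoint.
    """
    # map_location is a torch.load argument; per the parameter description
    # above, extra keyword arguments are applied to torch.load.
    if path:
        return checkpoint.load(path=path, map_location="cpu")
    # Without a path, the documented default location checkpoints/<UUID> applies.
    return checkpoint.load(map_location="cpu")
```

For TensorFlow SavedModel checkpoints the kwargs are documented as not relevant, so this helper only makes sense for PyTorch checkpoints.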

add_metadata(metadata: Dict[str, Any]) → None¶

Adds user-defined metadata to the checkpoint. The metadata argument must be a JSON-serializable dictionary. If any keys from this dictionary already appear in the checkpoint metadata, the corresponding dictionary entries in the checkpoint are replaced by the passed-in dictionary values.

Parameters: metadata (dict) – Dictionary of metadata to add to the checkpoint.

remove_metadata(keys: List[str]) → None¶

Removes user-defined metadata from the checkpoint. Any top-level keys that appear in the keys list are removed from the checkpoint.

Parameters: keys (List[string]) – Top-level keys to remove from the checkpoint metadata.

static load_from_path(path: str, tags: Optional[List[str]] = None, **kwargs: Any) → Any¶

Loads a Determined checkpoint from a local file system path into memory. If the checkpoint is a PyTorch model, a torch.nn.Module is returned. If the checkpoint contains a TensorFlow SavedModel, a TensorFlow autotrackable object is returned.

Parameters

path (string) – Local path to the checkpoint directory.
tags (list of strings, optional) –
Only relevant for TensorFlow SavedModel checkpoints. Specifies which tags are loaded from the TensorFlow SavedModel. See documentation for tf.compat.v1.saved_model.load_v2.
