determined.experimental

determined.experimental.create(trial_def: Type[determined._trial.Trial], config: Optional[Dict[str, Any]] = None, local: bool = False, test: bool = False, context_dir: str = '', command: Optional[List[str]] = None, master_url: Optional[str] = None) → Any

Create an experiment.

Parameters
  • trial_def – A class definition implementing the determined.Trial interface.

  • config – A dictionary representing the experiment configuration to be associated with the experiment.

  • local – A boolean indicating if training should be done locally. When False, the experiment will be submitted to the Determined cluster. Defaults to False.

  • test – A boolean indicating if the experiment should be shortened to a minimal loop of training on a small amount of data, performing validation, and checkpointing. test=True is useful for quick iteration during model porting or debugging because common errors will surface more quickly. Defaults to False.

  • context_dir

    A string filepath that defines the context directory. All model code will be executed with this as the current working directory.

    When local=False, this argument is required. All files in this directory will be uploaded to the Determined cluster. The total size of this directory must be under 96 MB.

    When local=True, this argument is optional and defaults to the current working directory.

  • command

    A list of strings that is used as the entrypoint of the training script in the Determined task environment. When executing this function via a Python script, this argument is inferred to be sys.argv by default. When executing this function via IPython or Jupyter notebook, this argument is required.

    Example: When creating an experiment by running python train.py --flag value, the default command is inferred as ["train.py", "--flag", "value"].

  • master_url – An optional string to use as the Determined master URL when local=False. If not specified, it is inferred from the environment variable DET_MASTER.
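
The context_dir size limit and the command default can be checked locally before an experiment is submitted. The following is a minimal sketch, not part of the Determined API: context_dir_size, check_context_dir, and default_command are hypothetical helpers, and the sketch assumes that "96 MB" means 96 * 1024 * 1024 bytes, which may differ from how the master computes the limit.

```python
import os
import sys
from typing import List, Optional

# Assumption: "96 MB" is interpreted as 96 MiB. The limit actually enforced
# by the Determined master may be computed differently.
MAX_CONTEXT_BYTES = 96 * 1024 * 1024


def context_dir_size(context_dir: str) -> int:
    """Sum the sizes of all regular files under context_dir."""
    total = 0
    for root, _dirs, files in os.walk(context_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path):
                total += os.path.getsize(path)
    return total


def check_context_dir(context_dir: str, limit: int = MAX_CONTEXT_BYTES) -> bool:
    """Return True if the directory is strictly under the size limit."""
    return context_dir_size(context_dir) < limit


def default_command(command: Optional[List[str]] = None) -> List[str]:
    """Mirror the documented default: fall back to sys.argv when unset."""
    return list(sys.argv) if command is None else command
```

Running check_context_dir(".") before calling create with local=False gives an early warning when large data files have ended up in the context directory.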

determined.experimental.create_trial_instance(trial_def: Type[determined._trial.Trial], checkpoint_dir: str, config: Optional[Dict[str, Any]] = None, hparams: Optional[Dict[str, Any]] = None) → determined._trial.Trial

Create a trial instance from a Trial class definition. This can be a useful utility for debugging your trial logic in any development environment.

Parameters
  • trial_def – A class definition that inherits from the det.Trial interface.

  • checkpoint_dir – The checkpoint directory that the trial will use for loading and saving checkpoints.

  • config – An optional experiment configuration that is used to initialize the determined.TrialContext. If not specified, a minimal default is used.

Determined

class determined.experimental.Determined(master: Optional[str] = None, user: Optional[str] = None)

Determined gives access to Determined API objects.

Parameters
  • master (string, optional) – The URL of the Determined master. If this argument is not specified, the environment variables DET_MASTER and DET_MASTER_ADDR will be checked for the master URL in that order.

  • user (string, optional) – The Determined username used for authentication. (default: determined)
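
The lookup order for the master URL can be sketched as follows. resolve_master is a hypothetical helper written for illustration only; it is not the client's actual implementation, and what the real client does when neither variable is set is not covered here.

```python
import os
from typing import Mapping, Optional


def resolve_master(
    master: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the explicit master URL, else DET_MASTER, else DET_MASTER_ADDR."""
    if env is None:
        env = os.environ
    if master:
        return master
    # The environment variables are checked in the documented order.
    for name in ("DET_MASTER", "DET_MASTER_ADDR"):
        value = env.get(name)
        if value:
            return value
    # Assumption: this sketch returns None when nothing is configured.
    return None
```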

get_experiment(experiment_id: int) → determined_common.experimental.experiment.ExperimentReference

Get the ExperimentReference representing the experiment with the provided experiment ID.

get_trial(trial_id: int) → determined_common.experimental.trial.TrialReference

Get the TrialReference representing the trial with the provided trial ID.

get_checkpoint(uuid: str) → determined_common.experimental.checkpoint._checkpoint.Checkpoint

Get the Checkpoint representing the checkpoint with the provided UUID.

create_model(name: str, description: Optional[str] = '', metadata: Optional[Dict[str, Any]] = None) → determined_common.experimental.model.Model

Add a model to the registry.

Parameters
  • name (string) – The name of the model. This name must be unique.

  • description (string) – A description of the model.

  • metadata (dict) – Dictionary of metadata to add to the model.

get_model(name: str) → determined_common.experimental.model.Model

Get the Model representing the model with the provided name.

get_models(sort_by: determined_common.experimental.model.ModelSortBy = <ModelSortBy.NAME: 1>, order_by: determined_common.experimental.model.ModelOrderBy = <ModelOrderBy.ASCENDING: 1>, name: str = '', description: str = '') → List[determined_common.experimental.model.Model]

Get a list of all models in the model registry.

Parameters
  • sort_by – Which field to sort by. See ModelSortBy.

  • order_by – Whether to sort in ascending or descending order. See ModelOrderBy.

  • name – If this parameter is set, models will be filtered to only include models with names matching this parameter.

  • description – If this parameter is set, models will be filtered to only include models with descriptions matching this parameter.

Model

class determined.experimental.Model(name: str, description: str = '', creation_time: Optional[datetime.datetime] = None, last_updated_time: Optional[datetime.datetime] = None, metadata: Optional[Dict[str, Any]] = None, master: str = '')

Class representing a model. Contains methods for managing metadata and model versions.

Parameters
  • name (string) – The name of the model.

  • description (string, optional) – The description of the model.

  • creation_time (datetime) – The time the model was created.

  • last_updated_time (datetime) – The time the model was most recently updated.

  • metadata (dict, optional) – User-defined metadata associated with the model.

  • master (string, optional) – The address of the Determined master instance.

get_version(version: int = 0) → determined_common.experimental.checkpoint._checkpoint.Checkpoint

Retrieve the checkpoint corresponding to the specified version of the model. If no version is specified, the latest model version is returned.

Parameters

version (int, optional) – The model version number requested.
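
The meaning of the default version=0 can be illustrated with a small stand-in. pick_version is a hypothetical function operating on a plain dictionary of version numbers to checkpoint UUIDs; it assumes that "latest" means the highest version number, which is not stated explicitly above.

```python
from typing import Dict


def pick_version(versions: Dict[int, str], version: int = 0) -> str:
    """Return the checkpoint UUID for a version; 0 selects the latest."""
    if not versions:
        raise ValueError("model has no registered versions")
    if version == 0:
        # Assumption: the latest version is the one with the highest number.
        return versions[max(versions)]
    return versions[version]
```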

get_versions(order_by: determined_common.experimental.model.ModelOrderBy = <ModelOrderBy.DESCENDING: 2>) → List[determined_common.experimental.checkpoint._checkpoint.Checkpoint]

Get a list of checkpoints corresponding to versions of this model. The versions are sorted by version number and are returned in descending order by default.

Parameters

order_by (enum) – A member of the ModelOrderBy enum.

register_version(checkpoint_uuid: str) → determined_common.experimental.checkpoint._checkpoint.Checkpoint

Creates a new model version and returns the Checkpoint corresponding to the version.

Parameters

checkpoint_uuid – The UUID of the checkpoint to associate with the new model version.
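
A typical use of register_version is to promote the best checkpoint of an experiment into the registry. The sketch below combines methods documented on this page; register_best_checkpoint is a hypothetical wrapper, client is expected to be a determined.experimental.Determined instance, and the code assumes that Checkpoint exposes a uuid attribute matching its constructor argument.

```python
from typing import Any


def register_best_checkpoint(client: Any, model_name: str, experiment_id: int) -> Any:
    """Register an experiment's top checkpoint as a new version of a model."""
    # Find the checkpoint with the best validation metric for the experiment.
    checkpoint = client.get_experiment(experiment_id).top_checkpoint()
    # Look up an existing model in the registry by its unique name.
    model = client.get_model(model_name)
    # Assumption: the Checkpoint object has a `uuid` attribute.
    return model.register_version(checkpoint.uuid)
```

Taking the client as an argument keeps the function independent of how the master URL and user are configured.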

add_metadata(metadata: Dict[str, Any]) → None

Adds user-defined metadata to the model. The metadata argument must be a JSON-serializable dictionary. If any keys from this dictionary already appear in the model metadata, the corresponding dictionary entries in the model are replaced by the passed-in dictionary values.

Parameters

metadata (dict) – Dictionary of metadata to add to the model.

remove_metadata(keys: List[str]) → None

Removes user-defined metadata from the model. Any top-level keys that appear in the keys list are removed from the model.

Parameters

keys (List[string]) – Top-level keys to remove from the model metadata.
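
The add and remove semantics described above operate on top-level keys only. The following stand-in uses plain dictionaries to show the effect; these two functions are illustrative and are not the library's implementation, which also persists the change on the master.

```python
import json
from typing import Any, Dict, List


def add_metadata(current: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Return current with the top-level keys of new added or replaced."""
    json.dumps(new)  # raises TypeError if new is not JSON-serializable
    merged = dict(current)
    merged.update(new)  # existing top-level entries are replaced, not deep-merged
    return merged


def remove_metadata(current: Dict[str, Any], keys: List[str]) -> Dict[str, Any]:
    """Return current without the listed top-level keys."""
    return {k: v for k, v in current.items() if k not in keys}
```

Because replacement happens at the top level, adding {"eval": {"f1": 0.9}} to metadata that already holds {"eval": {"acc": 0.8}} leaves only the f1 entry under eval.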

ExperimentReference

class determined.experimental.ExperimentReference(experiment_id: int, master: str)

Experiment reference class used for querying relevant Checkpoint instances.

Parameters
  • experiment_id (int) – The experiment ID.

  • master (string, optional) – The URL of the Determined master. If this class is obtained via determined.experimental.Determined, the master URL is automatically passed into this constructor.

top_checkpoint(sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined_common.experimental.checkpoint._checkpoint.Checkpoint

Return the Checkpoint instance with the best validation metric as defined by the sort_by and smaller_is_better arguments.

Parameters
  • sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the experiment configuration searcher field will be used.

  • smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default the smaller_is_better value in the experiment configuration is used.

top_n_checkpoints(limit: int, sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → List[determined_common.experimental.checkpoint._checkpoint.Checkpoint]

Return the N Checkpoint instances with the best validation metric values as defined by the sort_by and smaller_is_better arguments. This method will return the best checkpoint from the top N performing distinct trials of the experiment.

Parameters
  • sort_by (string, optional) – The name of the validation metric to use for sorting checkpoints. If this parameter is unset, the metric defined in the experiment configuration searcher field will be used.

  • smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default, the value of smaller_is_better from the experiment’s configuration is used.
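
The "best checkpoint from the top N distinct trials" rule can be sketched with plain dictionaries. The function below is a hypothetical stand-in, not the server-side query; the record layout (trial_id, uuid, metrics) is an assumption made for the example.

```python
from typing import Any, Dict, List


def top_n_checkpoints(
    checkpoints: List[Dict[str, Any]],
    limit: int,
    sort_by: str,
    smaller_is_better: bool = True,
) -> List[Dict[str, Any]]:
    """Keep the best checkpoint per trial, then return the best `limit` of those."""
    best_per_trial: Dict[int, Dict[str, Any]] = {}
    for ckpt in checkpoints:
        metric = ckpt["metrics"][sort_by]
        current = best_per_trial.get(ckpt["trial_id"])
        if current is None:
            best_per_trial[ckpt["trial_id"]] = ckpt
            continue
        current_metric = current["metrics"][sort_by]
        better = metric < current_metric if smaller_is_better else metric > current_metric
        if better:
            best_per_trial[ckpt["trial_id"]] = ckpt
    # Rank the per-trial winners; reverse the order when larger is better.
    ranked = sorted(
        best_per_trial.values(),
        key=lambda c: c["metrics"][sort_by],
        reverse=not smaller_is_better,
    )
    return ranked[:limit]
```

Note that two checkpoints from the same trial never both appear in the result, even if both outperform every checkpoint from other trials.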

TrialReference

class determined.experimental.TrialReference(trial_id: int, master: str)

Trial reference class used for querying relevant Checkpoint instances.

Parameters
  • trial_id (int) – The trial ID.

  • master (string, optional) – The URL of the Determined master. If this class is obtained via determined.experimental.Determined, the master URL is automatically passed into this constructor.

top_checkpoint(sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined_common.experimental.checkpoint._checkpoint.Checkpoint

Return the Checkpoint instance with the best validation metric as defined by the sort_by and smaller_is_better arguments.

Parameters
  • sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the related experiment configuration searcher field will be used.

  • smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default, the value of smaller_is_better from the experiment’s configuration is used.

select_checkpoint(latest: bool = False, best: bool = False, uuid: Optional[str] = None, sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined_common.experimental.checkpoint._checkpoint.Checkpoint

Return the Checkpoint instance selected by the latest, best, or uuid argument. When best is set, the checkpoint with the best validation metric, as defined by the sort_by and smaller_is_better arguments, is returned.

Exactly one of the best, latest, or uuid parameters must be set.

Parameters
  • latest (bool, optional) – Return the most recent checkpoint.

  • best (bool, optional) – Return the checkpoint with the best validation metric as defined by the sort_by and smaller_is_better arguments. If sort_by and smaller_is_better are not specified, the values from the associated experiment configuration will be used.

  • uuid (string, optional) – Return the checkpoint for the specified UUID.

  • sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the related experiment configuration searcher field will be used.

  • smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default, the value of smaller_is_better from the experiment’s configuration is used.
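
The requirement that exactly one of best, latest, or uuid be set can be expressed as a small validation step. validate_selection is a hypothetical helper for illustration; the exception type raised by the real method is not specified here, so ValueError is an assumption.

```python
from typing import Optional


def validate_selection(
    latest: bool = False,
    best: bool = False,
    uuid: Optional[str] = None,
) -> str:
    """Return the name of the single selector that is set, or raise."""
    chosen = [
        name
        for name, is_set in (
            ("latest", latest),
            ("best", best),
            ("uuid", uuid is not None),
        )
        if is_set
    ]
    if len(chosen) != 1:
        # Assumption: ValueError stands in for whatever the library raises.
        raise ValueError("exactly one of best, latest, or uuid must be set")
    return chosen[0]
```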

Checkpoint

class determined.experimental.Checkpoint(uuid: str, experiment_config: Dict[str, Any], experiment_id: int, trial_id: int, hparams: Dict[str, Any], batch_number: int, start_time: str, end_time: str, resources: Dict[str, Any], validation: Dict[str, Any], determined_version: Optional[str] = None, framework: Optional[str] = None, format: Optional[str] = None, version: Optional[int] = None, model_name: Optional[str] = None, metadata: Optional[Dict[str, Any]] = None, master: Optional[str] = None)

Class representing a checkpoint. Contains methods for downloading checkpoints to local storage and loading checkpoints into memory.

The TrialReference class contains methods that return instances of this class.

download(path: Optional[str] = None) → str

Download checkpoint to local storage.

Parameters

path (string, optional) – Top level directory to place the checkpoint under. If this parameter is not set, the checkpoint will be downloaded to checkpoints/<checkpoint_uuid> relative to the current working directory.

load(path: Optional[str] = None, tags: Optional[List[str]] = None, **kwargs: Any) → Any

Loads a Determined checkpoint into memory. If the checkpoint is not present on disk it will be downloaded from persistent storage.

Parameters
  • path (string, optional) – Top level directory to load the checkpoint from. (default: checkpoints/<UUID>)

  • tags (list of strings, optional) – Only relevant for TensorFlow SavedModel checkpoints. Specifies which tags are loaded from the TensorFlow SavedModel. See documentation for tf.compat.v1.saved_model.load_v2.

  • kwargs – Only relevant for PyTorch checkpoints. The keyword arguments will be applied to torch.load. See documentation for torch.load.
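
Putting the pieces together, loading the best model of an experiment might look like the sketch below. load_best_model is a hypothetical wrapper that only chains methods documented on this page; client is expected to be a determined.experimental.Determined instance, and extra keyword arguments are forwarded to load unchanged.

```python
from typing import Any


def load_best_model(client: Any, experiment_id: int, **load_kwargs: Any) -> Any:
    """Load the top checkpoint of an experiment into memory."""
    experiment = client.get_experiment(experiment_id)
    # With no arguments, the metric from the experiment configuration is used.
    checkpoint = experiment.top_checkpoint()
    # The checkpoint is downloaded first if it is not already on disk.
    return checkpoint.load(**load_kwargs)
```

For a PyTorch checkpoint, a call such as load_best_model(client, 7, map_location="cpu") would pass map_location through to torch.load, per the kwargs description above.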

add_metadata(metadata: Dict[str, Any]) → None

Adds user-defined metadata to the checkpoint. The metadata argument must be a JSON-serializable dictionary. If any keys from this dictionary already appear in the checkpoint metadata, the corresponding dictionary entries in the checkpoint are replaced by the passed-in dictionary values.

Parameters

metadata (dict) – Dictionary of metadata to add to the checkpoint.

remove_metadata(keys: List[str]) → None

Removes user-defined metadata from the checkpoint. Any top-level keys that appear in the keys list are removed from the checkpoint.

Parameters

keys (List[string]) – Top-level keys to remove from the checkpoint metadata.

static load_from_path(path: str, tags: Optional[List[str]] = None, **kwargs: Any) → Any

Loads a Determined checkpoint from a local file system path into memory. If the checkpoint is a PyTorch model, a torch.nn.Module is returned. If the checkpoint contains a TensorFlow SavedModel, a TensorFlow autotrackable object is returned.

Parameters
  • path (string) – Local path to the checkpoint directory.

  • tags (list of strings, optional) – Only relevant for TensorFlow SavedModel checkpoints. Specifies which tags are loaded from the TensorFlow SavedModel. See documentation for tf.compat.v1.saved_model.load_v2.