Python API determined.experimental.client

Client

The client module exposes many of the same capabilities as the det CLI tool directly to Python code with an object-oriented interface.

As a simple example, let’s walk through the most basic workflow for creating an experiment, waiting for it to complete, and finding the top-performing checkpoint.

The first step is to import the client module and possibly to call login():

from determined.experimental import client

# We will assume that you have called `det user login`, so this is unnecessary:
# client.login(master=..., user=..., password=...)

The next step is to call create_experiment():

# config can be a path to a config file or a python dict of the config.
exp = client.create_experiment(config="my_config.yaml", model_dir=".")
print(f"started experiment {exp.id}")

The returned object will be an ExperimentReference which has methods for controlling the lifetime of the experiment running on the cluster. In this example, we will just wait for the experiment to complete.

exit_status = exp.wait()
print(f"experiment completed with status {exit_status}")

Now that the experiment has completed, you can grab the top-performing checkpoint from training:

best_checkpoint = exp.top_checkpoint()
print(f"best checkpoint was {best_checkpoint.uuid}")
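The three steps above can be collected into one helper. This is a minimal sketch, not part of the library: `run_and_pick_best` is a hypothetical name, and the `client` argument is assumed to be the `determined.experimental.client` module (or any object that offers the same methods), passed in so the function can be exercised without a running master.

```python
from typing import Any, Dict, Union


def run_and_pick_best(
    client: Any, config: Union[str, Dict[str, Any]], model_dir: str = "."
) -> Any:
    """Create an experiment, block until it finishes, and return its top checkpoint.

    `client` is assumed to be the determined.experimental.client module or an
    object exposing the same create_experiment() method.
    """
    exp = client.create_experiment(config=config, model_dir=model_dir)
    print(f"started experiment {exp.id}")

    # wait() blocks until the experiment reaches a terminal state.
    exit_status = exp.wait()
    print(f"experiment completed with status {exit_status}")

    best = exp.top_checkpoint()
    print(f"best checkpoint was {best.uuid}")
    return best
```

With a reachable master, a call would look like `run_and_pick_best(client, "my_config.yaml")` after `from determined.experimental import client`.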

See Using Checkpoints for more ideas on what to do next.

determined.experimental.client.login(master: Optional[str] = None, user: Optional[str] = None, password: Optional[str] = None, cert_path: Optional[str] = None, cert_name: Optional[str] = None, noverify: bool = False) → None

login() will configure the default Determined() singleton used by all of the other functions in the client module.

It is often unnecessary to call login(). If you have configured your environment so that the Determined CLI works without any extra arguments or environment variables, you should not have to call login() at all.

If you do need to call login(), you must call it before calling any other function from this module; otherwise it will fail.

If you need to connect to multiple masters, use explicit Determined objects instead. Each explicit Determined object accepts the same parameters as login() and offers the same functions as this module.

Note

Try to avoid having your password in your Python code. If you are running on your local machine, you should always be able to use det user login on the CLI, and login() will not need either a user or a password. If you have run det user login with multiple users (and you have not run det user logout), then you should be able to run login(user=...) for any of those users without putting your password in your code.

Parameters
  • master (string, optional) – The URL of the Determined master. If this argument is not specified, the environment variables DET_MASTER and DET_MASTER_ADDR will be checked for the master URL in that order.

  • user (string, optional) – The Determined username used for authentication. (default: determined)

  • password (string, optional) – The password associated with the user.

  • cert_path (string, optional) – A path to a custom PEM-encoded certificate, against which to validate the master. (default: None)

  • cert_name (string, optional) – The name of the master hostname to use during certificate validation. Normally this is taken from the master URL, but if the master is exposed on multiple networks, this value might need to be overridden. (default: None)

  • noverify (boolean, optional) – Disable all TLS verification entirely. (default: False)
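When login() is needed but the password should stay out of the source file, one option is to read credentials from the environment. The sketch below assumes two hypothetical variable names, `MY_DET_USER` and `MY_DET_PASSWORD`, which are chosen for illustration and are not read by Determined itself. The master argument is deliberately left out so that login() falls back to DET_MASTER and DET_MASTER_ADDR as described above.

```python
import os
from typing import Dict, Mapping, Optional


def login_kwargs_from_env(
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Collect login() keyword arguments from the environment.

    MY_DET_USER and MY_DET_PASSWORD are hypothetical names used only in this
    sketch. Unset or empty variables are skipped so that login() keeps its
    own defaults for them.
    """
    env = os.environ if environ is None else environ
    kwargs: Dict[str, str] = {}
    if env.get("MY_DET_USER"):
        kwargs["user"] = env["MY_DET_USER"]
    if env.get("MY_DET_PASSWORD"):
        kwargs["password"] = env["MY_DET_PASSWORD"]
    return kwargs
```

The result would be passed as `client.login(**login_kwargs_from_env())` before any other function from the module is called.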

determined.experimental.client.create_experiment(config: Union[str, pathlib.Path, Dict], model_dir: str) → determined.common.experimental.experiment.ExperimentReference

Creates an experiment with config parameters and model directory. The function returns an ExperimentReference of the experiment.

Parameters
  • config (string, pathlib.Path, dictionary) – Experiment config filename (.yaml) or a dict.

  • model_dir (string) – Directory containing model definition.
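Because config may be a dict, the configuration can be built in code rather than read from a YAML file. The following is a sketch under assumptions: `make_config` is a hypothetical helper, the keys are meant to follow a typical single-trial Determined experiment configuration, and every value (experiment name, entrypoint, metric name, batch counts) is a placeholder. Check the experiment configuration reference for your Determined version before relying on any field.

```python
from typing import Any, Dict


def make_config(learning_rate: float, max_batches: int = 1000) -> Dict[str, Any]:
    """Build an experiment config dict for a single-trial run.

    All values are placeholders for illustration; "model_def:MyTrial" and
    "validation_loss" are assumed names, not ones taken from this reference.
    """
    return {
        "name": "client-api-sketch",
        "entrypoint": "model_def:MyTrial",
        "hyperparameters": {
            "global_batch_size": 32,
            "learning_rate": learning_rate,
        },
        "searcher": {
            "name": "single",
            "metric": "validation_loss",
            "smaller_is_better": True,
            "max_length": {"batches": max_batches},
        },
    }
```

The dict would then be passed directly, for example `client.create_experiment(config=make_config(0.01), model_dir=".")`.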

determined.experimental.client.get_experiment(experiment_id: int) → determined.common.experimental.experiment.ExperimentReference

Get the ExperimentReference representing the experiment with the provided experiment ID.

Parameters

experiment_id (int) – The experiment ID.

determined.experimental.client.get_trial(trial_id: int) → determined.common.experimental.trial.TrialReference

Get the TrialReference representing the trial with the provided trial ID.

Parameters

trial_id (int) – The trial ID.

determined.experimental.client.get_checkpoint(uuid: str) → determined.common.experimental.checkpoint._checkpoint.Checkpoint

Get the Checkpoint representing the checkpoint with the provided UUID.

Parameters

uuid (string) – The checkpoint UUID.

determined.experimental.client.create_model(name: str, description: Optional[str] = '', metadata: Optional[Dict[str, Any]] = None) → determined.common.experimental.model.Model

Add a Model to the model registry. This function returns a Model.

Parameters
  • name (string) – The name of the model. This name must be unique.

  • description (string, optional) – A description of the model.

  • metadata (dict, optional) – Dictionary of metadata to add to the model.

determined.experimental.client.get_model(name: str) → determined.common.experimental.model.Model

Get the Model from the model registry with the provided name. If no model with that name is found in the registry, an exception is raised.

Parameters

name (string) – The name of the model.

determined.experimental.client.get_models(sort_by: determined.common.experimental.model.ModelSortBy = <ModelSortBy.NAME: 1>, order_by: determined.common.experimental.model.ModelOrderBy = <ModelOrderBy.ASCENDING: 1>, name: str = '', description: str = '') → List[determined.common.experimental.model.Model]

Get a list of all models in the model registry.

Parameters
  • sort_by – Which field to sort by. See ModelSortBy.

  • order_by – Whether to sort in ascending or descending order. See ModelOrderBy.

  • name – If this parameter is set, models will be filtered to only include models with names matching this parameter.

  • description – If this parameter is set, models will be filtered to only include models with descriptions matching this parameter.
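As an illustration of the sorting and filtering parameters, the sketch below lists model names with the most recently updated first. It assumes that `client` is the `determined.experimental.client` module, which exposes ModelSortBy and ModelOrderBy, and that each returned Model has a `name` attribute matching its constructor argument. `recently_updated_model_names` is a hypothetical helper name.

```python
from typing import Any, List


def recently_updated_model_names(client: Any, name_filter: str = "") -> List[str]:
    """Return registry model names, most recently updated first.

    `client` is assumed to expose get_models(), ModelSortBy and ModelOrderBy,
    as the determined.experimental.client module does.
    """
    models = client.get_models(
        sort_by=client.ModelSortBy.LAST_UPDATED_TIME,
        order_by=client.ModelOrderBy.DESCENDING,
        name=name_filter,
    )
    return [m.name for m in models]
```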

Checkpoint

class determined.experimental.client.Checkpoint(session: determined.common.experimental.session.Session, uuid: str, experiment_config: Dict[str, Any], experiment_id: int, trial_id: int, hparams: Dict[str, Any], batch_number: int, start_time: str, end_time: str, resources: Dict[str, Any], validation: Dict[str, Any], metadata: Dict[str, Any], determined_version: Optional[str] = None, framework: Optional[str] = None, format: Optional[str] = None, model_version: Optional[int] = None, model_name: Optional[str] = None)

A Checkpoint object is usually obtained from determined.experimental.client.get_checkpoint().

A Checkpoint represents a trained model.

This class provides helper functionality for downloading checkpoints to local storage and loading checkpoints into memory.

The TrialReference class contains methods that return instances of this class.

download(path: Optional[str] = None) → str

Download checkpoint to local storage.

Parameters

path (string, optional) – Top level directory to place the checkpoint under. If this parameter is not set, the checkpoint will be downloaded to checkpoints/<checkpoint_uuid> relative to the current working directory.

load(path: Optional[str] = None, tags: Optional[List[str]] = None, **kwargs: Any) → Any

Loads a Determined checkpoint into memory. If the checkpoint is not present on disk it will be downloaded from persistent storage.

Parameters
  • path (string, optional) – Top level directory to load the checkpoint from. (default: checkpoints/<UUID>)

  • tags (list string, optional) – Only relevant for TensorFlow SavedModel checkpoints. Specifies which tags are loaded from the TensorFlow SavedModel. See documentation for tf.compat.v1.saved_model.load_v2.

  • kwargs – Only relevant for PyTorch checkpoints. The keyword arguments will be applied to torch.load. See documentation for torch.load.
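A short sketch of how download() and load() might be combined: download first to learn where the files landed, then load from that same location. `download_and_load` is a hypothetical helper, `client` is assumed to be the `determined.experimental.client` module, and loading requires the matching framework (PyTorch or TensorFlow) to be installed.

```python
from typing import Any, Optional, Tuple


def download_and_load(
    client: Any, uuid: str, dest: Optional[str] = None
) -> Tuple[str, Any]:
    """Download a checkpoint and load it into memory.

    Returns the local path reported by download() together with the loaded
    object. When `dest` is None, download() uses its documented default
    location under the current working directory.
    """
    ckpt = client.get_checkpoint(uuid)
    local_path = ckpt.download(path=dest)

    # load() only downloads when the checkpoint is not already on disk,
    # so pointing it at local_path reuses the files fetched above.
    model = ckpt.load(path=local_path)
    return local_path, model
```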

add_metadata(metadata: Dict[str, Any]) → None

Adds user-defined metadata to the checkpoint. The metadata argument must be a JSON-serializable dictionary. If any keys from this dictionary already appear in the checkpoint metadata, the corresponding dictionary entries in the checkpoint are replaced by the passed-in dictionary values.

Parameters

metadata (dict) – Dictionary of metadata to add to the checkpoint.

remove_metadata(keys: List[str]) → None

Removes user-defined metadata from the checkpoint. Any top-level keys that appear in the keys list are removed from the checkpoint.

Parameters

keys (List[string]) – Top-level keys to remove from the checkpoint metadata.
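The replace-on-conflict and top-level-removal behavior described for add_metadata() and remove_metadata() can be modeled locally with plain dictionaries. This is only an illustration of the documented semantics, not the library's implementation; both function names are hypothetical.

```python
from typing import Any, Dict, List


def apply_add_metadata(existing: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Model add_metadata(): top-level keys in `new` replace those in `existing`."""
    merged = dict(existing)
    merged.update(new)  # shallow update: a nested value is replaced whole, not merged
    return merged


def apply_remove_metadata(existing: Dict[str, Any], keys: List[str]) -> Dict[str, Any]:
    """Model remove_metadata(): drop the listed top-level keys; ignore missing ones."""
    return {k: v for k, v in existing.items() if k not in keys}
```

Under this model, adding `{"eval": {"f1": 0.8}}` to `{"owner": "alice", "eval": {"acc": 0.9}}` gives `{"owner": "alice", "eval": {"f1": 0.8}}`: the whole `eval` entry is replaced.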

static load_from_path(path: str, tags: Optional[List[str]] = None, **kwargs: Any) → Any

Loads a Determined checkpoint from a local file system path into memory. If the checkpoint is a PyTorch model, a torch.nn.Module is returned. If the checkpoint contains a TensorFlow SavedModel, a TensorFlow autotrackable object is returned.

Parameters
  • path (string) – Local path to the checkpoint directory.

  • tags (list string, optional) – Only relevant for TensorFlow SavedModel checkpoints. Specifies which tags are loaded from the TensorFlow SavedModel. See documentation for tf.compat.v1.saved_model.load_v2.

Determined

class determined.experimental.client.Determined(master: Optional[str] = None, user: Optional[str] = None, password: Optional[str] = None, cert_path: Optional[str] = None, cert_name: Optional[str] = None, noverify: bool = False)

Determined gives access to Determined API objects.

Parameters
  • master (string, optional) – The URL of the Determined master. If this argument is not specified, the environment variables DET_MASTER and DET_MASTER_ADDR will be checked for the master URL in that order.

  • user (string, optional) – The Determined username used for authentication. (default: determined)
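Explicit Determined objects are the suggested route when more than one master is involved. The sketch below builds one object per master; the class is passed in as `determined_cls` so the helper can be tried without a cluster, and the labels and URLs in the usage note are hypothetical.

```python
from typing import Any, Callable, Dict


def connect_all(
    determined_cls: Callable[..., Any], masters: Dict[str, str]
) -> Dict[str, Any]:
    """Create one explicit client object per master URL.

    `determined_cls` is expected to be determined.experimental.client.Determined;
    `masters` maps a caller-chosen label to a master URL.
    """
    return {label: determined_cls(master=url) for label, url in masters.items()}
```

For example, `connect_all(Determined, {"dev": "http://dev-master:8080", "prod": "http://prod-master:8080"})` would return two independent objects, each offering create_experiment(), get_experiment(), and the other methods listed for this class.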

create_experiment(config: Union[str, pathlib.Path, Dict], model_dir: Union[str, pathlib.Path]) → determined.common.experimental.experiment.ExperimentReference

Create an experiment with config parameters and model directory. The function returns an ExperimentReference of the experiment.

Parameters
  • config (string, pathlib.Path, dictionary) – Experiment config filename (.yaml) or a dict.

  • model_dir (string, pathlib.Path) – Directory containing model definition.

get_experiment(experiment_id: int) → determined.common.experimental.experiment.ExperimentReference

Get the ExperimentReference representing the experiment with the provided experiment ID.

get_trial(trial_id: int) → determined.common.experimental.trial.TrialReference

Get the TrialReference representing the trial with the provided trial ID.

get_checkpoint(uuid: str) → determined.common.experimental.checkpoint._checkpoint.Checkpoint

Get the Checkpoint representing the checkpoint with the provided UUID.

create_model(name: str, description: Optional[str] = '', metadata: Optional[Dict[str, Any]] = None) → determined.common.experimental.model.Model

Add a model to the model registry.

Parameters
  • name (string) – The name of the model. This name must be unique.

  • description (string, optional) – A description of the model.

  • metadata (dict, optional) – Dictionary of metadata to add to the model.

get_model(name: str) → determined.common.experimental.model.Model

Get the Model from the model registry with the provided name. If no model with that name is found in the registry, an exception is raised.

get_models(sort_by: determined.common.experimental.model.ModelSortBy = <ModelSortBy.NAME: 1>, order_by: determined.common.experimental.model.ModelOrderBy = <ModelOrderBy.ASCENDING: 1>, name: str = '', description: str = '') → List[determined.common.experimental.model.Model]

Get a list of all models in the model registry.

Parameters
  • sort_by – Which field to sort by. See ModelSortBy.

  • order_by – Whether to sort in ascending or descending order. See ModelOrderBy.

  • name – If this parameter is set, models will be filtered to only include models with names matching this parameter.

  • description – If this parameter is set, models will be filtered to only include models with descriptions matching this parameter.

ExperimentReference

class determined.experimental.client.ExperimentReference(experiment_id: int, session: determined.common.experimental.session.Session)

An ExperimentReference object is usually obtained from determined.experimental.client.create_experiment() or determined.experimental.client.get_experiment().

Helper class that supports querying the set of checkpoints associated with an experiment.

delete() → None

Delete an experiment and all its artifacts from persistent storage.

You must be authenticated as admin to delete an experiment.

wait(interval: int = 5) → determined.common.experimental.experiment.ExperimentState

Wait for the experiment to reach a complete or terminal state.

Parameters

interval (int, optional) – The time in seconds to wait between checks of the experiment state.

top_checkpoint(sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined.common.experimental.checkpoint._checkpoint.Checkpoint

Return the Checkpoint for this experiment that has the best validation metric, as defined by the sort_by and smaller_is_better arguments.

Parameters
  • sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is not specified, the metric defined in the experiment configuration searcher field will be used.

  • smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default, the value of smaller_is_better from the experiment’s configuration is used.

top_n_checkpoints(limit: int, sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → List[determined.common.experimental.checkpoint._checkpoint.Checkpoint]

Return the N Checkpoint instances with the best validation metrics, as defined by the sort_by and smaller_is_better arguments. This method will return the best checkpoint from the top N best-performing distinct trials of the experiment. Only checkpoints in a COMPLETED state with a matching COMPLETED validation are considered.

Parameters
  • limit (int) – The maximum number of checkpoints to return.

  • sort_by (string, optional) – The name of the validation metric to use for sorting checkpoints. If this parameter is unset, the metric defined in the experiment configuration searcher field will be used.

  • smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default, the value of smaller_is_better from the experiment’s configuration is used.
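As a usage sketch for top_n_checkpoints(), the helper below returns the UUIDs of the best checkpoints of an experiment. `top_checkpoint_uuids` is a hypothetical name, `client` is assumed to be the `determined.experimental.client` module, and any metric name passed in must be one that the experiment actually reports.

```python
from typing import Any, List, Optional


def top_checkpoint_uuids(
    client: Any,
    experiment_id: int,
    n: int = 3,
    metric: Optional[str] = None,
    smaller_is_better: Optional[bool] = None,
) -> List[str]:
    """Return UUIDs of up to `n` best checkpoints, one per distinct trial.

    Leaving `metric` as None defers to the searcher metric from the
    experiment configuration, as described above.
    """
    exp = client.get_experiment(experiment_id)
    checkpoints = exp.top_n_checkpoints(
        limit=n, sort_by=metric, smaller_is_better=smaller_is_better
    )
    return [ckpt.uuid for ckpt in checkpoints]
```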

Model

class determined.experimental.client.Model(session: determined.common.experimental.session.Session, name: str, description: str = '', creation_time: Optional[datetime.datetime] = None, last_updated_time: Optional[datetime.datetime] = None, metadata: Optional[Dict[str, Any]] = None)

A Model object is usually obtained from determined.experimental.client.create_model() or determined.experimental.client.get_model().

Class representing a model in the model registry. It contains methods for model versions and metadata.

Parameters
  • name (string) – The name of the model.

  • description (string, optional) – The description of the model.

  • creation_time (datetime) – The time the model was created.

  • last_updated_time (datetime) – The time the model was most recently updated.

  • metadata (dict, optional) – User-defined metadata associated with the model.

  • master (string, optional) – The address of the Determined master instance.

get_version(version: int = 0) → Optional[determined.common.experimental.checkpoint._checkpoint.Checkpoint]

Retrieve the checkpoint corresponding to the specified version of the model. If the specified version of the model does not exist, an exception is raised.

If no version is specified, the latest version of the model is returned. In this case, if there are no registered versions of the model, None is returned.

Parameters

version (int, optional) – The model version number requested.

get_versions(order_by: determined.common.experimental.model.ModelOrderBy = <ModelOrderBy.DESCENDING: 2>) → List[determined.common.experimental.checkpoint._checkpoint.Checkpoint]

Get a list of checkpoints corresponding to versions of this model. The versions are sorted by version number and are returned in descending order by default.

Parameters

order_by (enum) – A member of the ModelOrderBy enum.

register_version(checkpoint_uuid: str) → determined.common.experimental.checkpoint._checkpoint.Checkpoint

Creates a new model version and returns the Checkpoint corresponding to the version.

Parameters

checkpoint_uuid – The UUID of the checkpoint to register.

add_metadata(metadata: Dict[str, Any]) → None

Adds user-defined metadata to the model. The metadata argument must be a JSON-serializable dictionary. If any keys from this dictionary already appear in the model’s metadata, the previous dictionary entries are replaced.

Parameters

metadata (dict) – Dictionary of metadata to add to the model.

remove_metadata(keys: List[str]) → None

Removes user-defined metadata from the model. Any top-level keys that appear in the keys list are removed from the model.

Parameters

keys (List[string]) – Top-level keys to remove from the model metadata.
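Putting the Model methods together, one plausible flow is to register the best checkpoint of an experiment as a new model version. This is a sketch with a hypothetical helper name; `client` is assumed to be the `determined.experimental.client` module. Since model names must be unique, create_model() is expected to fail if the name is already taken, in which case get_model() would be the call to use instead.

```python
from typing import Any


def register_best_checkpoint(client: Any, experiment_id: int, model_name: str) -> Any:
    """Register an experiment's top checkpoint as a version of a new model.

    Returns the checkpoint for the latest version of the model, which after
    this call should be the one just registered.
    """
    best = client.get_experiment(experiment_id).top_checkpoint()

    model = client.create_model(
        model_name, description=f"best checkpoint of experiment {experiment_id}"
    )
    model.register_version(best.uuid)

    # get_version() with no argument returns the latest registered version.
    return model.get_version()
```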

ModelOrderBy

class determined.experimental.client.ModelOrderBy

Specifies whether a sorted list of models should be in ascending or descending order.

ASCENDING
ASC
DESCENDING
DESC

ModelSortBy

class determined.experimental.client.ModelSortBy

Specifies the field to sort a list of models on.

UNSPECIFIED
NAME
DESCRIPTION
CREATION_TIME
LAST_UPDATED_TIME

TrialReference

class determined.experimental.client.TrialReference(trial_id: int, session: determined.common.experimental.session.Session)

A TrialReference object is usually obtained from determined.experimental.client.get_trial().

Trial reference class used for querying relevant Checkpoint instances.

top_checkpoint(sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined.common.experimental.checkpoint._checkpoint.Checkpoint

Return the Checkpoint instance with the best validation metric as defined by the sort_by and smaller_is_better arguments.

Parameters
  • sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the related experiment configuration searcher field will be used.

  • smaller_is_better (bool, optional) – Whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default, the value of smaller_is_better from the experiment’s configuration is used.

select_checkpoint(latest: bool = False, best: bool = False, uuid: Optional[str] = None, sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined.common.experimental.checkpoint._checkpoint.Checkpoint

Return the Checkpoint instance selected by the latest, best, or uuid argument. When best is set, the best validation metric is determined by the sort_by and smaller_is_better arguments.

Exactly one of the best, latest, or uuid parameters must be set.

Parameters
  • latest (bool, optional) – Return the most recent checkpoint.

  • best (bool, optional) – Return the checkpoint with the best validation metric as defined by the sort_by and smaller_is_better arguments. If sort_by and smaller_is_better are not specified, the values from the associated experiment configuration will be used.

  • uuid (string, optional) – Return the checkpoint for the specified UUID.

  • sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the related experiment configuration searcher field will be used.

  • smaller_is_better (bool, optional) – Whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default, the value of smaller_is_better from the experiment’s configuration is used.
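Since exactly one of best, latest, or uuid must be set, a thin wrapper can check that rule locally before calling select_checkpoint(). The sketch below is hypothetical and not part of the library; `trial` is assumed to be a TrialReference, for example one returned by get_trial().

```python
from typing import Any, Optional


def select_one(
    trial: Any,
    *,
    latest: bool = False,
    best: bool = False,
    uuid: Optional[str] = None,
    **sort_kwargs: Any,
) -> Any:
    """Call trial.select_checkpoint() after checking the exactly-one rule.

    `sort_kwargs` may carry sort_by and smaller_is_better, which only
    matter when best=True.
    """
    chosen = sum([bool(latest), bool(best), uuid is not None])
    if chosen != 1:
        raise ValueError("exactly one of latest, best, or uuid must be set")
    return trial.select_checkpoint(latest=latest, best=best, uuid=uuid, **sort_kwargs)
```

A call such as `select_one(trial, best=True, sort_by="accuracy", smaller_is_better=False)` forwards the sorting arguments unchanged; "accuracy" here stands in for whatever validation metric the trial reports.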