determined.experimental

TrialReference

class determined.experimental.TrialReference(trial_id: int, user: Optional[str] = None, master: Optional[str] = None, attempt_auth: bool = True)

Trial reference class used for querying relevant det.experimental.Checkpoint instances.

Parameters
  • trial_id (int) – the trial ID.

  • user (string, optional) – the Determined username used for authentication. (default: determined)

  • master (string, optional) – the URL of the Determined master. If this argument is not specified, the environment variables DET_MASTER and DET_MASTER_ADDR will be checked for the master URL, in that order.

  • attempt_auth (bool, optional) – whether or not to attempt creating a user session. By default, the session will be created in order to query checkpoint information. (default: True)
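The lookup order described for the master argument can be sketched in plain Python. This is only an illustration of the documented order, not Determined's own code: the helper name resolve_master and its env parameter are hypothetical, and returning None when nothing is set is an assumption, since the text above does not say what happens in that case.

```python
import os
from typing import Mapping, Optional


def resolve_master(
    master: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Illustrative lookup order only; hypothetical helper, not part of Determined."""
    if env is None:
        env = os.environ
    # An explicitly passed master URL takes precedence.
    if master:
        return master
    # Otherwise check the environment in the documented order:
    # DET_MASTER first, then DET_MASTER_ADDR.
    for name in ("DET_MASTER", "DET_MASTER_ADDR"):
        value = env.get(name)
        if value:
            return value
    # Assumption: the documentation does not state the fallback, so return None.
    return None
```

Passing a dictionary as env makes the order easy to check without touching the real environment.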

top_checkpoint(sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined_common.experimental.checkpoint._checkpoint.Checkpoint

Return the det.experimental.Checkpoint instance with the best validation metric as defined by the sort_by and smaller_is_better arguments.

Parameters
  • sort_by (string, optional) – the name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the searcher field of the related experiment configuration will be used.

  • smaller_is_better (bool, optional) – specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default the smaller_is_better value in the related experiment configuration is used.
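The effect of sort_by and smaller_is_better can be pictured with a small standalone sketch. It assumes a hypothetical in-memory list of checkpoint records (dictionaries with a uuid and a metrics mapping); the real method queries the master, and the function pick_top is not part of the library.

```python
from typing import Any, Dict, List


def pick_top(
    checkpoints: List[Dict[str, Any]],
    sort_by: str,
    smaller_is_better: bool,
) -> Dict[str, Any]:
    """Pick the record with the best value of `sort_by` (illustrative only)."""
    # Ignore records that never reported the requested metric.
    scored = [c for c in checkpoints if sort_by in c["metrics"]]
    if not scored:
        raise ValueError(f"no checkpoint reports the metric {sort_by!r}")

    def metric(record: Dict[str, Any]) -> float:
        return record["metrics"][sort_by]

    # Ascending order means the smallest value wins; descending, the largest.
    return min(scored, key=metric) if smaller_is_better else max(scored, key=metric)


# Hypothetical records, made up for this example.
checkpoints = [
    {"uuid": "ckpt-a", "metrics": {"validation_loss": 0.42, "accuracy": 0.81}},
    {"uuid": "ckpt-b", "metrics": {"validation_loss": 0.31, "accuracy": 0.88}},
    {"uuid": "ckpt-c", "metrics": {"validation_loss": 0.37, "accuracy": 0.90}},
]
lowest_loss = pick_top(checkpoints, "validation_loss", smaller_is_better=True)
highest_accuracy = pick_top(checkpoints, "accuracy", smaller_is_better=False)
```

The same records yield different winners depending on the metric and direction, which is why smaller_is_better only has meaning together with sort_by.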

select_checkpoint(latest: bool = False, best: bool = False, uuid: Optional[str] = None, sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined_common.experimental.checkpoint._checkpoint.Checkpoint

Return the det.experimental.Checkpoint instance selected by the latest, best, or uuid argument. When best is set, the checkpoint with the best validation metric, as defined by the sort_by and smaller_is_better arguments, is returned.

Exactly one of the best, latest, or uuid parameters must be set.

Parameters
  • latest (bool, optional) – return the most recent checkpoint.

  • best (bool, optional) – return the checkpoint with the best validation metric as defined by the sort_by and smaller_is_better arguments. If sort_by and smaller_is_better are not specified, the values from the associated experiment configuration will be used.

  • uuid (string, optional) – return the checkpoint for the specified uuid.

  • sort_by (string, optional) – the name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the searcher field of the related experiment configuration will be used.

  • smaller_is_better (bool, optional) – specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default the smaller_is_better value in the related experiment configuration is used.
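The rule that exactly one of best, latest, or uuid must be set can be expressed as a short check. This is a sketch under stated assumptions: the helper check_selector is hypothetical, and raising ValueError is a choice made for the example, since the documentation does not say how the library reports a violation.

```python
from typing import Optional


def check_selector(
    latest: bool = False,
    best: bool = False,
    uuid: Optional[str] = None,
) -> str:
    """Return the name of the single selector in use (illustrative only)."""
    flags = (("latest", latest), ("best", best), ("uuid", uuid is not None))
    chosen = [name for name, is_set in flags if is_set]
    # Zero selectors or more than one selector both violate the rule.
    if len(chosen) != 1:
        raise ValueError("Exactly one of best, latest, or uuid must be set")
    return chosen[0]
```

Calling check_selector(latest=True) passes, while calling it with no arguments or with both best=True and a uuid fails the check.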

Checkpoint

class determined.experimental.Checkpoint(uuid: str, storage_config: Dict[str, Any], batch_number: int, start_time: str, end_time: str, resources: Dict[str, Any], validation: determined_common.api.gql.validations)

Class representing a checkpoint. Contains methods for downloading checkpoints to a local path and loading checkpoints into memory.

The det.experimental.TrialReference class contains methods that return instances of this class.

download(path: Optional[str] = None) → str

Download the checkpoint from the checkpoint storage location to the local file system.

Parameters
  • path (string, optional) – Top level directory to place the checkpoint under. If this parameter is not set, the checkpoint will be downloaded to checkpoints/<checkpoint_uuid> relative to the current working directory.
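The default download location described above can be computed as follows. This is a minimal sketch of the documented default only; default_download_dir and its cwd parameter are hypothetical names, and the sketch does not model what happens when path is supplied.

```python
from pathlib import Path
from typing import Optional


def default_download_dir(checkpoint_uuid: str, cwd: Optional[Path] = None) -> Path:
    """Build checkpoints/<checkpoint_uuid> under the working directory."""
    # Fall back to the real current working directory when none is given.
    base = Path.cwd() if cwd is None else Path(cwd)
    return base / "checkpoints" / checkpoint_uuid
```

Because the default is relative to the current working directory, running the same script from two different directories would place the checkpoint in two different locations.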

load(path: Optional[str] = None, tags: Optional[List[str]] = None) → Any

Loads a Determined checkpoint into memory. If the checkpoint is not present on disk it will be downloaded from persistent storage.

Parameters
  • path (string, optional) – Top level directory to load the checkpoint from. (default: checkpoints/<UUID>)

  • tags (list of strings, optional) – Only relevant for TensorFlow saved_model checkpoints. Specifies which tags are loaded from the TensorFlow saved_model. See documentation for tf.compat.v1.saved_model.load_v2.

static load_from_path(path: str, tags: Optional[List[str]] = None) → Any

Loads a Determined checkpoint from a local file system path into memory. If the checkpoint is a PyTorch model, a torch.nn.Module is returned. If the checkpoint contains a TensorFlow saved_model, a TensorFlow AutoTrackable object is returned.

Parameters
  • path (string) – Local path to the top level directory of a checkpoint.

  • tags (list of strings, optional) – Only relevant for TensorFlow saved_model checkpoints. Specifies which tags are loaded from the TensorFlow saved_model. See documentation for tf.compat.v1.saved_model.load_v2.
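The methods documented in this section can be combined into a short helper that goes from a trial to a loaded model. This sketch uses only the signatures listed above (select_checkpoint and load); the helper load_best_model is hypothetical, and its trial argument is assumed to behave like a determined.experimental.TrialReference.

```python
from typing import Any, Optional


def load_best_model(
    trial: Any,
    sort_by: Optional[str] = None,
    smaller_is_better: Optional[bool] = None,
) -> Any:
    """Select the best checkpoint of `trial` and load it into memory.

    `trial` is expected to behave like determined.experimental.TrialReference;
    this helper itself is not part of the library.
    """
    checkpoint = trial.select_checkpoint(
        best=True,
        sort_by=sort_by,
        smaller_is_better=smaller_is_better,
    )
    # load() is documented to download the checkpoint first if it is not on disk.
    return checkpoint.load()
```

Leaving sort_by and smaller_is_better unset defers to the values in the experiment configuration, as described for select_checkpoint. Taking the trial as an argument keeps the helper independent of how the TrialReference was constructed and authenticated.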