determined.experimental
TrialReference
class determined.experimental.TrialReference(trial_id: int, user: Optional[str] = None, master: Optional[str] = None, attempt_auth: bool = True)
Trial reference class used for querying relevant det.experimental.Checkpoint instances.
Parameters
trial_id (int) – the trial ID.
user (string, optional) – the Determined username used for authentication. (default: determined)
master (string, optional) – the URL of the Determined master. If this argument is not specified, the environment variables DET_MASTER and DET_MASTER_ADDR are checked for the master URL, in that order.
attempt_auth (bool, optional) – whether to attempt to create a user session. By default, a session is created in order to query checkpoint information. (default: True)
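The lookup order for the master URL described above can be illustrated with a short sketch. This is not the library's implementation: resolve_master_url is a hypothetical helper, and raising ValueError when no URL is found is an assumption made for the example.

```python
import os
from typing import Mapping, Optional


def resolve_master_url(
    master: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper mirroring the documented lookup order."""
    env = os.environ if environ is None else environ
    # An explicitly passed master URL always wins.
    if master:
        return master
    # Otherwise check DET_MASTER first, then DET_MASTER_ADDR.
    for name in ("DET_MASTER", "DET_MASTER_ADDR"):
        value = env.get(name)
        if value:
            return value
    # Assumption: the real library may fall back to a default instead of raising.
    raise ValueError(
        "no master URL given and neither DET_MASTER nor DET_MASTER_ADDR is set"
    )
```

Passing the environment as an argument keeps the helper easy to exercise without modifying the process environment.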
top_checkpoint(sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined_common.experimental.checkpoint._checkpoint.Checkpoint
Return the det.experimental.Checkpoint instance with the best validation metric as defined by the sort_by and smaller_is_better arguments.
Parameters
sort_by (string, optional) – the name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the searcher field of the related experiment configuration is used.
smaller_is_better (bool, optional) – specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default, the smaller_is_better value in the related experiment configuration is used.
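The interaction between sort_by and smaller_is_better can be sketched as follows. This is an illustration of the documented rules only, not the library's code: pick_top_checkpoint is a hypothetical function, and the checkpoint dictionaries with "uuid" and "metrics" keys are made-up stand-ins for the real checkpoint metadata.

```python
from typing import Any, Dict, List, Optional


def pick_top_checkpoint(
    checkpoints: List[Dict[str, Any]],
    searcher_metric: str,
    searcher_smaller_is_better: bool,
    sort_by: Optional[str] = None,
    smaller_is_better: Optional[bool] = None,
) -> Dict[str, Any]:
    """Hypothetical sketch of the documented selection rules."""
    if sort_by is None:
        # With sort_by unset, the searcher settings from the experiment
        # configuration apply and smaller_is_better is ignored.
        metric = searcher_metric
        ascending = searcher_smaller_is_better
    else:
        metric = sort_by
        # Fall back to the experiment configuration when not given.
        ascending = (
            searcher_smaller_is_better
            if smaller_is_better is None
            else smaller_is_better
        )

    def metric_value(checkpoint: Dict[str, Any]) -> float:
        return checkpoint["metrics"][metric]

    if ascending:
        return min(checkpoints, key=metric_value)
    return max(checkpoints, key=metric_value)
```

In this sketch, passing smaller_is_better=False without sort_by has no effect, which matches the rule that the parameter is ignored when sort_by is unset.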
select_checkpoint(latest: bool = False, best: bool = False, uuid: Optional[str] = None, sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined_common.experimental.checkpoint._checkpoint.Checkpoint
Return the det.experimental.Checkpoint instance selected by the latest, best, or uuid argument. Exactly one of the best, latest, or uuid parameters must be set.
Parameters
latest (bool, optional) – return the most recent checkpoint.
best (bool, optional) – return the checkpoint with the best validation metric as defined by the sort_by and smaller_is_better arguments. If sort_by and smaller_is_better are not specified, the values from the associated experiment configuration will be used.
uuid (string, optional) – return the checkpoint with the specified UUID.
sort_by (string, optional) – the name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the searcher field of the related experiment configuration is used.
smaller_is_better (bool, optional) – specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default, the smaller_is_better value in the related experiment configuration is used.
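The requirement that exactly one of latest, best, or uuid be set can be sketched as a small validation step. check_selection is a hypothetical helper written for illustration. How the library itself reports a violation is not stated here, so the ValueError is an assumption.

```python
from typing import Optional


def check_selection(
    latest: bool = False,
    best: bool = False,
    uuid: Optional[str] = None,
) -> str:
    """Hypothetical check: return the name of the single selector in use."""
    selectors = (
        ("latest", latest),
        ("best", best),
        ("uuid", uuid is not None),
    )
    chosen = [name for name, is_set in selectors if is_set]
    if len(chosen) != 1:
        # Assumption: the exception type used by the library is not documented.
        raise ValueError(
            "exactly one of latest, best, or uuid must be set, got: "
            + (", ".join(chosen) or "none")
        )
    return chosen[0]
```

Under this rule, a call such as select_checkpoint(latest=True) is valid, while a call that sets both latest and best, or none of the three, is not.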
Checkpoint
class determined.experimental.Checkpoint(uuid: str, storage_config: Dict[str, Any], batch_number: int, start_time: str, end_time: str, resources: Dict[str, Any], validation: determined_common.api.gql.validations)
Class representing a checkpoint. Contains methods for downloading checkpoints to a local path and loading checkpoints into memory.
The det.experimental.TrialReference class contains methods that return instances of this class.
download(path: Optional[str] = None) → str
Download the checkpoint from the checkpoint storage location to the local file system.
Parameters
path (string, optional) – Top-level directory to place the checkpoint under. If this parameter is not set, the checkpoint is downloaded to checkpoints/<checkpoint_uuid> relative to the current working directory.
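The default destination can be sketched with pathlib. default_download_dir is a hypothetical helper, not part of the library, and it assumes that an explicitly passed path is used as given.

```python
import pathlib
from typing import Optional


def default_download_dir(
    checkpoint_uuid: str, path: Optional[str] = None
) -> pathlib.Path:
    """Hypothetical helper showing where a checkpoint would be placed."""
    if path is not None:
        # Assumption: an explicit path is used as the checkpoint directory.
        return pathlib.Path(path)
    # Documented default: checkpoints/<checkpoint_uuid>, which is a relative
    # path and therefore resolves against the current working directory.
    return pathlib.Path("checkpoints") / checkpoint_uuid
```

Because the default is a relative path, changing the working directory between calls changes where the checkpoint ends up.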
load(path: Optional[str] = None, tags: Optional[List[str]] = None) → Any
Loads a Determined checkpoint into memory. If the checkpoint is not present on disk, it will be downloaded from persistent storage.
Parameters
path (string, optional) – Top-level directory to load the checkpoint from. (default: checkpoints/<UUID>)
tags (list of strings, optional) – Only relevant for TensorFlow saved_model checkpoints. Specifies which tags are loaded from the TensorFlow saved_model. See the documentation for tf.compat.v1.saved_model.load_v2.
static load_from_path(path: str, tags: Optional[List[str]] = None) → Any
Loads a Determined checkpoint from a local file system path into memory. If the checkpoint is a PyTorch model, a torch.nn.Module is returned. If the checkpoint contains a TensorFlow saved_model, a TensorFlow autotrackable object is returned.
Parameters
path (string) – Local path to the top level directory of a checkpoint.
tags (list of strings, optional) – Only relevant for TensorFlow saved_model checkpoints. Specifies which tags are loaded from the TensorFlow saved_model. See the documentation for tf.compat.v1.saved_model.load_v2.
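Putting the two classes together, a typical workflow might look like the sketch below. It uses only the constructor and methods documented on this page. load_best_model is a hypothetical wrapper, and running it requires a reachable Determined master, valid credentials, and an existing trial ID.

```python
from typing import Any, Optional


def load_best_model(trial_id: int, master: Optional[str] = None) -> Any:
    """Hypothetical wrapper: load the best checkpoint of a trial into memory."""
    # Imported lazily so this sketch can be imported without Determined installed.
    from determined.experimental import TrialReference

    # With master=None, the DET_MASTER and DET_MASTER_ADDR variables are consulted.
    trial = TrialReference(trial_id, master=master)
    # best=True with no sort_by uses the metric from the experiment configuration.
    checkpoint = trial.select_checkpoint(best=True)
    # load() downloads the checkpoint first if it is not already on disk.
    return checkpoint.load()
```

For a PyTorch checkpoint the returned object would be a torch.nn.Module. For a TensorFlow saved_model it would be a TensorFlow autotrackable object, as described for load_from_path.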