determined.experimental
-
determined.experimental.
create
(trial_def: Type[determined._trial.Trial], config: Optional[Dict[str, Any]] = None, local: bool = False, test: bool = False, context_dir: str = '', command: Optional[List[str]] = None, master_url: Optional[str] = None) → Any
Create an experiment.
- Parameters
trial_def – A class definition implementing the
determined.Trial
interface.
config – A dictionary representing the experiment configuration to be associated with the experiment.
local – A boolean indicating if training should be done locally. When
False
, the experiment will be submitted to the Determined cluster. Defaults to False.
test – A boolean indicating if the experiment should be shortened to a minimal loop of training on a small amount of data, performing validation, and checkpointing.
test=True
is useful for quick iteration during model porting or debugging because common errors will surface more quickly. Defaults to False.
context_dir – A string filepath that defines the context directory. All model code will be executed with this as the current working directory.
When
local=False
, this argument is required. All files in this directory will be uploaded to the Determined cluster. The total size of this directory must be under 96 MB.
When
local=True
, this argument is optional and defaults to the current working directory.
command – A list of strings that is used as the entrypoint of the training script in the Determined task environment. When executing this function via a Python script, this argument is inferred to be
sys.argv
by default. When executing this function via IPython or a Jupyter notebook, this argument is required.
Example: When creating an experiment by running
python train.py --flag value
, the default command is inferred as ["train.py", "--flag", "value"].
master_url – An optional string to use as the Determined master URL when
local=False
. If not specified, it is inferred from the environment variable DET_MASTER.
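The context_dir size limit and the cluster-submission call described above can be sketched as follows. This is only an illustration: the helper names context_dir_size and submit_to_cluster are hypothetical, and reading "96 MB" as 96 × 2^20 bytes is an assumption, so the limit actually enforced by the master may differ.

```python
import os
from typing import Any, Dict

# Assumption: "96 MB" is read here as 96 * 2**20 bytes.
MAX_CONTEXT_BYTES = 96 * 1024 * 1024


def context_dir_size(path: str) -> int:
    """Return the total size in bytes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def submit_to_cluster(trial_def: type, config: Dict[str, Any], context_dir: str) -> Any:
    """Hypothetical wrapper: check the context directory size, then submit."""
    size = context_dir_size(context_dir)
    if size >= MAX_CONTEXT_BYTES:
        raise ValueError(
            f"context directory is {size} bytes; expected under {MAX_CONTEXT_BYTES}"
        )
    # Imported lazily so the size check works without Determined installed.
    from determined import experimental

    return experimental.create(
        trial_def=trial_def,
        config=config,
        local=False,  # submit to the cluster instead of training locally
        context_dir=context_dir,  # required when local=False
    )
```

Checking the size locally gives a faster failure than waiting for the upload to be rejected; passing test=True to create is a separate, complementary way to shorten the debugging loop.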
-
determined.experimental.
create_trial_instance
(trial_def: Type[determined._trial.Trial], checkpoint_dir: str, config: Optional[Dict[str, Any]] = None, hparams: Optional[Dict[str, Any]] = None) → determined._trial.Trial
Create a trial instance from a Trial class definition. This can be a useful utility for debugging your trial logic in any development environment.
- Parameters
trial_def – A class definition that inherits from the det.Trial interface.
checkpoint_dir – The checkpoint directory that the trial will use for loading and saving checkpoints.
config – An optional experiment configuration that is used to initialize the
determined.TrialContext
. If not specified, a minimal default is used.
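For local debugging, a small wrapper along these lines may be convenient. It is a sketch, not part of the library: make_debug_trial and its create_fn parameter are hypothetical, and create_fn exists only so the wrapper can be exercised with a stub when Determined is not installed.

```python
import tempfile
from typing import Any, Callable, Dict, Optional


def make_debug_trial(
    trial_def: type,
    hparams: Dict[str, Any],
    create_fn: Optional[Callable[..., Any]] = None,
) -> Any:
    """Hypothetical helper: build a trial instance with a throwaway checkpoint directory."""
    if create_fn is None:
        # Imported lazily; falls back to the documented function.
        from determined.experimental import create_trial_instance as create_fn
    checkpoint_dir = tempfile.mkdtemp(prefix="det-debug-")
    # `config` is omitted, so the documented minimal default is used.
    return create_fn(trial_def, checkpoint_dir=checkpoint_dir, hparams=hparams)
```

The returned object is an ordinary trial instance, so its methods can be stepped through in a debugger or called from a notebook cell.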
Determined
-
class
determined.experimental.
Determined
(master: Optional[str] = None, user: Optional[str] = None)
Determined gives access to Determined API objects.
- Parameters
master (string, optional) – The URL of the Determined master. If this argument is not specified, the environment variables
DET_MASTER
and DET_MASTER_ADDR
will be checked for the master URL in that order.
user (string, optional) – The Determined username used for authentication. (default:
determined
)
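The lookup order for the master URL can be pictured with the following sketch. It mirrors only the documented precedence (explicit argument, then DET_MASTER, then DET_MASTER_ADDR); resolve_master is a hypothetical name, and returning None when nothing is set is an assumption, since the fallback is not documented here.

```python
import os
from typing import Mapping, Optional


def resolve_master(
    master: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical sketch of the documented master URL precedence."""
    env = os.environ if environ is None else environ
    if master:
        return master  # an explicit argument wins
    for var in ("DET_MASTER", "DET_MASTER_ADDR"):  # checked in this order
        value = env.get(var)
        if value:
            return value
    return None  # assumption: the real fallback is not documented here
```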
-
get_experiment
(experiment_id: int) → determined_common.experimental.experiment.ExperimentReference
Get the
ExperimentReference
representing the experiment with the provided experiment ID.
-
get_trial
(trial_id: int) → determined_common.experimental.trial.TrialReference
Get the
TrialReference
representing the trial with the provided trial ID.
-
get_checkpoint
(uuid: str) → determined_common.experimental.checkpoint._checkpoint.Checkpoint
Get the
Checkpoint
representing the checkpoint with the provided UUID.
-
create_model
(name: str, description: Optional[str] = '', metadata: Optional[Dict[str, Any]] = None) → determined_common.experimental.model.Model
Add a model to the registry.
- Parameters
name (string) – The name of the model. This name must be unique.
description (string) – A description of the model.
metadata (dict) – Dictionary of metadata to add to the model.
-
get_model
(name: str) → determined_common.experimental.model.Model
Get the
Model
representing the model with the provided name.
-
get_models
(sort_by: determined_common.experimental.model.ModelSortBy = <ModelSortBy.NAME: 1>, order_by: determined_common.experimental.model.ModelOrderBy = <ModelOrderBy.ASCENDING: 1>, name: str = '', description: str = '') → List[determined_common.experimental.model.Model]
Get a list of all models in the model registry.
- Parameters
sort_by – Which field to sort by. See
ModelSortBy
.
order_by – Whether to sort in ascending or descending order. See
ModelOrderBy
.
name – If this parameter is set, models will be filtered to only include models with names matching this parameter.
description – If this parameter is set, models will be filtered to only include models with descriptions matching this parameter.
Model
-
class
determined.experimental.
Model
(name: str, description: str = '', creation_time: Optional[datetime.datetime] = None, last_updated_time: Optional[datetime.datetime] = None, metadata: Optional[Dict[str, Any]] = None, master: str = '')
Class representing a model. Contains methods for managing metadata and model versions.
- Parameters
name (string) – The name of the model.
description (string, optional) – The description of the model.
creation_time (datetime) – The time the model was created.
last_updated_time (datetime) – The time the model was most recently updated.
metadata (dict, optional) – User-defined metadata associated with the model.
master (string, optional) – The address of the Determined master instance.
-
get_version
(version: int = 0) → determined_common.experimental.checkpoint._checkpoint.Checkpoint
Retrieve the checkpoint corresponding to the specified version of the model. If no version is specified, the latest model version is returned.
- Parameters
version (int, optional) – The model version number requested.
-
get_versions
(order_by: determined_common.experimental.model.ModelOrderBy = <ModelOrderBy.DESCENDING: 2>) → List[determined_common.experimental.checkpoint._checkpoint.Checkpoint]
Get a list of checkpoints corresponding to versions of this model. The versions are sorted by version number and are returned in descending order by default.
- Parameters
order_by (enum) – A member of the ModelOrderBy enum.
-
register_version
(checkpoint_uuid: str) → determined_common.experimental.checkpoint._checkpoint.Checkpoint
Creates a new model version and returns the
Checkpoint
corresponding to the version.
- Parameters
checkpoint_uuid – The UUID of the checkpoint to associate with the new model version.
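A typical registry workflow might combine the calls documented on this page as in the sketch below. The helper register_best_checkpoint is hypothetical; client is expected to be a determined.experimental.Determined instance, and reading the checkpoint's UUID from a uuid attribute is an assumption based on the Checkpoint constructor argument of that name.

```python
from typing import Any


def register_best_checkpoint(client: Any, experiment_id: int, model_name: str) -> Any:
    """Hypothetical helper: register an experiment's best checkpoint as a model version."""
    # Best checkpoint according to the experiment's own searcher metric.
    checkpoint = client.get_experiment(experiment_id).top_checkpoint()
    # Model names must be unique, so this assumes `model_name` is not taken yet.
    model = client.create_model(model_name)
    # Assumption: the checkpoint exposes its UUID as a `uuid` attribute.
    return model.register_version(checkpoint.uuid)
```

If the model already exists, get_model(model_name) would be the documented way to obtain it before calling register_version.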
-
add_metadata
(metadata: Dict[str, Any]) → None
Adds user-defined metadata to the model. The
metadata
argument must be a JSON-serializable dictionary. If any keys from this dictionary already appear in the model metadata, the corresponding dictionary entries in the model are replaced by the passed-in dictionary values.
- Parameters
metadata (dict) – Dictionary of metadata to add to the model.
-
remove_metadata
(keys: List[str]) → None
Removes user-defined metadata from the model. Any top-level keys that appear in the
keys
list are removed from the model.
- Parameters
keys (List[string]) – Top-level keys to remove from the model metadata.
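The replace-and-remove behavior described for add_metadata and remove_metadata can be modeled on plain dictionaries. The functions below are a local sketch of those semantics, not the library's implementation; merge_metadata and drop_metadata_keys are hypothetical names, and silently ignoring unknown keys on removal is an assumption.

```python
import json
from typing import Any, Dict, List


def merge_metadata(current: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `current` with top-level keys from `new` replacing existing ones."""
    json.dumps(new)  # raises TypeError if `new` is not JSON-serializable
    merged = dict(current)
    merged.update(new)  # whole entries are replaced; nested dicts are not merged
    return merged


def drop_metadata_keys(current: Dict[str, Any], keys: List[str]) -> Dict[str, Any]:
    """Return a copy of `current` without the listed top-level keys."""
    doomed = set(keys)
    # Assumption: keys that are not present are ignored rather than rejected.
    return {k: v for k, v in current.items() if k not in doomed}
```

Because replacement happens at the top level, adding {"a": {"y": 3}} to metadata that already holds {"a": {"x": 1}} leaves only the new value under "a".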
ExperimentReference
-
class
determined.experimental.
ExperimentReference
(experiment_id: int, master: str)
Experiment reference class used for querying relevant
Checkpoint
instances.
- Parameters
experiment_id (int) – The experiment ID.
master (string, optional) – The URL of the Determined master. If this class is obtained via
determined.experimental.Determined
, the master URL is automatically passed into this constructor.
-
top_checkpoint
(sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined_common.experimental.checkpoint._checkpoint.Checkpoint
Return the
Checkpoint
instance with the best validation metric as defined by the sort_by
and smaller_is_better
arguments.
- Parameters
sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the experiment configuration searcher field will be used.
smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default, the smaller_is_better value in the experiment configuration is used.
-
top_n_checkpoints
(limit: int, sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → List[determined_common.experimental.checkpoint._checkpoint.Checkpoint]
Return the N
Checkpoint
instances with the best validation metric values as defined by the sort_by
and smaller_is_better
arguments. This method will return the best checkpoint from the top N performing distinct trials of the experiment.
- Parameters
limit (int) – The maximum number of checkpoints to return (the N above).
sort_by (string, optional) – The name of the validation metric to use for sorting checkpoints. If this parameter is unset, the metric defined in the experiment configuration searcher field will be used.
smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If
sort_by
is unset, this parameter is ignored. By default, the value of smaller_is_better
from the experiment’s configuration is used.
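The "best checkpoint from the top N performing distinct trials" rule can be illustrated on plain records. This is a local model of the selection described above, not the library's code; CheckpointRecord and top_n_by_trial are hypothetical names.

```python
from typing import Dict, Iterable, List, NamedTuple


class CheckpointRecord(NamedTuple):
    uuid: str
    trial_id: int
    metric: float  # value of the validation metric used for sorting


def top_n_by_trial(
    records: Iterable[CheckpointRecord],
    limit: int,
    smaller_is_better: bool = True,
) -> List[CheckpointRecord]:
    """Keep each trial's best checkpoint, then return the best `limit` of those."""
    best_per_trial: Dict[int, CheckpointRecord] = {}
    for rec in records:
        best = best_per_trial.get(rec.trial_id)
        if best is None:
            best_per_trial[rec.trial_id] = rec
        elif smaller_is_better and rec.metric < best.metric:
            best_per_trial[rec.trial_id] = rec
        elif not smaller_is_better and rec.metric > best.metric:
            best_per_trial[rec.trial_id] = rec
    ranked = sorted(
        best_per_trial.values(),
        key=lambda r: r.metric,
        reverse=not smaller_is_better,
    )
    return ranked[:limit]
```

Under this rule, a trial contributes at most one checkpoint to the result, even if several of its checkpoints outrank every checkpoint from other trials.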
TrialReference
-
class
determined.experimental.
TrialReference
(trial_id: int, master: str)
Trial reference class used for querying relevant
Checkpoint
instances.
- Parameters
trial_id (int) – The trial ID.
master (string, optional) – The URL of the Determined master. If this class is obtained via
determined.experimental.Determined
, the master URL is automatically passed into this constructor.
-
top_checkpoint
(sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined_common.experimental.checkpoint._checkpoint.Checkpoint
Return the
Checkpoint
instance with the best validation metric as defined by the sort_by
and smaller_is_better
arguments.
- Parameters
sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the related experiment configuration searcher field will be used.
smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If
sort_by
is unset, this parameter is ignored. By default, the value of smaller_is_better
from the experiment’s configuration is used.
-
select_checkpoint
(latest: bool = False, best: bool = False, uuid: Optional[str] = None, sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined_common.experimental.checkpoint._checkpoint.Checkpoint
Return the
Checkpoint
instance with the best validation metric as defined by the sort_by
and smaller_is_better
arguments. Exactly one of the
best
, latest
, or uuid
parameters must be set.
- Parameters
latest (bool, optional) – Return the most recent checkpoint.
best (bool, optional) – Return the checkpoint with the best validation metric as defined by the
sort_by
and smaller_is_better
arguments. If sort_by
and smaller_is_better
are not specified, the values from the associated experiment configuration will be used.
uuid (string, optional) – Return the checkpoint for the specified UUID.
sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the related experiment configuration searcher field will be used.
smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If
sort_by
is unset, this parameter is ignored. By default, the value of smaller_is_better
from the experiment’s configuration is used.
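The "exactly one of best, latest, or uuid" requirement can be checked before calling select_checkpoint with a small guard like the one below. The function check_selection is hypothetical, and raising ValueError is an assumption; the exception the library itself raises is not stated here.

```python
from typing import Optional


def check_selection(
    latest: bool = False,
    best: bool = False,
    uuid: Optional[str] = None,
) -> str:
    """Return the name of the single selector that is set, or raise ValueError."""
    chosen = [
        name
        for name, is_set in (
            ("latest", latest),
            ("best", best),
            ("uuid", uuid is not None),
        )
        if is_set
    ]
    if len(chosen) != 1:
        raise ValueError(
            f"exactly one of best, latest, or uuid must be set; got {chosen or 'none'}"
        )
    return chosen[0]
```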
Checkpoint
-
class
determined.experimental.
Checkpoint
(uuid: str, experiment_config: Dict[str, Any], experiment_id: int, trial_id: int, hparams: Dict[str, Any], batch_number: int, start_time: str, end_time: str, resources: Dict[str, Any], validation: Dict[str, Any], determined_version: Optional[str] = None, framework: Optional[str] = None, format: Optional[str] = None, version: Optional[int] = None, model_name: Optional[str] = None, metadata: Optional[Dict[str, Any]] = None, master: Optional[str] = None)
Class representing a checkpoint. Contains methods for downloading checkpoints to local storage and loading checkpoints into memory.
The
TrialReference
class contains methods that return instances of this class.
-
download
(path: Optional[str] = None) → str
Download checkpoint to local storage.
- Parameters
path (string, optional) – Top level directory to place the checkpoint under. If this parameter is not set, the checkpoint will be downloaded to
checkpoints/<checkpoint_uuid>
relative to the current working directory.
-
load
(path: Optional[str] = None, tags: Optional[List[str]] = None, **kwargs: Any) → Any
Loads a Determined checkpoint into memory. If the checkpoint is not present on disk, it will be downloaded from persistent storage.
- Parameters
path (string, optional) – Top level directory to load the checkpoint from. (default:
checkpoints/<UUID>
)
tags (list of strings, optional) – Only relevant for TensorFlow SavedModel checkpoints. Specifies which tags are loaded from the TensorFlow SavedModel. See documentation for tf.compat.v1.saved_model.load_v2.
kwargs – Only relevant for PyTorch checkpoints. The keyword arguments will be applied to
torch.load
. See documentation for torch.load.
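Fetching and loading a checkpoint by UUID might look like the sketch below. Both helpers are hypothetical: default_checkpoint_path only restates the documented default location, and load_checkpoint expects client to be a determined.experimental.Determined instance.

```python
import os
from typing import Any, Optional


def default_checkpoint_path(uuid: str) -> str:
    """Documented default location, relative to the current working directory."""
    return os.path.join("checkpoints", uuid)


def load_checkpoint(
    client: Any,
    uuid: str,
    path: Optional[str] = None,
    **kwargs: Any,
) -> Any:
    """Hypothetical helper: look up a checkpoint and load it into memory."""
    checkpoint = client.get_checkpoint(uuid)
    # For PyTorch checkpoints, extra keyword arguments are applied to torch.load.
    return checkpoint.load(path=path, **kwargs)
```

For a PyTorch checkpoint, a call such as load_checkpoint(client, uuid, map_location="cpu") would forward map_location to torch.load, which is one way to load GPU-trained weights on a CPU-only machine.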
-
add_metadata
(metadata: Dict[str, Any]) → None
Adds user-defined metadata to the checkpoint. The
metadata
argument must be a JSON-serializable dictionary. If any keys from this dictionary already appear in the checkpoint metadata, the corresponding dictionary entries in the checkpoint are replaced by the passed-in dictionary values.
- Parameters
metadata (dict) – Dictionary of metadata to add to the checkpoint.
-
remove_metadata
(keys: List[str]) → None
Removes user-defined metadata from the checkpoint. Any top-level keys that appear in the
keys
list are removed from the checkpoint.
- Parameters
keys (List[string]) – Top-level keys to remove from the checkpoint metadata.
-
static
load_from_path
(path: str, tags: Optional[List[str]] = None, **kwargs: Any) → Any
Loads a Determined checkpoint from a local file system path into memory. If the checkpoint is a PyTorch model, a
torch.nn.Module
is returned. If the checkpoint contains a TensorFlow SavedModel, a TensorFlow autotrackable object is returned.
- Parameters
path (string) – Local path to the checkpoint directory.
tags (list of strings, optional) – Only relevant for TensorFlow SavedModel checkpoints. Specifies which tags are loaded from the TensorFlow SavedModel. See documentation for tf.compat.v1.saved_model.load_v2.
-