determined.experimental
-
determined.experimental.create(trial_def: Type[determined._trial.Trial], config: Optional[Dict[str, Any]] = None, local: bool = False, test: bool = False, context_dir: str = '', command: Optional[List[str]] = None, master_url: Optional[str] = None) → Any
Create an experiment.
- Parameters
trial_def – A class definition implementing the determined.Trial interface.
config – A dictionary representing the experiment configuration to be associated with the experiment.
local – A boolean indicating whether training should be done locally. When False, the experiment is submitted to the Determined cluster. Defaults to False.
test – A boolean indicating whether the experiment should be shortened to a minimal loop of training on a small amount of data, performing validation, and checkpointing. test=True is useful for quick iteration during model porting or debugging because common errors surface more quickly. Defaults to False.
context_dir – A string filepath that defines the context directory. All model code is executed with this as the current working directory.
When local=False, this argument is required. All files in this directory are uploaded to the Determined cluster, and the total size of the directory must be under 96 MB.
When local=True, this argument is optional and defaults to the current working directory.
command – A list of strings used as the entrypoint of the training script in the Determined task environment. When this function is executed via a Python script, this argument is inferred to be sys.argv by default. When it is executed via IPython or a Jupyter notebook, this argument is required.
Example: when an experiment is created by running python train.py --flag value, the default command is inferred as ["train.py", "--flag", "value"].
master_url – An optional string to use as the Determined master URL when local=False. If not specified, it is inferred from the DET_MASTER environment variable.
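The command default and the context-directory size limit described above can be checked locally before submitting. The following is a minimal sketch using only the Python standard library, not Determined's own implementation; infer_command, context_dir_size, and context_dir_size_ok are hypothetical helpers, and the sketch assumes the 96 MB limit applies to the summed size of regular files, read as 96 * 1024 * 1024 bytes.

```python
import os
import sys
from typing import List, Optional

# Assumption: "96 MB" is read as 96 MiB; Determined may measure the limit differently.
MAX_CONTEXT_BYTES = 96 * 1024 * 1024


def infer_command(command: Optional[List[str]] = None,
                  argv: Optional[List[str]] = None) -> List[str]:
    """Hypothetical helper: use an explicit command, else fall back to argv (sys.argv by default)."""
    if command is not None:
        return list(command)
    return list(sys.argv if argv is None else argv)


def context_dir_size(path: str) -> int:
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full_path = os.path.join(root, name)
            if os.path.isfile(full_path):  # skips broken symlinks and special files
                total += os.path.getsize(full_path)
    return total


def context_dir_size_ok(path: str) -> bool:
    """Hypothetical pre-flight check for the documented "under 96 MB" rule."""
    return context_dir_size(path) < MAX_CONTEXT_BYTES


if __name__ == "__main__":
    # Equivalent to running: python train.py --flag value
    print(infer_command(argv=["train.py", "--flag", "value"]))  # ['train.py', '--flag', 'value']
    print(context_dir_size_ok("."))
```

Passing argv explicitly keeps the helper testable; in a real script, omitting it falls back to sys.argv, matching the documented default.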
-
determined.experimental.create_trial_instance(trial_def: Type[determined._trial.Trial], checkpoint_dir: str, config: Optional[Dict[str, Any]] = None, hparams: Optional[Dict[str, Any]] = None) → determined._trial.Trial
Create a trial instance from a Trial class definition. This can be a useful utility for debugging your trial logic in any development environment.
- Parameters
trial_def – A class definition that inherits from the det.Trial interface.
checkpoint_dir – The checkpoint directory that the trial will use for loading and saving checkpoints.
config – An optional experiment configuration that is used to initialize the determined.TrialContext. If not specified, a minimal default is used.
Checkpoint
-
class determined.experimental.Checkpoint(uuid: str, experiment_config: Dict[str, Any], experiment_id: int, trial_id: int, hparams: Dict[str, Any], batch_number: int, start_time: str, end_time: str, resources: Dict[str, Any], validation: Dict[str, Any], metadata: Dict[str, Any], determined_version: Optional[str] = None, framework: Optional[str] = None, format: Optional[str] = None, model_version: Optional[int] = None, model_name: Optional[str] = None, master: Optional[str] = None)
A Checkpoint represents a trained model. This class provides helper functionality for downloading checkpoints to local storage and loading checkpoints into memory.
The TrialReference class contains methods that return instances of this class.
- Parameters
uuid (string) – UUID of the checkpoint.
experiment_config (dict) – The configuration of the experiment that created the checkpoint.
experiment_id (int) – The ID of the experiment that created the checkpoint.
trial_id (int) – The ID of the trial that created the checkpoint.
hparams (dict) – Hyperparameter values for the trial that created the checkpoint.
batch_number (int) – Batch number during training when the checkpoint was taken.
start_time (string) – Timestamp when the checkpoint began being saved to persistent storage.
end_time (string) – Timestamp when the checkpoint finished being saved to persistent storage.
resources (dict) – Dictionary mapping the file paths of all files in the checkpoint to their sizes in bytes.
validation (dict) – Dictionary of validation metric names to their values.
framework (string, optional) – The framework of the trial, e.g., tensorflow or torch.
format (string, optional) – The format of the checkpoint, e.g., h5, saved_model, or pickle.
determined_version (string, optional) – The version of Determined the checkpoint was taken with.
metadata (dict, optional) – User-defined metadata associated with the checkpoint.
master (string, optional) – The address of the Determined master instance.
-
download(path: Optional[str] = None) → str
Download the checkpoint to local storage.
- Parameters
path (string, optional) – Top-level directory to place the checkpoint under. If this parameter is not set, the checkpoint is downloaded to checkpoints/<checkpoint_uuid> relative to the current working directory.
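The default download location can be expressed with pathlib. This sketch is illustrative only: resolve_download_dir is a hypothetical helper, and it assumes that an explicit path is used as the checkpoint directory itself rather than as a parent of checkpoints/<checkpoint_uuid>.

```python
import pathlib
from typing import Optional


def resolve_download_dir(checkpoint_uuid: str, path: Optional[str] = None) -> pathlib.Path:
    """Hypothetical helper: directory that a checkpoint download would write into.

    Assumption: an explicit path is used as the checkpoint directory itself.
    """
    if path is not None:
        return pathlib.Path(path)
    # Documented default: checkpoints/<checkpoint_uuid>, relative to the current working directory.
    return pathlib.Path.cwd() / "checkpoints" / checkpoint_uuid
```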
-
load(path: Optional[str] = None, tags: Optional[List[str]] = None, **kwargs: Any) → Any
Load a Determined checkpoint into memory. If the checkpoint is not present on disk, it is first downloaded from persistent storage.
- Parameters
path (string, optional) – Top-level directory to load the checkpoint from. (default: checkpoints/<UUID>)
tags (list of strings, optional) – Only relevant for TensorFlow SavedModel checkpoints. Specifies which tags are loaded from the TensorFlow SavedModel. See the documentation for tf.compat.v1.saved_model.load_v2.
kwargs – Only relevant for PyTorch checkpoints. The keyword arguments are passed to torch.load. See the documentation for torch.load.
-
add_metadata(metadata: Dict[str, Any]) → None
Add user-defined metadata to the checkpoint. The metadata argument must be a JSON-serializable dictionary. If any keys from this dictionary already appear in the checkpoint metadata, the corresponding entries in the checkpoint are replaced by the values passed in.
- Parameters
metadata (dict) – Dictionary of metadata to add to the checkpoint.
-
remove_metadata(keys: List[str]) → None
Remove user-defined metadata from the checkpoint. Any top-level keys that appear in the keys list are removed from the checkpoint metadata.
- Parameters
keys (List[string]) – Top-level keys to remove from the checkpoint metadata.
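The merge and removal rules for checkpoint metadata can be modeled on plain dictionaries. This is a local sketch, not the Determined client: merge_metadata and drop_metadata are hypothetical names, json.dumps stands in for whatever validation the client performs, and the sketch assumes that replacement is shallow (a nested dictionary under an existing key is replaced wholesale) and that removing an absent key is not an error.

```python
import json
from typing import Any, Dict, List


def merge_metadata(current: Dict[str, Any], metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Local model of add_metadata: top-level keys in metadata replace existing entries."""
    json.dumps(metadata)  # raises TypeError if metadata is not JSON-serializable
    merged = dict(current)
    merged.update(metadata)  # shallow merge: a nested dict is replaced wholesale (assumption)
    return merged


def drop_metadata(current: Dict[str, Any], keys: List[str]) -> Dict[str, Any]:
    """Local model of remove_metadata: only top-level keys are removed; unknown keys are ignored (assumption)."""
    return {key: value for key, value in current.items() if key not in keys}


meta = {"owner": "alice", "env": {"gpus": 1}}
meta = merge_metadata(meta, {"env": {"gpus": 8}, "stage": "dev"})
print(meta)  # {'owner': 'alice', 'env': {'gpus': 8}, 'stage': 'dev'}
print(drop_metadata(meta, ["stage", "missing"]))  # {'owner': 'alice', 'env': {'gpus': 8}}
```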
-
static load_from_path(path: str, tags: Optional[List[str]] = None, **kwargs: Any) → Any
Load a Determined checkpoint from a local file system path into memory. If the checkpoint is a PyTorch model, a torch.nn.Module is returned. If the checkpoint contains a TensorFlow SavedModel, a TensorFlow autotrackable object is returned.
- Parameters
path (string) – Local path to the checkpoint directory.
tags (list of strings, optional) – Only relevant for TensorFlow SavedModel checkpoints. Specifies which tags are loaded from the TensorFlow SavedModel. See the documentation for tf.compat.v1.saved_model.load_v2.
Determined
-
class determined.experimental.Determined(master: Optional[str] = None, user: Optional[str] = None)
Determined gives access to Determined API objects.
- Parameters
master (string, optional) – The URL of the Determined master. If this argument is not specified, the environment variables DET_MASTER and DET_MASTER_ADDR are checked for the master URL, in that order.
user (string, optional) – The Determined username used for authentication. (default: determined)
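The master URL lookup order can be sketched as a small function. resolve_master is hypothetical and its env parameter exists only to make the lookup testable; the sketch assumes empty environment variables count as unset and returns None where the real client may apply its own fallback.

```python
import os
from typing import Mapping, Optional


def resolve_master(master: Optional[str] = None,
                   env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Hypothetical helper mirroring the documented lookup order for the master URL."""
    if master is not None:
        return master
    env = os.environ if env is None else env
    for var in ("DET_MASTER", "DET_MASTER_ADDR"):  # checked in this order
        value = env.get(var)
        if value:  # treat unset or empty variables as missing
            return value
    # Assumption: the real client may apply its own fallback here instead of returning None.
    return None
```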
-
get_experiment(experiment_id: int) → determined_common.experimental.experiment.ExperimentReference
Get the ExperimentReference representing the experiment with the provided experiment ID.
-
get_trial(trial_id: int) → determined_common.experimental.trial.TrialReference
Get the TrialReference representing the trial with the provided trial ID.
-
get_checkpoint(uuid: str) → determined_common.experimental.checkpoint._checkpoint.Checkpoint
Get the Checkpoint representing the checkpoint with the provided UUID.
-
create_model(name: str, description: Optional[str] = '', metadata: Optional[Dict[str, Any]] = None) → determined_common.experimental.model.Model
Add a model to the model registry.
- Parameters
name (string) – The name of the model. This name must be unique.
description (string, optional) – A description of the model.
metadata (dict, optional) – Dictionary of metadata to add to the model.
-
get_model(name: str) → determined_common.experimental.model.Model
Get the Model with the provided name from the model registry. If no model with that name is found in the registry, an exception is raised.
-
get_models(sort_by: determined_common.experimental.model.ModelSortBy = <ModelSortBy.NAME: 1>, order_by: determined_common.experimental.model.ModelOrderBy = <ModelOrderBy.ASCENDING: 1>, name: str = '', description: str = '') → List[determined_common.experimental.model.Model]
Get a list of all models in the model registry.
- Parameters
sort_by – Which field to sort by. See ModelSortBy.
order_by – Whether to sort in ascending or descending order. See ModelOrderBy.
name – If this parameter is set, models are filtered to include only those whose names match this parameter.
description – If this parameter is set, models are filtered to include only those whose descriptions match this parameter.
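The sorting and filtering options of get_models can be illustrated on in-memory records. This is an assumption-laden sketch, not the registry's server-side logic: ModelRecord and filter_and_sort are hypothetical, the enums are local stand-ins (only NAME = 1, ASCENDING = 1, and DESCENDING = 2 come from the signatures in this reference; DESCRIPTION is invented for illustration), and "matching" is assumed to mean a case-insensitive substring match.

```python
import enum
from dataclasses import dataclass
from typing import List


class ModelSortBy(enum.Enum):
    # Local stand-in: NAME = 1 matches the default in the signature above;
    # DESCRIPTION is hypothetical and added only for illustration.
    NAME = 1
    DESCRIPTION = 2


class ModelOrderBy(enum.Enum):
    # Local stand-in using the values shown in this reference's signatures.
    ASCENDING = 1
    DESCENDING = 2


@dataclass
class ModelRecord:
    name: str
    description: str = ""


def filter_and_sort(models: List[ModelRecord],
                    sort_by: ModelSortBy = ModelSortBy.NAME,
                    order_by: ModelOrderBy = ModelOrderBy.ASCENDING,
                    name: str = "",
                    description: str = "") -> List[ModelRecord]:
    """Filter by (assumed) case-insensitive substring, then sort by the chosen field."""
    # An empty filter string is a substring of everything, so it disables that filter.
    selected = [
        m for m in models
        if name.lower() in m.name.lower()
        and description.lower() in m.description.lower()
    ]
    field = "name" if sort_by is ModelSortBy.NAME else "description"
    return sorted(selected,
                  key=lambda m: getattr(m, field),
                  reverse=order_by is ModelOrderBy.DESCENDING)
```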
ExperimentReference
-
class determined.experimental.ExperimentReference(experiment_id: int, master: str)
Helper class that supports querying the set of checkpoints associated with an experiment.
- Parameters
experiment_id (int) – The ID of this experiment.
master (string, optional) – The URL of the Determined master. If this class is obtained via Determined, the master URL is automatically passed into this constructor.
-
top_checkpoint(sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined_common.experimental.checkpoint._checkpoint.Checkpoint
Return the Checkpoint for this experiment that has the best validation metric, as defined by the sort_by and smaller_is_better arguments.
- Parameters
sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is not specified, the metric defined in the searcher field of the experiment configuration is used.
smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default, the value of smaller_is_better from the experiment’s configuration is used.
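The defaulting rules for sort_by and smaller_is_better can be made concrete with a small selector over checkpoint-like dictionaries. The sketch assumes the searcher section of the experiment configuration exposes metric and smaller_is_better keys, with smaller_is_better defaulting to True when omitted; resolve_ordering and best_checkpoint are hypothetical helpers, and each checkpoint is a dict whose uuid and validation entries mirror the Checkpoint attributes documented above.

```python
from typing import Any, Dict, List, Optional, Tuple


def resolve_ordering(searcher: Dict[str, Any],
                     sort_by: Optional[str] = None,
                     smaller_is_better: Optional[bool] = None) -> Tuple[str, bool]:
    """Return (metric, smaller_is_better) following the documented defaulting rules."""
    # Assumption: smaller_is_better defaults to True when the searcher config omits it.
    config_smaller = searcher.get("smaller_is_better", True)
    if sort_by is None:
        # smaller_is_better is ignored when sort_by is unset.
        return searcher["metric"], config_smaller
    if smaller_is_better is None:
        return sort_by, config_smaller
    return sort_by, smaller_is_better


def best_checkpoint(checkpoints: List[Dict[str, Any]],
                    searcher: Dict[str, Any],
                    sort_by: Optional[str] = None,
                    smaller_is_better: Optional[bool] = None) -> Dict[str, Any]:
    """Pick the checkpoint whose validation dict has the best value for the resolved metric."""
    metric, smaller = resolve_ordering(searcher, sort_by, smaller_is_better)
    scored = [c for c in checkpoints if metric in c["validation"]]  # skip unvalidated ones
    if not scored:
        raise ValueError(f"no checkpoint reports validation metric {metric!r}")
    choose = min if smaller else max
    return choose(scored, key=lambda c: c["validation"][metric])
```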
-
top_n_checkpoints(limit: int, sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → List[determined_common.experimental.checkpoint._checkpoint.Checkpoint]
Return up to limit Checkpoint instances with the best validation metrics, as defined by the sort_by and smaller_is_better arguments. This method returns the best checkpoint from each of the top N best-performing distinct trials of the experiment, where N is at most limit. Only checkpoints in a COMPLETED state with a matching COMPLETED validation are considered.
- Parameters
limit (int) – The maximum number of checkpoints to return.
sort_by (string, optional) – The name of the validation metric to use for sorting checkpoints. If this parameter is unset, the metric defined in the searcher field of the experiment configuration is used.
smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default, the value of smaller_is_better from the experiment’s configuration is used.
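The distinct-trial rule of top_n_checkpoints can be sketched in three steps: keep only completed checkpoints with completed validations, keep the best checkpoint per trial, then rank those and take the first limit. top_n_distinct_trials is hypothetical, and the state and validation_state fields are assumed stand-ins for however the service tracks checkpoint and validation status.

```python
from typing import Any, Dict, List


def top_n_distinct_trials(checkpoints: List[Dict[str, Any]],
                          limit: int,
                          metric: str,
                          smaller_is_better: bool = True) -> List[Dict[str, Any]]:
    """Best checkpoint per trial, then the `limit` best of those across trials."""
    def value(c: Dict[str, Any]) -> float:
        return c["validation"][metric]

    def is_better(a: float, b: float) -> bool:
        return a < b if smaller_is_better else a > b

    # Only COMPLETED checkpoints with a COMPLETED validation are eligible.
    eligible = [
        c for c in checkpoints
        if c["state"] == "COMPLETED"
        and c["validation_state"] == "COMPLETED"
        and metric in c["validation"]
    ]
    best_per_trial: Dict[int, Dict[str, Any]] = {}
    for c in eligible:
        current = best_per_trial.get(c["trial_id"])
        if current is None or is_better(value(c), value(current)):
            best_per_trial[c["trial_id"]] = c
    ranked = sorted(best_per_trial.values(), key=value, reverse=not smaller_is_better)
    return ranked[:limit]
```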
Model
-
class determined.experimental.Model(name: str, description: str = '', creation_time: Optional[datetime.datetime] = None, last_updated_time: Optional[datetime.datetime] = None, metadata: Optional[Dict[str, Any]] = None, master: str = '')
Class representing a model in the model registry. It contains methods for model versions and metadata.
- Parameters
name (string) – The name of the model.
description (string, optional) – The description of the model.
creation_time (datetime) – The time the model was created.
last_updated_time (datetime) – The time the model was most recently updated.
metadata (dict, optional) – User-defined metadata associated with the model.
master (string, optional) – The address of the Determined master instance.
-
get_version(version: int = 0) → Optional[determined_common.experimental.checkpoint._checkpoint.Checkpoint]
Retrieve the checkpoint corresponding to the specified version of the model. If the specified version of the model does not exist, an exception is raised.
If no version is specified, the latest version of the model is returned. In this case, if there are no registered versions of the model, None is returned.
- Parameters
version (int, optional) – The model version number requested.
-
get_versions(order_by: determined_common.experimental.model.ModelOrderBy = <ModelOrderBy.DESCENDING: 2>) → List[determined_common.experimental.checkpoint._checkpoint.Checkpoint]
Get a list of checkpoints corresponding to versions of this model. The versions are sorted by version number and are returned in descending order by default.
- Parameters
order_by (enum) – A member of the ModelOrderBy enum.
-
register_version(checkpoint_uuid: str) → determined_common.experimental.checkpoint._checkpoint.Checkpoint
Create a new model version from the checkpoint with the given UUID and return the Checkpoint corresponding to that version.
- Parameters
checkpoint_uuid (string) – The UUID of the checkpoint to register.
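The version semantics of get_version, get_versions, and register_version can be modeled with an in-memory registry. This sketch is hypothetical: InMemoryModelRegistry stores checkpoint UUIDs rather than Checkpoint objects, assumes versions are numbered consecutively from 1 (so the default version=0 can mean "latest"), and raises ValueError where the real client raises an unspecified exception.

```python
from typing import Dict, List, Optional


class InMemoryModelRegistry:
    """Hypothetical local model of one registered model's versions."""

    def __init__(self) -> None:
        self._versions: Dict[int, str] = {}  # version number -> checkpoint UUID

    def register_version(self, checkpoint_uuid: str) -> int:
        # Assumption: versions are numbered consecutively starting at 1.
        version = len(self._versions) + 1
        self._versions[version] = checkpoint_uuid
        return version

    def get_version(self, version: int = 0) -> Optional[str]:
        if version == 0:
            # No version specified: latest version, or None if nothing is registered.
            return self._versions[max(self._versions)] if self._versions else None
        if version not in self._versions:
            raise ValueError(f"model version {version} does not exist")
        return self._versions[version]

    def get_versions(self, descending: bool = True) -> List[str]:
        # Ordered by version number; descending by default, like ModelOrderBy.DESCENDING.
        return [self._versions[v] for v in sorted(self._versions, reverse=descending)]
```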
-
add_metadata(metadata: Dict[str, Any]) → None
Add user-defined metadata to the model. The metadata argument must be a JSON-serializable dictionary. If any keys from this dictionary already appear in the model’s metadata, the previous entries are replaced.
- Parameters
metadata (dict) – Dictionary of metadata to add to the model.
-
remove_metadata(keys: List[str]) → None
Remove user-defined metadata from the model. Any top-level keys that appear in the keys list are removed from the model metadata.
- Parameters
keys (List[string]) – Top-level keys to remove from the model metadata.
ModelOrderBy
ModelSortBy
TrialReference
-
class determined.experimental.TrialReference(trial_id: int, master: str)
Trial reference class used for querying relevant Checkpoint instances.
- Parameters
trial_id (int) – The trial ID.
master (string, optional) – The URL of the Determined master. If this class is obtained via determined.experimental.Determined, the master URL is automatically passed into this constructor.
-
top_checkpoint(sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined_common.experimental.checkpoint._checkpoint.Checkpoint
Return the Checkpoint instance with the best validation metric, as defined by the sort_by and smaller_is_better arguments.
- Parameters
sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the searcher field of the related experiment configuration is used.
smaller_is_better (bool, optional) – Whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default, the value of smaller_is_better from the experiment’s configuration is used.
-
select_checkpoint(latest: bool = False, best: bool = False, uuid: Optional[str] = None, sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined_common.experimental.checkpoint._checkpoint.Checkpoint
Return a Checkpoint instance of this trial, selected by the latest, best, or uuid parameter. Exactly one of these three parameters must be set.
- Parameters
latest (bool, optional) – Return the most recent checkpoint.
best (bool, optional) – Return the checkpoint with the best validation metric, as defined by the sort_by and smaller_is_better arguments. If sort_by and smaller_is_better are not specified, the values from the associated experiment configuration are used.
uuid (string, optional) – Return the checkpoint with the specified UUID.
sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the searcher field of the related experiment configuration is used.
smaller_is_better (bool, optional) – Whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default, the value of smaller_is_better from the experiment’s configuration is used.
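The "exactly one of best, latest, or uuid" rule can be enforced by counting how many selectors are set. In this sketch, select_from_list is hypothetical, the default metric name is invented for illustration, and "most recent" is assumed to mean the checkpoint with the highest batch_number.

```python
from typing import Any, Dict, List, Optional


def select_from_list(checkpoints: List[Dict[str, Any]],
                     latest: bool = False,
                     best: bool = False,
                     uuid: Optional[str] = None,
                     metric: str = "validation_loss",
                     smaller_is_better: bool = True) -> Dict[str, Any]:
    """Hypothetical local version of the select_checkpoint selection rule."""
    # Booleans sum as integers, so this counts how many selectors were provided.
    if sum([latest, best, uuid is not None]) != 1:
        raise ValueError("exactly one of latest, best, or uuid must be set")
    if uuid is not None:
        for c in checkpoints:
            if c["uuid"] == uuid:
                return c
        raise ValueError(f"no checkpoint with UUID {uuid!r}")
    if latest:
        # Assumption: "most recent" means the highest batch_number.
        return max(checkpoints, key=lambda c: c["batch_number"])
    choose = min if smaller_is_better else max
    return choose(checkpoints, key=lambda c: c["validation"][metric])
```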