determined.experimental¶
-
determined.experimental.
create
(trial_def: Type[determined._trial.Trial], config: Optional[Dict[str, Any]] = None, mode: determined.experimental._native.Mode = <Mode.CLUSTER: 'cluster'>, context_dir: str = '', command: Optional[List[str]] = None, master_url: Optional[str] = None) → None¶ Create an experiment.
- Parameters
trial_def – A class definition implementing the det.Trial interface.
config – A dictionary representing the experiment configuration to be associated with the experiment.
mode – The determined.experimental.Mode used when creating an experiment:
1. Mode.CLUSTER (default): Submit the experiment to a remote Determined cluster.
2. Mode.LOCAL: Test the experiment in the calling Python process for local development or debugging. Runs through a minimal loop of training, validation, and checkpointing steps.
context_dir –
A string filepath that defines the context directory. All model code will be executed with this as the current working directory.
In CLUSTER mode, this argument is required. All files in this directory will be uploaded to the Determined cluster. The total size of this directory must be under 96 MB.
In LOCAL mode, this argument is optional and assumed to be the current working directory by default.
command –
A list of strings that is used as the entrypoint of the training script in the Determined task environment. When executing this function via a python script, this argument is inferred to be
sys.argv
by default. When executing this function via IPython or a Jupyter notebook, this argument is required.
Example: When creating an experiment by running "python train.py --flag value", the default command is inferred as ["train.py", "--flag", "value"].
master_url – An optional string to use as the Determined master URL in CLUSTER mode. If not specified, it is inferred from the environment variable
DET_MASTER
.
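The 96 MB limit on the context directory in CLUSTER mode can be checked locally before calling create. The sketch below is one way to do that with the standard library. The helper names context_dir_size and fits_context_limit are hypothetical, and reading "96 MB" as 96 * 1024 * 1024 bytes is an assumption. Neither is part of the Determined API.

```python
import os

# Assumption: "96 MB" is read as 96 * 1024 * 1024 bytes; the exact
# threshold the cluster enforces may be computed differently.
MAX_CONTEXT_BYTES = 96 * 1024 * 1024


def context_dir_size(context_dir: str) -> int:
    """Return the total size in bytes of all regular files under context_dir."""
    total = 0
    for root, _dirs, files in os.walk(context_dir):
        for name in files:
            path = os.path.join(root, name)
            # Skip symlinks so a link to a large external file is not counted.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_context_limit(context_dir: str, limit: int = MAX_CONTEXT_BYTES) -> bool:
    """Return True if the directory is strictly under the size limit."""
    return context_dir_size(context_dir) < limit
```

If fits_context_limit(".") is true, a call such as determined.experimental.create(trial_def=MyTrial, config=config, context_dir=".") should fall within the documented size limit. MyTrial and config stand for a trial class and configuration supplied by the caller.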
-
determined.experimental.
create_trial_instance
(trial_def: Type[determined._trial.Trial], checkpoint_dir: str, config: Optional[Dict[str, Any]] = None) → determined._trial.Trial¶ Create a trial instance from a Trial class definition. This can be a useful utility for debugging your trial logic in any development environment.
- Parameters
trial_def – A class definition that inherits from the det.Trial interface.
checkpoint_dir – The checkpoint directory that the trial will use for loading and saving checkpoints.
config – An optional experiment configuration that is used to initialize the
determined.TrialContext
. If not specified, a minimal default is used.
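As a rough illustration of the debugging use case, the sketch below wraps create_trial_instance so that each debug session gets a fresh temporary checkpoint directory. The wrapper name build_debug_trial and the choice of a temporary directory are assumptions made for this example. Only the create_trial_instance call itself follows the signature documented above.

```python
import tempfile
from typing import Any, Dict, Optional, Tuple


def build_debug_trial(
    trial_def: type, config: Optional[Dict[str, Any]] = None
) -> Tuple[Any, str]:
    """Instantiate trial_def against a throwaway checkpoint directory."""
    # Imported lazily so this module loads even where Determined is absent.
    from determined import experimental

    # Hypothetical choice: a fresh temporary directory per debug session.
    checkpoint_dir = tempfile.mkdtemp(prefix="det-debug-")
    trial = experimental.create_trial_instance(
        trial_def, checkpoint_dir=checkpoint_dir, config=config
    )
    return trial, checkpoint_dir
```

The returned trial object can then be inspected in a debugger or notebook, and the returned directory removed once the session is over.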
-
class
determined.experimental.
Mode
¶ The mode used to create an experiment.
See
determined.experimental.create()
.
Determined
¶
-
class
determined.experimental.
Determined
(master: Optional[str] = None, user: Optional[str] = None)¶ Determined gives access to Determined API objects.
- Parameters
master (string, optional) – The URL of the Determined master. If this argument is not specified, the environment variables DET_MASTER and DET_MASTER_ADDR are checked for the master URL, in that order.
user (string, optional) – The Determined username used for authentication. (default:
determined
)
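The lookup order for the master URL described above can be pictured with a small standalone function. This is a sketch of the documented order only, not the library's implementation. The name resolve_master is hypothetical, and returning None when nothing is set is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_master(
    master: Optional[str] = None, env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Mirror the documented lookup order for the master URL."""
    if env is None:
        env = os.environ
    # An explicit argument takes precedence over the environment.
    if master:
        return master
    # DET_MASTER is checked before DET_MASTER_ADDR, as documented above.
    for name in ("DET_MASTER", "DET_MASTER_ADDR"):
        value = env.get(name)
        if value:
            return value
    # Assumption: the real client may raise or fall back to a default here.
    return None
```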
-
get_experiment
(experiment_id: int) → determined_common.experimental.experiment.ExperimentReference¶ Get the
det.experimental.ExperimentReference
representing the experiment with the provided experiment ID.
-
get_trial
(trial_id: int) → determined_common.experimental.trial.TrialReference¶ Get the
det.experimental.TrialReference
representing the trial with the provided trial ID.
-
get_checkpoint
(uuid: str) → determined_common.experimental.checkpoint._checkpoint.Checkpoint¶ Get the
det.experimental.Checkpoint
representing the checkpoint with the provided UUID.
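The accessor methods above are typically chained: look up an experiment, then ask it for a checkpoint. The helper below sketches that chain. It is written against any object that behaves like determined.experimental.Determined, so the name best_checkpoint and the duck-typed client parameter are assumptions of this example.

```python
from typing import Any, Optional


def best_checkpoint(
    client: Any,
    experiment_id: int,
    sort_by: Optional[str] = None,
    smaller_is_better: Optional[bool] = None,
) -> Any:
    """Look up an experiment and return its top checkpoint.

    `client` is expected to behave like determined.experimental.Determined,
    e.g. Determined(master="http://localhost:8080") (hypothetical URL).
    """
    experiment = client.get_experiment(experiment_id)
    # Leaving both arguments as None defers to the experiment configuration.
    return experiment.top_checkpoint(
        sort_by=sort_by, smaller_is_better=smaller_is_better
    )
```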
ExperimentReference
¶
-
class
determined.experimental.
ExperimentReference
(experiment_id: int, master: str)¶ Experiment reference class used for querying relevant
det.experimental.Checkpoint
instances.
- Parameters
experiment_id (int) – The experiment ID.
master (string, optional) – The URL of the Determined master. If this class is obtained via
det.experimental.Determined
the master URL is automatically passed into this constructor.
-
top_checkpoint
(sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined_common.experimental.checkpoint._checkpoint.Checkpoint¶ Return the
det.experimental.Checkpoint
instance with the best validation metric as defined by the sort_by and smaller_is_better arguments.
- Parameters
sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset the metric defined in the experiment configuration searcher field will be used.
smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default the smaller_is_better value in the experiment configuration is used.
-
top_n_checkpoints
(limit: int, sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → List[determined_common.experimental.checkpoint._checkpoint.Checkpoint]¶ Return the N
det.experimental.Checkpoint
instances with the best validation metric values as defined by the sort_by and smaller_is_better arguments. This method returns the best checkpoint from each of the top N performing distinct trials of the experiment.
- Parameters
limit (int) – The maximum number of checkpoints to return (the N above).
sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset the metric defined in the experiment configuration searcher field will be used.
smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default the smaller_is_better value in the experiment configuration is used.
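The selection rule for top_n_checkpoints (best checkpoint per trial, then the N best trials) can be illustrated on plain dictionaries. This is a sketch of the described behavior, not the server-side query. The function name top_n_by_trial and the record layout with "uuid", "trial_id", and "metrics" keys are hypothetical.

```python
from typing import Any, Dict, List


def top_n_by_trial(
    checkpoints: List[Dict[str, Any]],
    limit: int,
    sort_by: str,
    smaller_is_better: bool = True,
) -> List[Dict[str, Any]]:
    """Keep the best checkpoint per trial, then return the best `limit` of those."""
    # Flip the sign so that "best" always means "smallest key".
    sign = 1 if smaller_is_better else -1

    def key(ckpt: Dict[str, Any]) -> float:
        return sign * ckpt["metrics"][sort_by]

    best_per_trial: Dict[int, Dict[str, Any]] = {}
    for ckpt in checkpoints:
        current = best_per_trial.get(ckpt["trial_id"])
        if current is None or key(ckpt) < key(current):
            best_per_trial[ckpt["trial_id"]] = ckpt
    return sorted(best_per_trial.values(), key=key)[:limit]
```

With limit set to 1 this reduces to the behavior described for top_checkpoint.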
TrialReference
¶
-
class
determined.experimental.
TrialReference
(trial_id: int, master: str)¶ Trial reference class used for querying relevant
det.experimental.Checkpoint
instances.
- Parameters
trial_id (int) – The trial ID.
master (string, optional) – The URL of the Determined master. If this class is obtained via
det.experimental.Determined
the master URL is automatically passed into this constructor.
-
top_checkpoint
(sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined_common.experimental.checkpoint._checkpoint.Checkpoint¶ Return the
det.experimental.Checkpoint
instance with the best validation metric as defined by the sort_by and smaller_is_better arguments.
- Parameters
sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the related experiment configuration searcher field will be used.
smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default, the smaller_is_better value in the related experiment configuration is used.
-
select_checkpoint
(latest: bool = False, best: bool = False, uuid: Optional[str] = None, sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined_common.experimental.checkpoint._checkpoint.Checkpoint¶ Return the
det.experimental.Checkpoint
instance for this trial selected by the latest, best, or uuid argument. Exactly one of the best, latest, or uuid parameters must be set.
- Parameters
latest (bool, optional) – Return the most recent checkpoint.
best (bool, optional) – Return the checkpoint with the best validation metric as defined by the sort_by and smaller_is_better arguments. If sort_by and smaller_is_better are not specified, the values from the associated experiment configuration will be used.
uuid (string, optional) – Return the checkpoint for the specified UUID.
sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the related experiment configuration searcher field will be used.
smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default, the smaller_is_better value in the related experiment configuration is used.
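The "exactly one of best, latest, or uuid" rule can be checked before calling select_checkpoint. The following sketch shows one way to express that check. The name check_selection and the use of ValueError are assumptions, and the library may report a violation differently.

```python
from typing import Optional


def check_selection(
    latest: bool = False, best: bool = False, uuid: Optional[str] = None
) -> str:
    """Return which selector is active, or raise if zero or several are set."""
    chosen = [
        name
        for name, is_set in (
            ("latest", latest),
            ("best", best),
            ("uuid", uuid is not None),
        )
        if is_set
    ]
    if len(chosen) != 1:
        raise ValueError(
            "exactly one of latest, best, or uuid must be set; got: "
            + (", ".join(chosen) or "none")
        )
    return chosen[0]
```

For example, check_selection(best=True) passes, while supplying both latest=True and a uuid raises an error.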
Checkpoint
¶
-
class
determined.experimental.
Checkpoint
(uuid: str, storage_config: Dict[str, Any], batch_number: int, start_time: str, end_time: str, resources: Dict[str, Any], validation: Dict[str, Any])¶ Class representing a checkpoint. Contains methods for downloading checkpoints to a local path and loading checkpoints into memory.
The
det.experimental.ExperimentReference
and
det.experimental.TrialReference
classes contain methods that return instances of this class.
-
download
(path: Optional[str] = None) → str¶ Download checkpoint from the checkpoint storage location locally.
- Parameters
path (string, optional) – Top level directory to place the checkpoint under. If this parameter is not set the checkpoint will be downloaded to checkpoints/<checkpoint_uuid> relative to the current working directory.
-
load
(path: Optional[str] = None, tags: Optional[List[str]] = None, **kwargs: Any) → Any¶ Loads a Determined checkpoint into memory. If the checkpoint is not present on disk it will be downloaded from persistent storage.
- Parameters
path (string, optional) – Top level directory to load the checkpoint from. (default:
checkpoints/<UUID>
)
tags (list of strings, optional) – Only relevant for TensorFlow saved_model checkpoints. Specifies which tags are loaded from the TensorFlow saved_model. See documentation for tf.compat.v1.saved_model.load_v2.
kwargs – Only relevant for PyTorch checkpoints. The keyword arguments will be applied to torch.load. See documentation for torch.load.
-
static
load_from_path
(path: str, tags: Optional[List[str]] = None, **kwargs: Any) → Any¶ Loads a Determined checkpoint from a local file system path into memory. If the checkpoint is a PyTorch model, a
torch.nn.Module
is returned. If the checkpoint contains a TensorFlow saved_model, a TensorFlow autotrackable object is returned.
- Parameters
path (string) – Local path to the top level directory of a checkpoint.
tags (list of strings, optional) –
Only relevant for TensorFlow saved_model checkpoints. Specifies which tags are loaded from the TensorFlow saved_model. See documentation for tf.compat.v1.saved_model.load_v2.
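The download and load_from_path methods can be combined when the local copy should live in a known place. The helper below sketches that two-step flow against any object that behaves like det.experimental.Checkpoint. The name fetch_and_load is hypothetical, and treating the string returned by download as the local checkpoint directory is an assumption based on its return type.

```python
from typing import Any, Optional


def fetch_and_load(checkpoint: Any, path: Optional[str] = None, **kwargs: Any) -> Any:
    """Download a checkpoint, then load it from the local copy.

    `checkpoint` is expected to behave like determined.experimental.Checkpoint.
    Assumption: download(path) returns the local checkpoint directory.
    """
    local_path = checkpoint.download(path=path)
    # kwargs are forwarded unchanged; per the docs above they are applied
    # to torch.load for PyTorch checkpoints.
    return type(checkpoint).load_from_path(local_path, **kwargs)
```

For a PyTorch checkpoint, a call such as fetch_and_load(ckpt, map_location="cpu") would pass map_location through to torch.load, where ckpt is a checkpoint object obtained from one of the reference classes.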
-