determined.experimental
-
determined.experimental.
create
(trial_def: Type[determined._trial.Trial], config: Optional[Dict[str, Any]] = None, local: bool = False, test: bool = False, context_dir: str = '', command: Optional[List[str]] = None, master_url: Optional[str] = None) → Any
Create an experiment.
- Parameters
trial_def – A class definition implementing the
det.Trial
interface.
config – A dictionary representing the experiment configuration to be associated with the experiment.
local – A boolean indicating if training will happen locally. When
False
, the experiment will be submitted to the Determined cluster. Defaults to False.
test – A boolean indicating if the experiment should be shortened to a minimal loop of training, validation, and checkpointing.
test=True
is useful for quick iteration during model porting or debugging because common errors will surface more quickly. Defaults to False.
context_dir –
A string filepath that defines the context directory. All model code will be executed with this as the current working directory.
When
local=False
, this argument is required. All files in this directory will be uploaded to the Determined cluster. The total size of this directory must be under 96 MB.
When
local=True
, this argument is optional and assumed to be the current working directory by default.
command –
A list of strings that is used as the entrypoint of the training script in the Determined task environment. When executing this function via a python script, this argument is inferred to be
sys.argv
by default. When executing this function via IPython or a Jupyter notebook, this argument is required.
Example: When creating an experiment by running "python train.py --flag value", the default command is inferred as ["train.py", "--flag", "value"].
master_url – An optional string to use as the Determined master URL when
local=False
. If not specified, it will be inferred from the environment variable DET_MASTER.
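The argument rules above can be summarized in a small pre-flight check. This is a minimal sketch, not part of the Determined API: `build_create_kwargs` is a hypothetical helper that only mirrors the documented defaults before the arguments are handed to `determined.experimental.create`.

```python
import os
import sys
from typing import Any, Dict, List, Optional


def build_create_kwargs(
    config: Optional[Dict[str, Any]] = None,
    local: bool = False,
    test: bool = False,
    context_dir: str = "",
    command: Optional[List[str]] = None,
    master_url: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: apply the documented argument rules up front."""
    if not context_dir:
        if not local:
            # Cluster submissions must name the directory to upload.
            raise ValueError("context_dir is required when local=False")
        # Local runs fall back to the current working directory.
        context_dir = os.getcwd()
    if command is None:
        # Matches the documented default when run as a Python script.
        command = list(sys.argv)
    if not local and master_url is None:
        # Documented fallback; stays None if DET_MASTER is unset.
        master_url = os.environ.get("DET_MASTER")
    return {
        "config": config,
        "local": local,
        "test": test,
        "context_dir": context_dir,
        "command": command,
        "master_url": master_url,
    }
```

Given a trial class (here a hypothetical `MyTrial`), the result could then be passed along as `determined.experimental.create(MyTrial, **build_create_kwargs(local=True, test=True))`.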
-
determined.experimental.
create_trial_instance
(trial_def: Type[determined._trial.Trial], checkpoint_dir: str, config: Optional[Dict[str, Any]] = None, hparams: Optional[Dict[str, Any]] = None) → determined._trial.Trial
Create a trial instance from a Trial class definition. This can be a useful utility for debugging your trial logic in any development environment.
- Parameters
trial_def – A class definition that inherits from the det.Trial interface.
checkpoint_dir – The checkpoint directory that the trial will use for loading and saving checkpoints.
config – An optional experiment configuration that is used to initialize the
determined.TrialContext
. If not specified, a minimal default is used.
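For local debugging, a throwaway checkpoint directory is often enough. The sketch below assumes that workflow; `make_debug_trial` and its `create_fn` parameter are hypothetical names introduced here so the wrapper can be exercised without Determined installed. By default it calls `determined.experimental.create_trial_instance` with the arguments from the signature above.

```python
import tempfile
from typing import Any, Callable, Dict, Optional


def make_debug_trial(
    trial_def: type,
    hparams: Optional[Dict[str, Any]] = None,
    create_fn: Optional[Callable[..., Any]] = None,
) -> Any:
    """Hypothetical wrapper: build a trial instance with a temporary checkpoint dir."""
    if create_fn is None:
        # Imported lazily so this module loads even without Determined installed.
        from determined.experimental import create_trial_instance

        create_fn = create_trial_instance
    # A fresh temporary directory keeps debug checkpoints out of the project tree.
    checkpoint_dir = tempfile.mkdtemp(prefix="det-debug-")
    return create_fn(trial_def, checkpoint_dir=checkpoint_dir, hparams=hparams)
```

The returned object is whatever `create_trial_instance` produces, so its methods can be stepped through in a debugger or notebook.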
Determined
-
class
determined.experimental.
Determined
(master: Optional[str] = None, user: Optional[str] = None)
Determined gives access to Determined API objects.
- Parameters
master (string, optional) – The URL of the Determined master. If this argument is not specified, the environment variables DET_MASTER and DET_MASTER_ADDR are checked for the master URL, in that order.
user (string, optional) – The Determined username used for authentication. (default:
determined
)
-
get_experiment
(experiment_id: int) → determined_common.experimental.experiment.ExperimentReference
Get the
det.experimental.ExperimentReference
representing the experiment with the provided experiment ID.
-
get_trial
(trial_id: int) → determined_common.experimental.trial.TrialReference
Get the
det.experimental.TrialReference
representing the trial with the provided trial ID.
-
get_checkpoint
(uuid: str) → determined_common.experimental.checkpoint._checkpoint.Checkpoint
Get the
det.experimental.Checkpoint
representing the checkpoint with the provided UUID.
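As a rough illustration of how these pieces fit together, the sketch below mirrors the documented master URL lookup order and chains `get_experiment` with `top_checkpoint`. Both function names are hypothetical, and the behavior when neither environment variable is set (returning `None` here) is an assumption, not something this reference specifies.

```python
import os
from typing import Any, Mapping, Optional


def resolve_master(
    master: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical helper mirroring the documented lookup order."""
    env = os.environ if env is None else env
    # Explicit argument first, then DET_MASTER, then DET_MASTER_ADDR.
    return master or env.get("DET_MASTER") or env.get("DET_MASTER_ADDR")


def best_checkpoint_for(client: Any, experiment_id: int) -> Any:
    """`client` is expected to be a determined.experimental.Determined instance."""
    # With no arguments, top_checkpoint falls back to the experiment's searcher metric.
    return client.get_experiment(experiment_id).top_checkpoint()
```

With Determined installed, `client` would typically be constructed as `Determined(master=resolve_master())`.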
ExperimentReference
-
class
determined.experimental.
ExperimentReference
(experiment_id: int, master: str)
Experiment reference class used for querying relevant
det.experimental.Checkpoint
instances.
- Parameters
experiment_id (int) – The experiment ID.
master (string, optional) – The URL of the Determined master. If this class is obtained via
det.experimental.Determined
, the master URL is automatically passed into this constructor.
-
top_checkpoint
(sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined_common.experimental.checkpoint._checkpoint.Checkpoint
Return the
det.experimental.Checkpoint
instance with the best validation metric as defined by the sort_by and smaller_is_better arguments.
- Parameters
sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the experiment configuration searcher field will be used.
smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default the smaller_is_better value in the experiment configuration is used.
-
top_n_checkpoints
(limit: int, sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → List[determined_common.experimental.checkpoint._checkpoint.Checkpoint]
Return the N
det.experimental.Checkpoint
instances with the best validation metric values as defined by the sort_by and smaller_is_better arguments. This command will return the best checkpoint from each of the top N performing distinct trials of the experiment.
- Parameters
limit (int) – The maximum number of checkpoints to return (the N above).
sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the experiment configuration searcher field will be used.
smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default the smaller_is_better value in the experiment configuration is used.
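The "best checkpoint from the top N distinct trials" rule can be pictured with a purely local sketch. This is not Determined's implementation: `Ckpt` and `top_n_sketch` are hypothetical names, and the data in the usage note is made up for illustration.

```python
from typing import Dict, List, NamedTuple


class Ckpt(NamedTuple):
    """Toy stand-in for a checkpoint record (not the Determined Checkpoint class)."""
    uuid: str
    trial_id: int
    metrics: Dict[str, float]


def top_n_sketch(
    ckpts: List[Ckpt], limit: int, sort_by: str, smaller_is_better: bool
) -> List[Ckpt]:
    # Step 1: keep only the best checkpoint of each trial.
    best_per_trial: Dict[int, Ckpt] = {}
    for c in ckpts:
        cur = best_per_trial.get(c.trial_id)
        if cur is None:
            best_per_trial[c.trial_id] = c
            continue
        new, old = c.metrics[sort_by], cur.metrics[sort_by]
        if (new < old) if smaller_is_better else (new > old):
            best_per_trial[c.trial_id] = c
    # Step 2: rank the per-trial winners and keep the first `limit`.
    ranked = sorted(
        best_per_trial.values(),
        key=lambda c: c.metrics[sort_by],
        reverse=not smaller_is_better,
    )
    return ranked[:limit]
```

For example, with two checkpoints from trial 1 (losses 0.5 and 0.3) and one from trial 2 (loss 0.4), `limit=2` with `smaller_is_better=True` yields the 0.3 checkpoint followed by the 0.4 checkpoint; the 0.5 checkpoint is dropped because its trial is already represented.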
TrialReference
-
class
determined.experimental.
TrialReference
(trial_id: int, master: str)
Trial reference class used for querying relevant
det.experimental.Checkpoint
instances.
- Parameters
trial_id (int) – The trial ID.
master (string, optional) – The URL of the Determined master. If this class is obtained via
det.experimental.Determined
, the master URL is automatically passed into this constructor.
-
top_checkpoint
(sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined_common.experimental.checkpoint._checkpoint.Checkpoint
Return the
det.experimental.Checkpoint
instance with the best validation metric as defined by the sort_by and smaller_is_better arguments.
- Parameters
sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the related experiment configuration searcher field will be used.
smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default the smaller_is_better value in the related experiment configuration is used.
-
select_checkpoint
(latest: bool = False, best: bool = False, uuid: Optional[str] = None, sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined_common.experimental.checkpoint._checkpoint.Checkpoint
Return the
det.experimental.Checkpoint
instance with the best validation metric as defined by the sort_by and smaller_is_better arguments.
Exactly one of the best, latest, or uuid parameters must be set.
- Parameters
latest (bool, optional) – Return the most recent checkpoint.
best (bool, optional) – Return the checkpoint with the best validation metric as defined by the sort_by and smaller_is_better arguments. If sort_by and smaller_is_better are not specified, the values from the associated experiment configuration will be used.
uuid (string, optional) – Return the checkpoint for the specified UUID.
sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the related experiment configuration searcher field will be used.
smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default the smaller_is_better value in the related experiment configuration is used.
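The "exactly one of best, latest, or uuid" constraint can be checked before making the call. The sketch below is a hypothetical pre-check (`check_selection` is not part of the Determined API); how `select_checkpoint` itself reports a violation is not specified in this reference.

```python
from typing import Optional


def check_selection(
    latest: bool = False, best: bool = False, uuid: Optional[str] = None
) -> str:
    """Hypothetical pre-check: return the name of the single selector that is set."""
    chosen = [
        name
        for name, is_set in (
            ("latest", latest),
            ("best", best),
            ("uuid", uuid is not None),
        )
        if is_set
    ]
    if len(chosen) != 1:
        raise ValueError(
            "exactly one of best, latest, or uuid must be set; got: %s"
            % (", ".join(chosen) or "none")
        )
    return chosen[0]
```

A caller might run `check_selection(best=True)` and then invoke `trial.select_checkpoint(best=True, sort_by=..., smaller_is_better=...)` on a `TrialReference`.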
Checkpoint
-
class
determined.experimental.
Checkpoint
(uuid: str, storage_config: Dict[str, Any], batch_number: int, start_time: str, end_time: str, resources: Dict[str, Any], validation: Dict[str, Any])
Class representing a checkpoint. Contains methods for downloading checkpoints to a local path and loading checkpoints into memory.
The
det.experimental.TrialReference
class contains methods that return instances of this class.
-
download
(path: Optional[str] = None) → str
Download the checkpoint from the checkpoint storage location to the local file system.
- Parameters
path (string, optional) – Top-level directory to place the checkpoint under. If this parameter is not set, the checkpoint will be downloaded to checkpoints/<checkpoint_uuid> relative to the current working directory.
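To make the default location concrete, the following sketch computes where a checkpoint would land when `path` is not given. `default_download_dir` is a hypothetical helper that only restates the rule above; it does not call Determined.

```python
import os


def default_download_dir(checkpoint_uuid: str) -> str:
    """Hypothetical helper: the documented default location for download()."""
    # checkpoints/<checkpoint_uuid>, relative to the current working directory.
    return os.path.join(os.getcwd(), "checkpoints", checkpoint_uuid)
```

This can be handy for checking whether a checkpoint is already on disk before triggering a download.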
-
load
(path: Optional[str] = None, tags: Optional[List[str]] = None, **kwargs: Any) → Any
Load a Determined checkpoint into memory. If the checkpoint is not present on disk, it will be downloaded from persistent storage.
- Parameters
path (string, optional) – Top level directory to load the checkpoint from. (default:
checkpoints/<UUID>
)
tags (list of strings, optional) – Only relevant for TensorFlow SavedModel checkpoints. Specifies which tags are loaded from the TensorFlow SavedModel. See documentation for tf.compat.v1.saved_model.load_v2.
kwargs – Only relevant for PyTorch checkpoints. The keyword arguments will be applied to torch.load. See documentation for torch.load.
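Because extra keyword arguments are forwarded to `torch.load`, a PyTorch checkpoint trained on GPUs can be loaded on a CPU-only machine by passing `map_location`. A minimal sketch, assuming `checkpoint` is a `det.experimental.Checkpoint` holding a PyTorch model (`load_on_cpu` is a hypothetical wrapper name):

```python
from typing import Any


def load_on_cpu(checkpoint: Any) -> Any:
    """Hypothetical wrapper: load a PyTorch checkpoint onto the CPU."""
    # map_location is a torch.load argument; per the kwargs description above,
    # Checkpoint.load applies it to torch.load.
    return checkpoint.load(map_location="cpu")
```

Other `torch.load` keyword arguments can be forwarded the same way.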
-
static
load_from_path
(path: str, tags: Optional[List[str]] = None) → Any
Load a Determined checkpoint from a local file system path into memory. If the checkpoint is a PyTorch model, a
torch.nn.Module
is returned. If the checkpoint contains a TensorFlow SavedModel, a TensorFlow AutoTrackable object is returned.
- Parameters
path (string) – Local path to the top-level directory of a checkpoint.
tags (list of strings, optional) –
Only relevant for TensorFlow SavedModel checkpoints. Specifies which tags are loaded from the TensorFlow SavedModel. See documentation for tf.compat.v1.saved_model.load_v2.
-