Shortcuts

determined.experimental

determined.experimental.create(trial_def: Type[determined._trial.Trial], config: Optional[Dict[str, Any]] = None, mode: determined.experimental._native.Mode = <Mode.CLUSTER: 'cluster'>, context_dir: str = '', command: Optional[List[str]] = None, master_url: Optional[str] = None) → None

Create an experiment.

Parameters
  • trial_def – A class definition implementing the det.Trial interface.

  • config – A dictionary representing the experiment configuration to be associated with the experiment.

  • mode

    The determined.experimental.Mode used when creating an experiment:

    1. Mode.CLUSTER (default): Submit the experiment to a remote Determined cluster.

    2. Mode.LOCAL: Test the experiment in the calling Python process for local development and debugging. This runs a minimal loop of training, validation, and checkpointing steps.

  • context_dir

    A string filepath that defines the context directory. All model code will be executed with this as the current working directory.

    In CLUSTER mode, this argument is required. All files in this directory will be uploaded to the Determined cluster. The total size of this directory must be under 96 MB.

    In LOCAL mode, this argument is optional and assumed to be the current working directory by default.

  • command

    A list of strings used as the entrypoint of the training script in the Determined task environment. When this function is called from a Python script, the argument defaults to sys.argv. When it is called from IPython or a Jupyter notebook, the argument is required.

    Example: When creating an experiment by running "python train.py --flag value", the default command is inferred as ["train.py", "--flag", "value"].

  • master_url – An optional string to use as the Determined master URL in CLUSTER mode. If not specified, it is inferred from the environment variable DET_MASTER.
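The default for command can be pictured with the following sketch. infer_command and running_in_ipython are hypothetical helpers, not part of determined.experimental; they only restate the documented rule that a script's sys.argv is reused while notebook users must pass command explicitly. Detecting IPython through the get_ipython builtin is an assumption made for illustration.

```python
import builtins
import sys
from typing import List, Optional


def running_in_ipython() -> bool:
    # Assumption: IPython/Jupyter expose a get_ipython() builtin,
    # while a plain `python train.py` process does not.
    return hasattr(builtins, "get_ipython")


def infer_command(command: Optional[List[str]] = None,
                  argv: Optional[List[str]] = None) -> List[str]:
    """Hypothetical helper mirroring the documented default for `command`."""
    if command is not None:
        # An explicitly passed entrypoint always wins.
        return list(command)
    if running_in_ipython():
        # Notebooks have no meaningful sys.argv, so the caller must be explicit.
        raise ValueError("command is required when running under IPython or Jupyter")
    # Otherwise reuse the script's own argv, e.g. ["train.py", "--flag", "value"].
    return list(sys.argv if argv is None else argv)
```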

determined.experimental.create_trial_instance(trial_def: Type[determined._trial.Trial], checkpoint_dir: str, config: Optional[Dict[str, Any]] = None) → determined._trial.Trial

Create a trial instance from a Trial class definition. This is useful for debugging trial logic in any development environment.

Parameters
  • trial_def – A class definition that inherits from the det.Trial interface.

  • checkpoint_dir – The checkpoint directory that the trial will use for loading and saving checkpoints.

  • config – An optional experiment configuration that is used to initialize the determined.TrialContext. If not specified, a minimal default is used.

class determined.experimental.Mode

The mode used to create an experiment.

See determined.experimental.create().
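A minimal sketch of the two modes and the context_dir rules described for create(). The CLUSTER value 'cluster' matches the default shown in the create() signature; the LOCAL value 'local', the MiB reading of "96 MB", and the helpers resolve_context_dir and context_dir_size are assumptions for illustration, not the library's code.

```python
import enum
import os

# Documented 96 MB upload limit; reading it as MiB is an assumption.
MAX_CONTEXT_BYTES = 96 * 1024 * 1024


class Mode(enum.Enum):
    CLUSTER = "cluster"
    LOCAL = "local"  # assumed value


def context_dir_size(path: str) -> int:
    # Sum the sizes of all regular files under path, recursively.
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def resolve_context_dir(mode: Mode, context_dir: str = "") -> str:
    """Hypothetical helper applying the documented context_dir rules."""
    if mode is Mode.CLUSTER:
        if not context_dir:
            raise ValueError("context_dir is required in CLUSTER mode")
        if context_dir_size(context_dir) >= MAX_CONTEXT_BYTES:
            raise ValueError("context_dir must be under 96 MB")
        return context_dir
    # LOCAL mode: fall back to the current working directory.
    return context_dir or os.getcwd()
```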

Determined

class determined.experimental.Determined(master: Optional[str] = None, user: Optional[str] = None)

Determined gives access to Determined API objects.

Parameters
  • master (string, optional) – The URL of the Determined master. If this argument is not specified, the environment variables DET_MASTER and DET_MASTER_ADDR are checked, in that order, for the master URL.

  • user (string, optional) – The Determined username used for authentication. (default: determined)
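The lookup order for master and the default user can be sketched as follows. resolve_master is a hypothetical helper that only restates the documented order (explicit argument, then DET_MASTER, then DET_MASTER_ADDR) and the documented default user; raising an error when nothing is found is an assumption of this sketch, and the real client may normalize or default the URL differently.

```python
import os
from typing import Mapping, Optional, Tuple

DEFAULT_USER = "determined"


def resolve_master(master: Optional[str] = None,
                   user: Optional[str] = None,
                   env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Hypothetical helper: return (master_url, user) per the documented rules."""
    env = os.environ if env is None else env
    if master is None:
        # Check the environment variables in the documented order.
        for var in ("DET_MASTER", "DET_MASTER_ADDR"):
            if env.get(var):
                master = env[var]
                break
    if master is None:
        # Assumption: this sketch fails loudly instead of guessing a default.
        raise ValueError("no master URL: pass master or set DET_MASTER / DET_MASTER_ADDR")
    return master, user or DEFAULT_USER
```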

get_experiment(experiment_id: int) → determined_common.experimental.experiment.ExperimentReference

Get the det.experimental.ExperimentReference representing the experiment with the provided experiment ID.

get_trial(trial_id: int) → determined_common.experimental.trial.TrialReference

Get the det.experimental.TrialReference representing the trial with the provided trial ID.

get_checkpoint(uuid: str) → determined_common.experimental.checkpoint._checkpoint.Checkpoint

Get the det.experimental.Checkpoint representing the checkpoint with the provided UUID.

ExperimentReference

class determined.experimental.ExperimentReference(experiment_id: int, master: str)

Experiment reference class used for querying relevant det.experimental.Checkpoint instances.

Parameters
  • experiment_id (int) – The experiment ID.

  • master (string, optional) – The URL of the Determined master. If this class is obtained via det.experimental.Determined, the master URL is automatically passed into this constructor.

top_checkpoint(sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined_common.experimental.checkpoint._checkpoint.Checkpoint

Return the det.experimental.Checkpoint instance with the best validation metric as defined by the sort_by and smaller_is_better arguments.

Parameters
  • sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the searcher field of the experiment configuration is used.

  • smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default the smaller_is_better value in the experiment configuration is used.
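How sort_by and smaller_is_better fall back to the experiment configuration can be modeled on plain data. In this sketch each checkpoint is a hypothetical dict with a "metrics" mapping, and searcher stands for the searcher section of the experiment configuration with its metric and smaller_is_better keys; pick_best is illustrative only and does not query the master.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical checkpoint shape for this sketch only:
# {"uuid": str, "trial_id": int, "metrics": {metric_name: float}}
Checkpoint = Dict[str, Any]


def resolve_ordering(searcher: Dict[str, Any],
                     sort_by: Optional[str] = None,
                     smaller_is_better: Optional[bool] = None) -> Tuple[str, bool]:
    """Apply the documented defaults for sort_by and smaller_is_better."""
    if sort_by is None:
        # Without sort_by, smaller_is_better is ignored; both come from the config.
        return searcher["metric"], searcher["smaller_is_better"]
    if smaller_is_better is None:
        smaller_is_better = searcher["smaller_is_better"]
    return sort_by, smaller_is_better


def pick_best(checkpoints: List[Checkpoint], searcher: Dict[str, Any],
              sort_by: Optional[str] = None,
              smaller_is_better: Optional[bool] = None) -> Checkpoint:
    metric, smaller = resolve_ordering(searcher, sort_by, smaller_is_better)
    # Only checkpoints with a validation value for the metric can be ranked.
    scored = [c for c in checkpoints if metric in c.get("metrics", {})]
    if not scored:
        raise ValueError(f"no checkpoint has a value for {metric!r}")
    if smaller:
        return min(scored, key=lambda c: c["metrics"][metric])
    return max(scored, key=lambda c: c["metrics"][metric])
```

Note that passing smaller_is_better without sort_by has no effect here, matching the documented behavior.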

top_n_checkpoints(limit: int, sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → List[determined_common.experimental.checkpoint._checkpoint.Checkpoint]

Return the N det.experimental.Checkpoint instances with the best validation metric values as defined by the sort_by and smaller_is_better arguments. The result contains the best checkpoint from each of the top N performing distinct trials of the experiment.

Parameters
  • limit (int) – The maximum number of checkpoints (N) to return.

  • sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the searcher field of the experiment configuration is used.

  • smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default the smaller_is_better value in the experiment configuration is used.
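The "one checkpoint per distinct trial" rule can be sketched on plain data. top_n_distinct_trials is a hypothetical helper using the same illustrative checkpoint dicts as above; it assumes the metric name and sort direction have already been resolved.

```python
from typing import Any, Dict, List

# Hypothetical checkpoint shape: {"uuid": str, "trial_id": int, "metrics": {...}}
Checkpoint = Dict[str, Any]


def top_n_distinct_trials(checkpoints: List[Checkpoint], limit: int,
                          metric: str, smaller_is_better: bool) -> List[Checkpoint]:
    """Hypothetical helper: best checkpoint of each trial, then the best `limit` trials."""
    # Flip the sign so that "smaller is better" holds in both directions.
    sign = 1 if smaller_is_better else -1
    best_per_trial: Dict[int, Checkpoint] = {}
    for ckpt in checkpoints:
        value = ckpt.get("metrics", {}).get(metric)
        if value is None:
            continue  # unvalidated checkpoints cannot be ranked
        current = best_per_trial.get(ckpt["trial_id"])
        if current is None or sign * value < sign * current["metrics"][metric]:
            best_per_trial[ckpt["trial_id"]] = ckpt
    ranked = sorted(best_per_trial.values(), key=lambda c: sign * c["metrics"][metric])
    return ranked[:limit]
```

If limit exceeds the number of validated trials, fewer than limit checkpoints come back.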

TrialReference

class determined.experimental.TrialReference(trial_id: int, master: str)

Trial reference class used for querying relevant det.experimental.Checkpoint instances.

Parameters
  • trial_id (int) – The trial ID.

  • master (string, optional) – The URL of the Determined master. If this class is obtained via det.experimental.Determined, the master URL is automatically passed into this constructor.

top_checkpoint(sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined_common.experimental.checkpoint._checkpoint.Checkpoint

Return the det.experimental.Checkpoint instance with the best validation metric as defined by the sort_by and smaller_is_better arguments.

Parameters
  • sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the searcher field of the related experiment configuration is used.

  • smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default the smaller_is_better value in the related experiment configuration is used.

select_checkpoint(latest: bool = False, best: bool = False, uuid: Optional[str] = None, sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined_common.experimental.checkpoint._checkpoint.Checkpoint

Return the det.experimental.Checkpoint instance selected by the latest, best, or uuid argument.

Exactly one of the best, latest, or uuid parameters must be set.

Parameters
  • latest (bool, optional) – Return the most recent checkpoint.

  • best (bool, optional) – Return the checkpoint with the best validation metric as defined by the sort_by and smaller_is_better arguments. If sort_by and smaller_is_better are not specified, the values from the associated experiment configuration are used.

  • uuid (string, optional) – Return the checkpoint with the specified UUID.

  • sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the searcher field of the related experiment configuration is used.

  • smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default the smaller_is_better value in the related experiment configuration is used.
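The "exactly one of best, latest, or uuid" rule can be sketched as an argument check on plain data. select_checkpoint_sketch is hypothetical; it uses illustrative dicts carrying the uuid and batch_number fields that appear in the Checkpoint constructor, and it assumes "most recent" means the highest batch_number. Unlike the real method, it takes the metric explicitly instead of reading it from the experiment configuration.

```python
from typing import Any, Dict, List, Optional

# Hypothetical checkpoint shape: {"uuid": str, "batch_number": int, "metrics": {...}}
Checkpoint = Dict[str, Any]


def select_checkpoint_sketch(checkpoints: List[Checkpoint], latest: bool = False,
                             best: bool = False, uuid: Optional[str] = None,
                             metric: Optional[str] = None,
                             smaller_is_better: bool = True) -> Checkpoint:
    # Exactly one selector may be set; booleans sum as 0 or 1.
    if sum([latest, best, uuid is not None]) != 1:
        raise ValueError("exactly one of latest, best, or uuid must be set")
    if not checkpoints:
        raise ValueError("trial has no checkpoints")
    if uuid is not None:
        for ckpt in checkpoints:
            if ckpt["uuid"] == uuid:
                return ckpt
        raise ValueError(f"no checkpoint with uuid {uuid!r}")
    if latest:
        # Assumption: "most recent" means the highest batch_number.
        return max(checkpoints, key=lambda c: c["batch_number"])
    if metric is None:
        raise ValueError("best=True needs a metric in this sketch")
    scored = [c for c in checkpoints if metric in c.get("metrics", {})]
    if not scored:
        raise ValueError(f"no checkpoint has a value for {metric!r}")
    if smaller_is_better:
        return min(scored, key=lambda c: c["metrics"][metric])
    return max(scored, key=lambda c: c["metrics"][metric])
```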

Checkpoint

class determined.experimental.Checkpoint(uuid: str, storage_config: Dict[str, Any], batch_number: int, start_time: str, end_time: str, resources: Dict[str, Any], validation: Dict[str, Any])

Class representing a checkpoint. Contains methods for downloading checkpoints to a local path and loading checkpoints into memory.

The det.experimental.ExperimentReference and det.experimental.TrialReference classes, as well as Determined.get_checkpoint(), return instances of this class.

download(path: Optional[str] = None) → str

Download the checkpoint from checkpoint storage to the local file system.

Parameters
  • path (string, optional) – Top level directory to place the checkpoint under. If this parameter is not set, the checkpoint is downloaded to checkpoints/<checkpoint_uuid> relative to the current working directory.
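The default download location can be expressed as a small path computation. download_dir is a hypothetical helper, not part of the Checkpoint class; treating an explicit path as the checkpoint directory itself (rather than a parent of <checkpoint_uuid>) is an assumption of this sketch.

```python
import os
import pathlib
from typing import Optional


def download_dir(uuid: str, path: Optional[str] = None) -> pathlib.Path:
    """Hypothetical helper: where download() is documented to place a checkpoint."""
    if path is not None:
        # Assumption: an explicit path is used as the checkpoint directory as-is.
        return pathlib.Path(path)
    # Default: checkpoints/<checkpoint_uuid> relative to the current working directory.
    return pathlib.Path(os.getcwd(), "checkpoints", uuid)
```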

load(path: Optional[str] = None, tags: Optional[List[str]] = None, **kwargs: Any) → Any

Loads a Determined checkpoint into memory. If the checkpoint is not present on disk, it is downloaded from persistent storage.

Parameters
  • path (string, optional) – Top level directory to load the checkpoint from. (default: checkpoints/<UUID>)

  • tags (list of strings, optional) – Only relevant for TensorFlow saved_model checkpoints. Specifies which tags are loaded from the TensorFlow saved_model. See the documentation for tf.compat.v1.saved_model.load_v2.

  • kwargs – Only relevant for PyTorch checkpoints. The keyword arguments are passed to torch.load. See the documentation for torch.load.
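load() is documented to download the checkpoint only when it is not already on disk. The sketch below models that check with a hypothetical ensure_local_checkpoint helper; treating "present on disk" as "the directory exists and is non-empty" is an assumption, and fetch is a stand-in for the real copy out of persistent storage.

```python
import pathlib
from typing import Callable


def ensure_local_checkpoint(ckpt_dir: str,
                            fetch: Callable[[pathlib.Path], None]) -> pathlib.Path:
    """Hypothetical helper: reuse a checkpoint already on disk, otherwise fetch it."""
    target = pathlib.Path(ckpt_dir)
    # Assumption: an existing, non-empty directory counts as "present on disk".
    if target.is_dir() and any(target.iterdir()):
        return target
    target.mkdir(parents=True, exist_ok=True)
    fetch(target)  # stand-in for downloading from persistent storage
    return target
```

For PyTorch checkpoints, keyword arguments given to load() are forwarded to torch.load; for example, map_location="cpu" is a standard torch.load argument for loading a GPU-trained model on a CPU-only machine.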

static load_from_path(path: str, tags: Optional[List[str]] = None, **kwargs: Any) → Any

Loads a Determined checkpoint from a local file system path into memory. If the checkpoint is a PyTorch model, a torch.nn.Module is returned. If the checkpoint contains a TensorFlow saved_model, a TensorFlow AutoTrackable object is returned.

Parameters
  • path (string) – Local path to the top level directory of a checkpoint.

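  • tags (list of strings, optional) – Only relevant for TensorFlow saved_model checkpoints. Specifies which tags are loaded from the TensorFlow saved_model. See the documentation for tf.compat.v1.saved_model.load_v2.

  • kwargs – Only relevant for PyTorch checkpoints. The keyword arguments are passed to torch.load. See the documentation for torch.load.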