Shortcuts

determined.experimental

determined.experimental.create(trial_def: Type[determined._trial.Trial], config: Optional[Dict[str, Any]] = None, local: bool = False, test: bool = False, context_dir: str = '', command: Optional[List[str]] = None, master_url: Optional[str] = None) → Any

Create an experiment.

Parameters
  • trial_def – A class definition implementing the det.Trial interface.

  • config – A dictionary representing the experiment configuration to be associated with the experiment.

  • local – A boolean indicating if training will happen locally. When False, the experiment will be submitted to the Determined cluster. Defaults to False.

  • test – A boolean indicating if the experiment should be shortened to a minimal loop of training, validation, and checkpointing. test=True is useful for quick iteration during model porting or debugging because common errors surface more quickly. Defaults to False.

  • context_dir

    A string filepath that defines the context directory. All model code will be executed with this as the current working directory.

    When local=False, this argument is required. All files in this directory will be uploaded to the Determined cluster. The total size of this directory must be under 96 MB.

    When local=True, this argument is optional and assumed to be the current working directory by default.

  • command

    A list of strings that is used as the entrypoint of the training script in the Determined task environment. When this function is executed from a Python script, this argument defaults to sys.argv. When it is executed from IPython or a Jupyter notebook, this argument is required.

    Example: When creating an experiment by running "python train.py --flag value", the default command is inferred as ["train.py", "--flag", "value"].

  • master_url – An optional string to use as the Determined master URL when local=False. If not specified, it is inferred from the environment variable DET_MASTER.
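The defaulting rules for context_dir, command, and master_url described above can be summarized in plain Python. This is a minimal sketch of the documented behavior, not Determined's implementation; resolve_create_args and its in_notebook, argv, and env parameters are hypothetical names introduced here so the rules can be exercised without a cluster.

```python
import os
import sys
from typing import Dict, List, Mapping, Optional


def resolve_create_args(
    local: bool,
    context_dir: str = "",
    command: Optional[List[str]] = None,
    master_url: Optional[str] = None,
    in_notebook: bool = False,
    argv: Optional[List[str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, object]:
    """Hypothetical helper mirroring the documented defaults of create()."""
    argv = sys.argv if argv is None else argv
    env = os.environ if env is None else env

    # context_dir: required for cluster submission, current directory for local runs.
    if not context_dir:
        if not local:
            raise ValueError("context_dir is required when local=False")
        context_dir = os.getcwd()

    # command: inferred from argv in a script, required in IPython/Jupyter.
    if command is None:
        if in_notebook:
            raise ValueError("command is required in IPython or Jupyter")
        command = list(argv)

    # master_url: only relevant for cluster submission; falls back to DET_MASTER.
    if not local and master_url is None:
        master_url = env.get("DET_MASTER")

    return {"context_dir": context_dir, "command": command, "master_url": master_url}
```

For instance, running "python train.py --flag value" with local=False and a context directory set would yield the command ["train.py", "--flag", "value"] and the master URL taken from DET_MASTER.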

determined.experimental.create_trial_instance(trial_def: Type[determined._trial.Trial], checkpoint_dir: str, config: Optional[Dict[str, Any]] = None, hparams: Optional[Dict[str, Any]] = None) → determined._trial.Trial

Create a trial instance from a Trial class definition. This can be a useful utility for debugging your trial logic in any development environment.

Parameters
  • trial_def – A class definition that inherits from the det.Trial interface.

  • checkpoint_dir – The checkpoint directory that the trial will use for loading and saving checkpoints.

  • config – An optional experiment configuration that is used to initialize the determined.TrialContext. If not specified, a minimal default is used.
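A minimal sketch of using create_trial_instance for local debugging, assuming a throwaway temporary directory is acceptable as the checkpoint directory. make_debug_trial is a hypothetical helper; its create parameter exists only so the call can be swapped out (for example in tests), and by default it imports determined.experimental.create_trial_instance and calls it with the keyword arguments from the signature above.

```python
import tempfile
from typing import Any, Callable, Dict, Optional


def make_debug_trial(
    trial_def: type,
    config: Optional[Dict[str, Any]] = None,
    hparams: Optional[Dict[str, Any]] = None,
    create: Optional[Callable[..., Any]] = None,
) -> Any:
    """Hypothetical helper: build a trial instance backed by a temporary checkpoint directory."""
    if create is None:
        # Imported lazily so this module loads even without Determined installed.
        from determined.experimental import create_trial_instance as create

    checkpoint_dir = tempfile.mkdtemp(prefix="det-debug-")
    return create(
        trial_def=trial_def,
        checkpoint_dir=checkpoint_dir,
        config=config,
        hparams=hparams,
    )
```

The returned trial object can then be inspected or stepped through in a debugger without submitting anything to a cluster.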

Determined

class determined.experimental.Determined(master: Optional[str] = None, user: Optional[str] = None)

Determined gives access to Determined API objects.

Parameters
  • master (string, optional) – The URL of the Determined master. If this argument is not specified, the environment variables DET_MASTER and DET_MASTER_ADDR are checked for the master URL, in that order.

  • user (string, optional) – The Determined username used for authentication. (default: determined)
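The master lookup order described above can be expressed as a small function. This is a sketch of the documented order only; resolve_master is a hypothetical name, and the real client may additionally normalize or validate the URL.

```python
import os
from typing import Mapping, Optional


def resolve_master(
    master: Optional[str] = None, env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Hypothetical helper: explicit argument, then DET_MASTER, then DET_MASTER_ADDR."""
    env = os.environ if env is None else env
    if master:
        return master
    for var in ("DET_MASTER", "DET_MASTER_ADDR"):
        value = env.get(var)
        if value:  # Empty values are treated as unset.
            return value
    return None  # No master configured anywhere.
```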

get_experiment(experiment_id: int) → determined_common.experimental.experiment.ExperimentReference

Get the det.experimental.ExperimentReference representing the experiment with the provided experiment ID.

get_trial(trial_id: int) → determined_common.experimental.trial.TrialReference

Get the det.experimental.TrialReference representing the trial with the provided trial ID.

get_checkpoint(uuid: str) → determined_common.experimental.checkpoint._checkpoint.Checkpoint

Get the det.experimental.Checkpoint representing the checkpoint with the provided UUID.
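A typical lookup chains these methods: fetch an experiment reference, pick its best checkpoint, and download it. The sketch below assumes client is a det.experimental.Determined instance; download_best_checkpoint is a hypothetical helper, and because it relies only on the documented get_experiment, top_checkpoint, and download methods, any object exposing the same methods can stand in for testing.

```python
from typing import Any, Optional


def download_best_checkpoint(client: Any, experiment_id: int, path: Optional[str] = None) -> str:
    """Hypothetical helper: download an experiment's best checkpoint and return the local path."""
    experiment = client.get_experiment(experiment_id)  # ExperimentReference
    checkpoint = experiment.top_checkpoint()  # Ranked by the searcher metric by default.
    return checkpoint.download(path)  # Defaults to checkpoints/<checkpoint_uuid>.
```

With DET_MASTER set, a call might look like download_best_checkpoint(Determined(), 1), where 1 is an example experiment ID.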

ExperimentReference

class determined.experimental.ExperimentReference(experiment_id: int, master: str)

Experiment reference class used for querying relevant det.experimental.Checkpoint instances.

Parameters
  • experiment_id (int) – The experiment ID.

  • master (string, optional) – The URL of the Determined master. If this class is obtained via det.experimental.Determined, the master URL is automatically passed into this constructor.

top_checkpoint(sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined_common.experimental.checkpoint._checkpoint.Checkpoint

Return the det.experimental.Checkpoint instance with the best validation metric as defined by the sort_by and smaller_is_better arguments.

Parameters
  • sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the experiment configuration searcher field is used.

  • smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default the smaller_is_better value in the experiment configuration is used.
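The ranking rule behind sort_by and smaller_is_better amounts to a minimum or maximum over the chosen validation metric. The sketch below works on plain dictionaries rather than Checkpoint objects; best_by_metric and the "metrics" key layout are assumptions made for illustration, not the shape of the validation field returned by the API.

```python
from typing import Any, Dict, List


def best_by_metric(
    checkpoints: List[Dict[str, Any]], sort_by: str, smaller_is_better: bool
) -> Dict[str, Any]:
    """Hypothetical helper: return the entry whose validation metric `sort_by` is best."""
    # Assumption: entries that never reported the metric are ignored.
    candidates = [c for c in checkpoints if sort_by in c["metrics"]]
    if not candidates:
        raise ValueError(f"no checkpoint has validation metric {sort_by!r}")
    pick = min if smaller_is_better else max
    return pick(candidates, key=lambda c: c["metrics"][sort_by])
```

A loss-like metric would use smaller_is_better=True, while an accuracy-like metric would use False.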

top_n_checkpoints(limit: int, sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → List[determined_common.experimental.checkpoint._checkpoint.Checkpoint]

Return the N det.experimental.Checkpoint instances with the best validation metric values as defined by the sort_by and smaller_is_better arguments. This method returns the best checkpoint from each of the top N performing distinct trials of the experiment.

Parameters
  • limit (int) – The maximum number of checkpoints to return (N).

  • sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the experiment configuration searcher field is used.

  • smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default the smaller_is_better value in the experiment configuration is used.
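The top-N behavior described above, one best checkpoint per distinct trial followed by the N best of those, can be sketched as follows. Checkpoints are plain dictionaries with hypothetical "trial_id" and "metrics" keys, and top_n_by_trial is an illustrative name, so this shows the ordering logic only, not the server-side query.

```python
from typing import Any, Dict, List


def top_n_by_trial(
    checkpoints: List[Dict[str, Any]], limit: int, sort_by: str, smaller_is_better: bool
) -> List[Dict[str, Any]]:
    """Hypothetical helper: keep each trial's best checkpoint, then return the `limit` best."""

    def better(a: float, b: float) -> bool:
        return a < b if smaller_is_better else a > b

    best_per_trial: Dict[int, Dict[str, Any]] = {}
    for ckpt in checkpoints:
        value = ckpt["metrics"].get(sort_by)
        if value is None:
            continue  # Assumption: checkpoints without the metric are skipped.
        current = best_per_trial.get(ckpt["trial_id"])
        if current is None or better(value, current["metrics"][sort_by]):
            best_per_trial[ckpt["trial_id"]] = ckpt

    ranked = sorted(
        best_per_trial.values(),
        key=lambda c: c["metrics"][sort_by],
        reverse=not smaller_is_better,
    )
    return ranked[:limit]
```

Because each trial contributes at most one checkpoint, the result never contains two checkpoints from the same trial, even if one trial holds several of the globally best values.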

TrialReference

class determined.experimental.TrialReference(trial_id: int, master: str)

Trial reference class used for querying relevant det.experimental.Checkpoint instances.

Parameters
  • trial_id (int) – The trial ID.

  • master (string, optional) – The URL of the Determined master. If this class is obtained via det.experimental.Determined, the master URL is automatically passed into this constructor.

top_checkpoint(sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined_common.experimental.checkpoint._checkpoint.Checkpoint

Return the det.experimental.Checkpoint instance with the best validation metric as defined by the sort_by and smaller_is_better arguments.

Parameters
  • sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the related experiment configuration searcher field is used.

  • smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default the smaller_is_better value in the related experiment configuration is used.

select_checkpoint(latest: bool = False, best: bool = False, uuid: Optional[str] = None, sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) → determined_common.experimental.checkpoint._checkpoint.Checkpoint

Return a det.experimental.Checkpoint instance for this trial, selected by the latest, best, or uuid argument. When best is set, checkpoints are ranked by the sort_by and smaller_is_better arguments.

Exactly one of the best, latest, or uuid parameters must be set.

Parameters
  • latest (bool, optional) – Return the most recent checkpoint.

  • best (bool, optional) – Return the checkpoint with the best validation metric as defined by the sort_by and smaller_is_better arguments. If sort_by and smaller_is_better are not specified, the values from the associated experiment configuration are used.

  • uuid (string, optional) – Return the checkpoint with the specified UUID.

  • sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset, the metric defined in the related experiment configuration searcher field is used.

  • smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default the smaller_is_better value in the related experiment configuration is used.
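The rule that exactly one of latest, best, or uuid must be set is easy to violate when arguments are built dynamically. A minimal sketch of that check follows; check_selector is a hypothetical helper, not part of det.experimental, and a caller would pass the same keyword arguments on to TrialReference.select_checkpoint afterwards.

```python
from typing import Optional


def check_selector(latest: bool = False, best: bool = False, uuid: Optional[str] = None) -> str:
    """Hypothetical helper: validate the selector and return the name of the one that is set."""
    flags = (("latest", latest), ("best", best), ("uuid", uuid is not None))
    chosen = [name for name, is_set in flags if is_set]
    if len(chosen) != 1:
        raise ValueError(
            f"exactly one of latest, best, or uuid must be set; got {chosen or 'none'}"
        )
    return chosen[0]
```

For example, check_selector(best=True) passes, while check_selector(latest=True, best=True) raises before any request is made.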

Checkpoint

class determined.experimental.Checkpoint(uuid: str, storage_config: Dict[str, Any], batch_number: int, start_time: str, end_time: str, resources: Dict[str, Any], validation: Dict[str, Any])

Class representing a checkpoint. Contains methods for downloading checkpoints to a local path and loading checkpoints into memory.

The det.experimental.ExperimentReference and det.experimental.TrialReference classes, as well as det.experimental.Determined.get_checkpoint, return instances of this class.

download(path: Optional[str] = None) → str

Download the checkpoint from its checkpoint storage location to a local directory.

Parameters
  • path (string, optional) – Top level directory to place the checkpoint under. If this parameter is not set, the checkpoint is downloaded to checkpoints/<checkpoint_uuid> relative to the current working directory.

load(path: Optional[str] = None, tags: Optional[List[str]] = None, **kwargs: Any) → Any

Loads a Determined checkpoint into memory. If the checkpoint is not present on disk, it is downloaded from persistent storage.

Parameters
  • path (string, optional) – Top level directory to load the checkpoint from. (default: checkpoints/<UUID>)

  • tags (list of strings, optional) – Only relevant for TensorFlow saved_model checkpoints. Specifies which tags are loaded from the TensorFlow saved_model. See the documentation for tf.compat.v1.saved_model.load_v2.

  • kwargs – Only relevant for PyTorch checkpoints. The keyword arguments are passed through to torch.load. See the documentation for torch.load.
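Because keyword arguments are forwarded to torch.load, a PyTorch checkpoint saved on a GPU machine could be loaded on a CPU-only machine by passing map_location, a standard torch.load argument. load_on_cpu is a hypothetical wrapper, and checkpoint is assumed to be a det.experimental.Checkpoint, such as one returned by TrialReference.top_checkpoint().

```python
from typing import Any, Optional


def load_on_cpu(checkpoint: Any, path: Optional[str] = None) -> Any:
    """Hypothetical helper: load a PyTorch checkpoint with all tensors mapped to the CPU."""
    # map_location is not a parameter of Checkpoint.load itself; it travels
    # through **kwargs to torch.load.
    return checkpoint.load(path=path, map_location="cpu")
```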

static load_from_path(path: str, tags: Optional[List[str]] = None) → Any

Loads a Determined checkpoint from a local file system path into memory. If the checkpoint is a PyTorch model, a torch.nn.Module is returned. If the checkpoint contains a TensorFlow saved_model, a TensorFlow AutoTrackable object is returned.

Parameters
  • path (string) – Local path to the top level directory of a checkpoint.

  • tags (list of strings, optional) – Only relevant for TensorFlow saved_model checkpoints. Specifies which tags are loaded from the TensorFlow saved_model. See the documentation for tf.compat.v1.saved_model.load_v2.