determined.pytorch
determined.pytorch.PyTorchTrial
class determined.pytorch.PyTorchTrial(trial_context: determined._train_context.TrialContext)
PyTorch trials are created by subclassing the abstract class PyTorchTrial. Users must define all abstract methods to create the deep learning model associated with a specific trial, and to subsequently train and evaluate it.

abstract build_model() → torch.nn.modules.module.Module
Defines the deep learning architecture associated with a trial, which typically depends on the trial’s specific hyperparameter settings stored in the hparams dictionary. This method returns the model as an instance or subclass of nn.Module.

abstract optimizer(model: torch.nn.modules.module.Module) → torch.optim.optimizer.Optimizer
Describes the optimizer (an instance of torch.optim.Optimizer) to be used during training of the given model.

abstract train_batch(batch: Union[Dict[str, torch.Tensor], Sequence[torch.Tensor], torch.Tensor], model: torch.nn.modules.module.Module, epoch_idx: int, batch_idx: int) → Union[torch.Tensor, Dict[str, Any]]
Calculate the loss for a batch and return it in a dictionary. batch_idx represents the total number of batches processed per device (slot) since the start of training.
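Because batch_idx counts batches processed on this device since the start of training, not since the start of the current epoch, code inside train_batch that needs its position within the current epoch has to derive it. A minimal sketch, assuming batch_idx is 0-based and every epoch contains the same number of batches per device; position_in_epoch and batches_per_epoch are hypothetical names, not part of determined.pytorch:

```python
from typing import Tuple


def position_in_epoch(batch_idx: int, batches_per_epoch: int) -> Tuple[int, int]:
    """Split a global, per-device batch counter into (epoch, batch within epoch).

    Hypothetical helper. Assumes batch_idx is 0-based and that every epoch has
    exactly batches_per_epoch batches on this device.
    """
    if batches_per_epoch <= 0:
        raise ValueError("batches_per_epoch must be positive")
    return divmod(batch_idx, batches_per_epoch)


# With 100 batches per epoch, global batch 250 is batch 50 of epoch 2 (both 0-based).
print(position_in_epoch(250, 100))  # (2, 50)
```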

abstract build_training_data_loader() → determined.pytorch._data.DataLoader
Defines the data loader to use during training. Must return an instance of determined.pytorch.DataLoader.

abstract build_validation_data_loader() → determined.pytorch._data.DataLoader
Defines the data loader to use during validation. Must return an instance of determined.pytorch.DataLoader.

evaluate_batch(batch: Union[Dict[str, torch.Tensor], Sequence[torch.Tensor], torch.Tensor], model: torch.nn.modules.module.Module) → Dict[str, Any]
Calculate evaluation metrics for a batch and return them as a dictionary mapping metric names to metric values.
There are two ways to specify evaluation metrics: override either evaluate_batch() or evaluate_full_dataset(). While evaluate_full_dataset() is more flexible, evaluate_batch() should be preferred, since it can be parallelized in distributed environments, whereas evaluate_full_dataset() cannot. Only one of evaluate_full_dataset() and evaluate_batch() should be overridden by a trial.

evaluation_reducer() → Union[determined.pytorch._reducer.Reducer, Dict[str, determined.pytorch._reducer.Reducer]]
Return a reducer for all evaluation metrics, or a dict mapping metric names to individual reducers. Defaults to det.pytorch.Reducer.AVG.
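evaluation_reducer() can return one reducer for every metric or a dict that picks a reducer per metric name. The stdlib-only sketch below shows what that choice means when combining the metric dicts returned by several evaluate_batch() calls. Reducer here is a local stand-in that only mirrors the member names of det.pytorch.Reducer, and reduce_metrics is a hypothetical helper; how Determined itself gathers and reduces metrics across batches and slots is not shown.

```python
import enum
import statistics
from typing import Dict, List, Union


class Reducer(enum.Enum):
    """Local stand-in mirroring the member names of det.pytorch.Reducer (not the real class)."""

    AVG = "avg"
    SUM = "sum"
    MAX = "max"
    MIN = "min"


# Map each stand-in reducer to the plain Python function that implements it.
_REDUCE_FNS = {
    Reducer.AVG: statistics.mean,
    Reducer.SUM: sum,
    Reducer.MAX: max,
    Reducer.MIN: min,
}


def reduce_metrics(
    per_batch: List[Dict[str, float]],
    reducer: Union[Reducer, Dict[str, Reducer]] = Reducer.AVG,
) -> Dict[str, float]:
    """Combine per-batch metric dicts (like evaluate_batch() outputs) into one dict."""
    if not per_batch:
        return {}
    result = {}
    for name in per_batch[0]:
        values = [batch[name] for batch in per_batch]
        # A single Reducer applies to every metric; a dict picks one per metric name.
        chosen = reducer if isinstance(reducer, Reducer) else reducer[name]
        result[name] = _REDUCE_FNS[chosen](values)
    return result


batches = [{"loss": 0.5, "errors": 3}, {"loss": 0.25, "errors": 1}]
print(reduce_metrics(batches))  # loss averaged to 0.375, errors averaged to 2
print(reduce_metrics(batches, {"loss": Reducer.MIN, "errors": Reducer.SUM}))
```

Summing a count such as "errors" while keeping the smallest loss illustrates why a per-metric dict can be useful: averaging is not the right reduction for every metric.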

evaluate_full_dataset(data_loader: torch.utils.data.dataloader.DataLoader, model: torch.nn.modules.module.Module) → Dict[str, Any]
Calculate validation metrics on the entire validation dataset and return them as a dictionary mapping metric names to reduced metric values (i.e., each returned metric is the average or sum of that metric across the entire validation set).
This validation cannot be distributed and is performed on a single device, even when multiple devices (slots) are used for training. Only one of evaluate_full_dataset() and evaluate_batch() should be overridden by a trial.
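Because evaluate_full_dataset() sees every validation example, it can compute a metric over the whole set at once. One reason this can matter: if per-batch averages are later combined with a plain unweighted mean (an assumption made here for illustration; this page does not say how Reducer.AVG weights batches), a smaller final batch skews the result. A stdlib-only sketch comparing the two computations; both function names are hypothetical:

```python
from typing import Sequence


def mean_of_batch_means(batches: Sequence[Sequence[float]]) -> float:
    """Average each batch, then average those averages without weighting by size."""
    batch_means = [sum(b) / len(b) for b in batches]
    return sum(batch_means) / len(batch_means)


def full_dataset_mean(batches: Sequence[Sequence[float]]) -> float:
    """Average every example at once, as a full-dataset evaluation can."""
    values = [v for b in batches for v in b]
    return sum(values) / len(values)


# Per-example 0/1 correctness; the last batch is smaller than the others.
batches = [[1, 1, 1, 1], [1, 1, 0, 0], [0]]
print(mean_of_batch_means(batches))  # (1.0 + 0.5 + 0.0) / 3 = 0.5
print(full_dataset_mean(batches))    # 6 / 9, about 0.667
```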

create_lr_scheduler(optimizer: torch.optim.optimizer.Optimizer) → Optional[determined.pytorch._lr_scheduler.LRScheduler]
Create a learning rate scheduler for the trial given an instance of the optimizer.
Parameters:
    optimizer (torch.optim.Optimizer) – Instance of the optimizer to be used for training.
Returns:
    A wrapper around a torch.optim.lr_scheduler._LRScheduler.
Return type:
    det.pytorch.LRScheduler

abstract __init__(trial_context: determined._train_context.TrialContext) → None
Initializes a trial using the provided trial_context. Override this method to initialize any state shared between the other method implementations.

trial_context_class
Alias of determined._train_context.TrialContext.

class determined.pytorch.LRScheduler(scheduler: torch.optim.lr_scheduler._LRScheduler, step_mode: determined.pytorch._lr_scheduler.LRScheduler.StepMode)

class StepMode
Specifies when and how scheduler.step() should be executed.
- STEP_EVERY_EPOCH
- STEP_EVERY_BATCH
- MANUAL_STEP

__init__(scheduler: torch.optim.lr_scheduler._LRScheduler, step_mode: determined.pytorch._lr_scheduler.LRScheduler.StepMode)
Wrapper for a PyTorch LRScheduler. Using this wrapper is required to properly schedule the optimizer’s learning rate.
This wrapper fulfills two main functions:
- Saving and restoring the learning rate in case a trial is paused, preempted, etc.
- Stepping the learning rate scheduler at predefined frequencies (every batch or every epoch).
Parameters:
    scheduler (torch.optim.lr_scheduler._LRScheduler) – Learning rate scheduler to be used by Determined.
    step_mode (det.pytorch.LRScheduler.StepMode) – The strategy Determined will use to call (or not call) scheduler.step().
        - STEP_EVERY_EPOCH: Determined will call scheduler.step() after every training epoch. No arguments will be passed to step().
        - STEP_EVERY_BATCH: Determined will call scheduler.step() after every training batch. No arguments will be passed to step().
        - MANUAL_STEP: Determined will not call scheduler.step() at all. It is up to the user to decide when to call scheduler.step(), and whether to pass any arguments.
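The three step modes differ only in who calls step() and how often. The stdlib-only sketch below counts the calls each mode would produce over a toy loop of 3 epochs of 4 batches each. StepMode here is a local stand-in that mirrors the documented member names, CountingScheduler and simulate_training are hypothetical, and the loop illustrates the documented semantics rather than reproducing Determined's training loop.

```python
import enum


class StepMode(enum.Enum):
    """Local stand-in mirroring the member names of LRScheduler.StepMode (not the real class)."""

    STEP_EVERY_EPOCH = enum.auto()
    STEP_EVERY_BATCH = enum.auto()
    MANUAL_STEP = enum.auto()


class CountingScheduler:
    """Hypothetical scheduler that only records how often step() is called."""

    def __init__(self) -> None:
        self.calls = 0

    def step(self) -> None:
        self.calls += 1


def simulate_training(step_mode: StepMode, epochs: int, batches_per_epoch: int) -> int:
    """Return how many times step() is called under the given step mode."""
    scheduler = CountingScheduler()
    for _epoch in range(epochs):
        for _batch in range(batches_per_epoch):
            # Forward/backward pass and optimizer.step() would happen here.
            if step_mode is StepMode.STEP_EVERY_BATCH:
                scheduler.step()  # no arguments, after every training batch
        if step_mode is StepMode.STEP_EVERY_EPOCH:
            scheduler.step()  # no arguments, after every training epoch
    # MANUAL_STEP: this loop never calls step(); user code decides when to call it.
    return scheduler.calls


for mode in StepMode:
    print(mode.name, simulate_training(mode, epochs=3, batches_per_epoch=4))
```

With MANUAL_STEP the count stays at zero, which is the point: the user's own code (for example inside train_batch) is responsible for calling step() and choosing its arguments.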

get_last_lr() → List
Returns the last learning rate computed by the current scheduler. This function is equivalent to calling get_last_lr() on the wrapped LRScheduler.

step(*args: Any, **kwargs: Any) → None
Calls step() on the wrapped LRScheduler instance.

class determined.pytorch.Reducer
The methods available to users for reducing metrics.
- AVG
- SUM
- MAX
- MIN

Data Loading
Loading data into PyTorchTrial models is done by defining two functions, build_training_data_loader() and build_validation_data_loader(). These functions should each return an instance of determined.pytorch.DataLoader. determined.pytorch.DataLoader behaves the same as torch.utils.data.DataLoader and is a drop-in replacement.
Each DataLoader is allowed to return batches with arbitrary structures of the following types, which will be fed directly to the train_batch and evaluate_batch functions:
- np.ndarray
    np.array([[0, 0], [0, 0]])
- torch.Tensor
    torch.Tensor([[0, 0], [0, 0]])
- tuple of np.ndarrays or torch.Tensors
    (torch.Tensor([0, 0]), torch.Tensor([[0, 0], [0, 0]]))
- list of np.ndarrays or torch.Tensors
    [torch.Tensor([0, 0]), torch.Tensor([[0, 0], [0, 0]])]
- dictionary mapping strings to np.ndarrays or torch.Tensors
    {"data": torch.Tensor([[0, 0], [0, 0]]), "label": torch.Tensor([[1, 1], [1, 1]])}
- combination of the above
    {
        "data": [
            {"sub_data1": torch.Tensor([[0, 0], [0, 0]])},
            {"sub_data2": torch.Tensor([0, 0])},
        ],
        "label": (torch.Tensor([0, 0]), torch.Tensor([[0, 0], [0, 0]])),
    }
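Code that needs to touch every array or tensor in such a batch (for example, to move each tensor to a device with torch.Tensor.to) has to walk these nested structures. A minimal, stdlib-only sketch of that traversal; map_batch is a hypothetical helper, not part of determined.pytorch, and the integer leaves below stand in for arrays or tensors:

```python
from typing import Any, Callable


def map_batch(batch: Any, fn: Callable[[Any], Any]) -> Any:
    """Apply fn to every leaf of a nested batch, preserving dict/list/tuple structure.

    Anything that is not a dict, list, or tuple is treated as a leaf
    (e.g. an np.ndarray or torch.Tensor). Tuple subclasses such as
    namedtuples would come back as plain tuples in this sketch.
    """
    if isinstance(batch, dict):
        return {key: map_batch(value, fn) for key, value in batch.items()}
    if isinstance(batch, list):
        return [map_batch(item, fn) for item in batch]
    if isinstance(batch, tuple):
        return tuple(map_batch(item, fn) for item in batch)
    return fn(batch)


# Same shape as the "combination of the above" example, with ints as leaves.
batch = {
    "data": [{"sub_data1": 1}, {"sub_data2": 2}],
    "label": (3, 4),
}
print(map_batch(batch, lambda x: x * 10))
# {'data': [{'sub_data1': 10}, {'sub_data2': 20}], 'label': (30, 40)}
```

With real tensors, fn could be something like lambda t: t.to(device); the traversal itself does not depend on the leaf type.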

Examples
- cifar10_cnn_pytorch (PyTorch Sequential model)
- mnist_pytorch (two examples: PyTorch Sequential model and true multi-input multi-output model)