Transformers API#
model_hub.huggingface#
- class model_hub.huggingface.BaseTransformerTrial(context: determined.pytorch._pytorch_context.PyTorchTrialContext)#
This is the base PyTorchTrial for transformers that implements the __init__ and train_batch methods. You can subclass BaseTransformerTrial to customize a trial for your own usage by filling in the expected methods for data loading and evaluation.
The __init__ method replicated below makes heavy use of the helper functions in the next section.
def __init__(self, context: det_torch.PyTorchTrialContext) -> None:
    self.context = context

    # A subclass of BaseTransformerTrial may have already set hparams and data_config
    # attributes so we only reset them if they do not exist.
    if not hasattr(self, "hparams"):
        self.hparams = attrdict.AttrDict(context.get_hparams())
    if not hasattr(self, "data_config"):
        self.data_config = attrdict.AttrDict(context.get_data_config())
    if not hasattr(self, "exp_config"):
        self.exp_config = attrdict.AttrDict(context.get_experiment_config())

    # Check to make sure all expected hyperparameters are set.
    self.check_hparams()

    # Parse hparams and data_config.
    (
        self.config_kwargs,
        self.tokenizer_kwargs,
        self.model_kwargs,
    ) = hf_parse.default_parse_config_tokenizer_model_kwargs(self.hparams)
    optimizer_kwargs, scheduler_kwargs = hf_parse.default_parse_optimizer_lr_scheduler_kwargs(
        self.hparams
    )

    self.config, self.tokenizer, self.model = build_using_auto(
        self.config_kwargs,
        self.tokenizer_kwargs,
        self.hparams.model_mode,
        self.model_kwargs,
        use_pretrained_weights=self.hparams.use_pretrained_weights,
    )
    self.model = self.context.wrap_model(self.model)

    self.optimizer = self.context.wrap_optimizer(
        build_default_optimizer(self.model, optimizer_kwargs)
    )

    if self.hparams.use_apex_amp:
        self.model, self.optimizer = self.context.configure_apex_amp(
            models=self.model,
            optimizers=self.optimizer,
        )

    self.lr_scheduler = self.context.wrap_lr_scheduler(
        build_default_lr_scheduler(self.optimizer, scheduler_kwargs),
        det_torch.LRScheduler.StepMode.STEP_EVERY_BATCH,
    )

    self.grad_clip_fn = None
    if optimizer_kwargs.max_grad_norm > 0:  # type: ignore
        self.grad_clip_fn = lambda x: torch.nn.utils.clip_grad_norm_(
            x, optimizer_kwargs.max_grad_norm
        )
The train_batch method replicated below should work for most models and tasks but can be overridden for more custom behavior in a subclass.
def train_batch(self, batch: Any, epoch_idx: int, batch_idx: int) -> Any:
    # By default, all HF models return the loss in the first element.
    # We do not automatically apply a label smoother for the user.
    # If this is something you want to use, please see how it's
    # applied by transformers.Trainer:
    # https://github.com/huggingface/transformers/blob/v4.3.3/src/transformers/trainer.py#L1324
    outputs = self.model(**batch)
    loss = outputs["loss"] if isinstance(outputs, dict) else outputs[0]
    self.context.backward(loss)
    self.context.step_optimizer(self.optimizer, self.grad_clip_fn)
    return loss
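For reference, a subclass typically fills in data loading and evaluation. The sketch below is illustrative only: the dataset, column names, and preprocessing are hypothetical placeholders, while build_training_data_loader, build_validation_data_loader, and evaluate_batch are the standard PyTorchTrial methods a subclass is expected to provide.

import determined.pytorch as det_torch
import model_hub.huggingface as hf


class MyTransformerTrial(hf.BaseTransformerTrial):
    def __init__(self, context: det_torch.PyTorchTrialContext) -> None:
        super().__init__(context)
        # Load the raw dataset from the fields in data_config (see default_load_dataset below).
        raw_datasets = hf.default_load_dataset(self.data_config)
        # Hypothetical preprocessing: tokenize a "text" column. Real trials usually also
        # set a tensor format or data collator appropriate for the task.
        tokenized = raw_datasets.map(
            lambda examples: self.tokenizer(
                examples["text"], truncation=True, padding="max_length"
            ),
            batched=True,
        )
        self.train_dataset = tokenized["train"]
        self.eval_dataset = tokenized["validation"]

    def build_training_data_loader(self) -> det_torch.DataLoader:
        return det_torch.DataLoader(
            self.train_dataset, batch_size=self.context.get_per_slot_batch_size()
        )

    def build_validation_data_loader(self) -> det_torch.DataLoader:
        return det_torch.DataLoader(
            self.eval_dataset, batch_size=self.context.get_per_slot_batch_size()
        )

    def evaluate_batch(self, batch):
        # Mirror train_batch: HF models return the loss first when labels are provided.
        outputs = self.model(**batch)
        loss = outputs["loss"] if isinstance(outputs, dict) else outputs[0]
        return {"eval_loss": loss}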
Helper Functions#
The BaseTransformerTrial calls many helper functions below that are also useful when subclassing BaseTransformerTrial or writing custom transformers trials for use with Determined.
- model_hub.huggingface.build_default_lr_scheduler(optimizer: torch.optim.optimizer.Optimizer, scheduler_kwargs: model_hub.huggingface._config_parser.LRSchedulerKwargs) Any #
This follows the function in the transformers Trainer to construct the lr_scheduler.
- Parameters
optimizer – optimizer to apply lr_scheduler to
scheduler_kwargs – see LRSchedulerKwargs in _config_parser.py for expected fields.
- Returns
lr_scheduler configured accordingly
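As an illustrative sketch outside of a trial, with a plain torch optimizer standing in for the wrapped one and hypothetical field values:

import torch

import model_hub.huggingface as hf

# Stand-in model and optimizer; inside a trial you would pass the wrapped optimizer.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=5e-5)

scheduler_kwargs = hf.LRSchedulerKwargs(
    num_training_steps=1000,     # required field
    lr_scheduler_type="linear",  # default
    num_warmup_steps=100,        # hypothetical warmup length
)
lr_scheduler = hf.build_default_lr_scheduler(optimizer, scheduler_kwargs)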
- model_hub.huggingface.build_default_optimizer(model: torch.nn.modules.module.Module, optimizer_kwargs: model_hub.huggingface._config_parser.OptimizerKwargs) Union[transformers.optimization.Adafactor, transformers.optimization.AdamW] #
This follows the function in the transformers Trainer to construct the optimizer.
- Parameters
model – model whose parameters will be updated by the optimizer
optimizer_kwargs – see OptimizerKwargs in _config_parser.py for expected fields, including weight_decay
- Returns
optimizer configured accordingly
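A minimal sketch, assuming a pretrained model loaded outside of a trial (inside BaseTransformerTrial the result is additionally passed to context.wrap_optimizer, as shown above); the field values are hypothetical:

import transformers

import model_hub.huggingface as hf

model = transformers.AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
optimizer_kwargs = hf.OptimizerKwargs(
    learning_rate=5e-5,  # default
    weight_decay=0.01,   # hypothetical
    adafactor=False,     # AdamW rather than Adafactor
)
optimizer = hf.build_default_optimizer(model, optimizer_kwargs)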
- model_hub.huggingface.build_using_auto(config_kwargs: Union[Dict, attrdict.dictionary.AttrDict], tokenizer_kwargs: Union[Dict, attrdict.dictionary.AttrDict], model_mode: str, model_kwargs: Union[Dict, attrdict.dictionary.AttrDict], use_pretrained_weights: bool = True) Tuple[transformers.PretrainedConfig, transformers.PreTrainedTokenizer, transformers.PreTrainedModel] #
Build the config, tokenizer, and model using the transformers Auto classes.
- Parameters
config_kwargs – arguments for transformers configuration classes
tokenizer_kwargs – arguments for transformers tokenizer classes
model_mode – one of (pretraining, causal-lm, masked-lm, seq2seq-lm, sequence-classification, multiple-choice, next-sentence, token-classification, question-answering)
model_kwargs – arguments for transformers model classes
- Returns
transformer config, tokenizer, and model
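For illustration, the kwargs below are plain dicts whose keys mirror the ConfigKwargs, TokenizerKwargs, and ModelKwargs fields (the values are hypothetical); inside BaseTransformerTrial they are produced by default_parse_config_tokenizer_model_kwargs instead:

import model_hub.huggingface as hf

config, tokenizer, model = hf.build_using_auto(
    config_kwargs={"pretrained_model_name_or_path": "bert-base-uncased", "num_labels": 2},
    tokenizer_kwargs={"pretrained_model_name_or_path": "bert-base-uncased"},
    model_mode="sequence-classification",
    model_kwargs={"pretrained_model_name_or_path": "bert-base-uncased"},
    use_pretrained_weights=True,
)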
- model_hub.huggingface.default_load_dataset(data_config: Union[Dict, attrdict.dictionary.AttrDict]) Union[datasets.Dataset, datasets.IterableDataset, datasets.DatasetDict, datasets.IterableDatasetDict] #
Creates the dataset using HuggingFace datasets’ load_dataset method. If a dataset_name is provided, we will use it along with the dataset_config_name. Otherwise, we will create the dataset using the provided train_file and validation_file.
- Parameters
data_config – arguments for load_dataset. See DatasetKwargs for expected fields.
- Returns
Dataset returned from hf_datasets.load_dataset.
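A minimal sketch, assuming the data_config only names a dataset on the HuggingFace Hub (the dataset and configuration names here are hypothetical examples):

import attrdict

import model_hub.huggingface as hf

data_config = attrdict.AttrDict(
    {
        "dataset_name": "glue",         # Hub dataset identifier
        "dataset_config_name": "mrpc",  # configuration within that dataset
    }
)
raw_datasets = hf.default_load_dataset(data_config)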
- model_hub.huggingface.default_parse_config_tokenizer_model_kwargs(hparams: Union[Dict, attrdict.dictionary.AttrDict]) Tuple[Dict, Dict, Dict] #
This function parses the provided hparams into separate fields for the transformers config, tokenizer, and model. See the defined dataclasses ConfigKwargs, TokenizerKwargs, and ModelKwargs for expected fields and defaults.
- Parameters
hparams – hyperparameters to parse.
- Returns
One dictionary each for the config, tokenizer, and model.
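A minimal sketch with hypothetical hparams whose names mirror the dataclass fields documented below:

import attrdict

import model_hub.huggingface as hf

hparams = attrdict.AttrDict(
    {
        "pretrained_model_name_or_path": "bert-base-uncased",
        "num_labels": 2,  # hypothetical, task-specific
    }
)
config_kwargs, tokenizer_kwargs, model_kwargs = hf.default_parse_config_tokenizer_model_kwargs(
    hparams
)
# Each dict can then be passed to build_using_auto, as in BaseTransformerTrial.__init__.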
- model_hub.huggingface.default_parse_optimizer_lr_scheduler_kwargs(hparams: Union[Dict, attrdict.dictionary.AttrDict]) Tuple[model_hub.huggingface._config_parser.OptimizerKwargs, model_hub.huggingface._config_parser.LRSchedulerKwargs] #
Parses the hparams relevant to the optimizer and lr_scheduler and fills in the same defaults as those used by the transformers Trainer. See the defined dataclasses OptimizerKwargs and LRSchedulerKwargs for expected fields and defaults.
- Parameters
hparams – hparams to parse.
- Returns
Configuration for the optimizer and lr scheduler.
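A minimal sketch with hypothetical hparams; the field names mirror OptimizerKwargs and LRSchedulerKwargs below:

import attrdict

import model_hub.huggingface as hf

hparams = attrdict.AttrDict(
    {
        "learning_rate": 3e-5,  # hypothetical values throughout
        "weight_decay": 0.01,
        "lr_scheduler_type": "linear",
        "num_warmup_steps": 100,
        "num_training_steps": 1000,
    }
)
optimizer_kwargs, scheduler_kwargs = hf.default_parse_optimizer_lr_scheduler_kwargs(hparams)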
Structured Dataclasses#
Structured dataclasses are used to ensure that Determined parses the experiment config correctly. See the classes below for the fields that can be used in the experiment config to configure the dataset; the transformers config, model, and tokenizer; and the optimizer and learning rate scheduler for use with the functions above.
- class model_hub.huggingface.DatasetKwargs(**kwargs: Dict[str, Any])#
Config parser for dataset fields.
Either dataset_name needs to be provided or train_file and validation_file need to be provided.
- Parameters
dataset_name (optional, defaults to None) – Path argument to pass to HuggingFace datasets.load_dataset. Can be a dataset identifier in HuggingFace Datasets Hub or a local path to processing script.
dataset_config_name (optional, defaults to None) – The name of the dataset configuration to pass to HuggingFace datasets.load_dataset.
validation_split_percentage (optional, defaults to None) – This is used to create a validation split from the training data when a dataset does not have a predefined validation split.
train_file (optional, defaults to None) – Path to training data. This will be used if a dataset_name is not provided.
validation_file (optional, defaults to None) – Path to validation data. This will be used if a dataset_name is not provided.
- Returns
dataclass with the above fields populated according to provided config.
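For illustration, the two documented ways of specifying data can be expressed directly as DatasetKwargs (all values here are hypothetical):

import model_hub.huggingface as hf

# Hub-dataset route:
hub_kwargs = hf.DatasetKwargs(dataset_name="glue", dataset_config_name="mrpc")

# Local-files route, used when no dataset_name is given (paths hypothetical):
file_kwargs = hf.DatasetKwargs(
    train_file="/data/train.json",
    validation_file="/data/validation.json",
)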
- class model_hub.huggingface.ConfigKwargs(**kwargs: Dict[str, Any])#
Config parser for transformers config fields.
- Parameters
pretrained_model_name_or_path – Path to pretrained model or model identifier from huggingface.co/models.
cache_dir (optional, defaults to None) – Where do you want to store the pretrained models downloaded from huggingface.co.
revision (optional, defaults to None) – The specific model version to use (can be a branch name, tag name or commit id).
use_auth_token (optional, defaults to None) – Will use the token generated when running transformers-cli login (necessary to use this script with private models).
num_labels (optional, excluded if not provided) – Number of labels to use in the last layer added to the model, typically for a classification task.
finetuning_task (optional, excluded if not provided) – Name of the task used to fine-tune the model. This can be used when converting from an original PyTorch checkpoint.
- Returns
dataclass with the above fields populated according to provided config.
- class model_hub.huggingface.ModelKwargs(pretrained_model_name_or_path: str, cache_dir: Optional[str] = None, revision: Optional[str] = 'main', use_auth_token: Optional[bool] = False)#
Config parser for transformers model fields.
- Parameters
pretrained_model_name_or_path – Path to pretrained model or model identifier from huggingface.co/models.
cache_dir (optional, defaults to None) – Where do you want to store the pretrained models downloaded from huggingface.co.
revision (optional, defaults to 'main') – The specific model version to use (can be a branch name, tag name or commit id).
use_auth_token (optional, defaults to False) – Will use the token generated when running transformers-cli login (necessary to use this script with private models).
- Returns
dataclass with the above fields populated according to provided config.
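Since the full signature is given above, ModelKwargs can also be constructed directly; the model identifier below is only an example:

import model_hub.huggingface as hf

model_kwargs = hf.ModelKwargs(
    pretrained_model_name_or_path="bert-base-uncased",
    revision="main",       # default
    use_auth_token=False,  # default
)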
- class model_hub.huggingface.OptimizerKwargs(weight_decay: Optional[float] = 0, adafactor: Optional[bool] = False, learning_rate: Optional[float] = 5e-05, max_grad_norm: Optional[float] = 1.0, adam_beta1: Optional[float] = 0.9, adam_beta2: Optional[float] = 0.999, adam_epsilon: Optional[float] = 1e-08, scale_parameter: Optional[bool] = False, relative_step: Optional[bool] = False)#
Config parser for transformers optimizer fields.
- class model_hub.huggingface.LRSchedulerKwargs(num_training_steps: int, lr_scheduler_type: Optional[str] = 'linear', num_warmup_steps: Optional[int] = 0)#
Config parser for transformers lr scheduler fields.