Transformers API#

`model_hub.huggingface`#

class model_hub.huggingface.BaseTransformerTrial(context: determined.pytorch._pytorch_context.PyTorchTrialContext)#

This is the base PyTorchTrial for transformers that implements the __init__ and train_batch methods.

You can subclass BaseTransformerTrial to customize a trial for your own use by filling in the expected methods for data loading and evaluation.

The __init__ method replicated below makes heavy use of the helper functions in the next section.

    def __init__(self, context: det_torch.PyTorchTrialContext) -> None:
        self.context = context
        # A subclass of BaseTransformerTrial may have already set hparams and data_config
        # attributes so we only reset them if they do not exist.
        if not hasattr(self, "hparams"):
            self.hparams = attrdict.AttrDict(context.get_hparams())
        if not hasattr(self, "data_config"):
            self.data_config = attrdict.AttrDict(context.get_data_config())
        if not hasattr(self, "exp_config"):
            self.exp_config = attrdict.AttrDict(context.get_experiment_config())
        # Check to make sure all expected hyperparameters are set.
        self.check_hparams()

        # Parse hparams and data_config.
        (
            self.config_kwargs,
            self.tokenizer_kwargs,
            self.model_kwargs,
        ) = hf_parse.default_parse_config_tokenizer_model_kwargs(self.hparams)
        optimizer_kwargs, scheduler_kwargs = hf_parse.default_parse_optimizer_lr_scheduler_kwargs(
            self.hparams
        )

        self.config, self.tokenizer, self.model = build_using_auto(
            self.config_kwargs,
            self.tokenizer_kwargs,
            self.hparams.model_mode,
            self.model_kwargs,
            use_pretrained_weights=self.hparams.use_pretrained_weights,
        )
        self.model = self.context.wrap_model(self.model)

        self.optimizer = self.context.wrap_optimizer(
            build_default_optimizer(self.model, optimizer_kwargs)
        )

        if self.hparams.use_apex_amp:
            self.model, self.optimizer = self.context.configure_apex_amp(
                models=self.model,
                optimizers=self.optimizer,
            )

        self.lr_scheduler = self.context.wrap_lr_scheduler(
            build_default_lr_scheduler(self.optimizer, scheduler_kwargs),
            det_torch.LRScheduler.StepMode.STEP_EVERY_BATCH,
        )

        self.grad_clip_fn = None

        if optimizer_kwargs.max_grad_norm > 0:  # type: ignore
            self.grad_clip_fn = lambda x: torch.nn.utils.clip_grad_norm_(
                x, optimizer_kwargs.max_grad_norm
            )
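
The hasattr guards at the top of __init__ let a subclass assign hparams, data_config, or exp_config before calling the base initializer. The following is a minimal, library-free sketch of that pattern. BaseTrial, CustomTrial, and the dict-based context are hypothetical stand-ins, not Determined or model_hub classes.

```python
from typing import Any, Dict


class BaseTrial:
    """Hypothetical stand-in for BaseTransformerTrial (not the real class)."""

    def __init__(self, context: Dict[str, Any]) -> None:
        self.context = context
        # Only populate hparams if a subclass has not already done so.
        if not hasattr(self, "hparams"):
            self.hparams = dict(context["hparams"])


class CustomTrial(BaseTrial):
    """Subclass that adjusts hparams before the base initializer runs."""

    def __init__(self, context: Dict[str, Any]) -> None:
        # Set the attribute first so the base class leaves it untouched.
        self.hparams = {**context["hparams"], "model_mode": "masked-lm"}
        super().__init__(context)


context = {"hparams": {"model_mode": "causal-lm", "learning_rate": 5e-5}}
base_trial = BaseTrial(context)
custom_trial = CustomTrial(context)
```

Here base_trial keeps the model_mode from the context, while custom_trial keeps the value it assigned itself.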

The train_batch method replicated below should work for most models and tasks but can be overridden in a subclass for more custom behavior.

    def train_batch(self, batch: Any, epoch_idx: int, batch_idx: int) -> Any:
        # By default, all HF models return the loss in the first element.
        # We do not automatically apply a label smoother for the user.
        # If this is something you want to use, please see how it's
        # applied by transformers.Trainer:
        # https://github.com/huggingface/transformers/blob/v4.3.3/src/transformers/trainer.py#L1324
        outputs = self.model(**batch)
        loss = outputs["loss"] if isinstance(outputs, dict) else outputs[0]
        self.context.backward(loss)
        self.context.step_optimizer(self.optimizer, self.grad_clip_fn)
        return loss
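
The loss lookup in train_batch handles both dict-style and tuple-style model outputs. The sketch below isolates that one expression so the two cases can be seen side by side. extract_loss is a hypothetical helper name, and plain floats stand in for loss tensors.

```python
from typing import Any


def extract_loss(outputs: Any) -> Any:
    """Return the loss from dict-like or tuple-like model outputs."""
    # Same expression as in train_batch above: dicts are read by key,
    # anything else is assumed to carry the loss in its first element.
    return outputs["loss"] if isinstance(outputs, dict) else outputs[0]


dict_outputs = {"loss": 0.25, "logits": [0.1, 0.9]}
tuple_outputs = (0.5, [0.3, 0.7])

loss_from_dict = extract_loss(dict_outputs)
loss_from_tuple = extract_loss(tuple_outputs)
```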

Helper Functions#

The BaseTransformerTrial calls many helper functions below that are also useful when subclassing BaseTransformerTrial or writing custom transformers trials for use with Determined.

model_hub.huggingface.build_default_lr_scheduler(optimizer: torch.optim.optimizer.Optimizer, scheduler_kwargs: model_hub.huggingface._config_parser.LRSchedulerKwargs) → Any#

This follows the corresponding function in the transformers Trainer to construct the lr_scheduler.

Parameters

optimizer – optimizer to apply lr_scheduler to
scheduler_kwargs – see LRSchedulerKwargs in _config_parser.py for expected fields.

Returns

lr_scheduler configured accordingly
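
As an illustration of what the default lr_scheduler_type of 'linear' means, the sketch below computes the learning-rate multiplier for a linear warmup followed by a linear decay. It assumes the schedule matches the linear-with-warmup schedule in transformers. linear_lr_multiplier is a hypothetical name, not part of model_hub.

```python
def linear_lr_multiplier(
    step: int, num_warmup_steps: int, num_training_steps: int
) -> float:
    """Multiplier applied to the base learning rate at a given step."""
    if step < num_warmup_steps:
        # Warmup: ramp linearly from 0 up to 1.
        return step / max(1, num_warmup_steps)
    # Decay: fall linearly from 1 to 0 by the final training step.
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


# Two warmup steps out of ten training steps in total.
multipliers = [linear_lr_multiplier(s, 2, 10) for s in (0, 1, 2, 6, 10)]
```

With num_warmup_steps left at its default of 0, the multiplier starts at 1 and only the decay phase applies.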

model_hub.huggingface.build_default_optimizer(model: torch.nn.modules.module.Module, optimizer_kwargs: model_hub.huggingface._config_parser.OptimizerKwargs) → Union[transformers.optimization.Adafactor, transformers.optimization.AdamW]#

This follows the corresponding function in the transformers Trainer to construct the optimizer.

Parameters

model – model whose parameters will be updated by the optimizer
optimizer_kwargs – see OptimizerKwargs in _config_parser.py for expected fields, including the weight_decay factor to apply to weights

Returns

optimizer configured accordingly
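
The sketch below shows one way the weight_decay field can be applied selectively. It assumes, as in the transformers Trainer around v4.3, that parameters whose names contain "bias" or "LayerNorm.weight" are excluded from weight decay. This is an assumption about the behavior being followed, not a copy of the model_hub implementation, and group_parameter_names is a hypothetical helper that works on parameter names only.

```python
from typing import Dict, List

# Assumption: name fragments that mark parameters exempt from weight decay.
NO_DECAY_MARKERS = ("bias", "LayerNorm.weight")


def group_parameter_names(
    names: List[str], weight_decay: float
) -> List[Dict[str, object]]:
    """Split parameter names into a decayed group and a non-decayed group."""
    decay = [n for n in names if not any(m in n for m in NO_DECAY_MARKERS)]
    no_decay = [n for n in names if any(m in n for m in NO_DECAY_MARKERS)]
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


names = [
    "encoder.layer.0.attention.self.query.weight",
    "encoder.layer.0.attention.self.query.bias",
    "encoder.layer.0.output.LayerNorm.weight",
    "classifier.weight",
]
groups = group_parameter_names(names, weight_decay=0.01)
```

In a real optimizer the "params" entries would hold the parameter tensors rather than their names.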

model_hub.huggingface.build_using_auto(config_kwargs: Union[Dict, attrdict.dictionary.AttrDict], tokenizer_kwargs: Union[Dict, attrdict.dictionary.AttrDict], model_mode: str, model_kwargs: Union[Dict, attrdict.dictionary.AttrDict], use_pretrained_weights: bool = True) → Tuple[transformers.PretrainedConfig, transformers.PreTrainedTokenizer, transformers.PreTrainedModel]#

Build the config, tokenizer, and model using the transformers Auto classes.

Parameters

config_kwargs – arguments for transformers configuration classes
tokenizer_kwargs – arguments for transformers tokenizer classes
model_mode – one of (pretraining, causal-lm, masked-lm, seq2seq-lm, sequence-classification, multiple-choice, next-sentence, token-classification, question-answering)
model_kwargs – arguments for transformers model classes
use_pretrained_weights – whether to load pretrained weights for the model (defaults to True)

Returns

transformer config, tokenizer, and model
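
To make the model_mode values concrete, the sketch below pairs each one with the name of the transformers Auto class that it most plausibly selects. The class names exist in transformers, but the pairing itself is an assumption and has not been checked against the model_hub source. auto_class_name_for is a hypothetical helper, and the sketch deliberately works with class names as strings so that it runs without transformers installed.

```python
from typing import Dict

# Assumed pairing of model_mode values with transformers Auto class names.
MODEL_MODE_TO_AUTO_CLASS: Dict[str, str] = {
    "pretraining": "AutoModelForPreTraining",
    "causal-lm": "AutoModelForCausalLM",
    "masked-lm": "AutoModelForMaskedLM",
    "seq2seq-lm": "AutoModelForSeq2SeqLM",
    "sequence-classification": "AutoModelForSequenceClassification",
    "multiple-choice": "AutoModelForMultipleChoice",
    "next-sentence": "AutoModelForNextSentencePrediction",
    "token-classification": "AutoModelForTokenClassification",
    "question-answering": "AutoModelForQuestionAnswering",
}


def auto_class_name_for(model_mode: str) -> str:
    """Look up the Auto class name for a model_mode, failing loudly if unknown."""
    if model_mode not in MODEL_MODE_TO_AUTO_CLASS:
        valid = ", ".join(sorted(MODEL_MODE_TO_AUTO_CLASS))
        raise ValueError(
            f"Unknown model_mode {model_mode!r}; expected one of: {valid}"
        )
    return MODEL_MODE_TO_AUTO_CLASS[model_mode]


selected = auto_class_name_for("sequence-classification")
```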

model_hub.huggingface.default_load_dataset(data_config: Union[Dict, attrdict.dictionary.AttrDict]) → Union[datasets.Dataset, datasets.IterableDataset, datasets.DatasetDict, datasets.IterableDatasetDict]#

Creates the dataset using HuggingFace datasets’ load_dataset method. If a dataset_name is provided, it is used along with the dataset_config_name. Otherwise, the dataset is created from the provided train_file and validation_file.

Parameters: data_config – arguments for load_dataset. See DatasetKwargs for expected fields.
Returns: Dataset returned from hf_datasets.load_dataset.
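
The choice between a named dataset and local files can be sketched as a small decision function. The version below only describes which source would be used and never calls load_dataset. describe_dataset_source and the shape of its return value are hypothetical, and "glue" and "mnli" serve only as example values.

```python
from typing import Any, Dict, Optional


def describe_dataset_source(data_config: Dict[str, Optional[str]]) -> Dict[str, Any]:
    """Report whether a named dataset or local files would be loaded."""
    dataset_name = data_config.get("dataset_name")
    if dataset_name is not None:
        # A dataset name takes priority and is paired with its config name.
        return {
            "source": "name",
            "path": dataset_name,
            "name": data_config.get("dataset_config_name"),
        }
    train_file = data_config.get("train_file")
    validation_file = data_config.get("validation_file")
    if train_file is None or validation_file is None:
        raise ValueError(
            "Provide dataset_name, or both train_file and validation_file."
        )
    return {
        "source": "files",
        "data_files": {"train": train_file, "validation": validation_file},
    }


named = describe_dataset_source(
    {"dataset_name": "glue", "dataset_config_name": "mnli"}
)
local = describe_dataset_source(
    {"train_file": "train.csv", "validation_file": "valid.csv"}
)
```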

model_hub.huggingface.default_parse_config_tokenizer_model_kwargs(hparams: Union[Dict, attrdict.dictionary.AttrDict]) → Tuple[Dict, Dict, Dict]#

This function parses the provided hparams into fields for the transformers config, tokenizer, and model. See the defined dataclasses ConfigKwargs, TokenizerKwargs, and ModelKwargs for expected fields and defaults.

Parameters: hparams – hyperparameters to parse.
Returns: One dictionary each for the config, tokenizer, and model.
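
The idea of splitting one flat hparams dictionary into per-component dictionaries can be sketched as filtering by field name. The example below uses the field names listed for ConfigKwargs and ModelKwargs on this page and leaves out the tokenizer, whose fields are not listed here. split_hparams is a hypothetical helper and, unlike the real parser, it does not fill in defaults.

```python
from typing import Any, Dict, Sequence, Tuple

# Field names as documented for ConfigKwargs and ModelKwargs.
CONFIG_FIELDS = (
    "pretrained_model_name_or_path",
    "cache_dir",
    "revision",
    "use_auth_token",
    "num_labels",
    "finetuning_task",
)
MODEL_FIELDS = (
    "pretrained_model_name_or_path",
    "cache_dir",
    "revision",
    "use_auth_token",
)


def pick(hparams: Dict[str, Any], fields: Sequence[str]) -> Dict[str, Any]:
    """Keep only the listed fields that are actually present in hparams."""
    return {key: hparams[key] for key in fields if key in hparams}


def split_hparams(hparams: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (config_kwargs, model_kwargs) drawn from one flat dictionary."""
    return pick(hparams, CONFIG_FIELDS), pick(hparams, MODEL_FIELDS)


hparams = {
    "pretrained_model_name_or_path": "bert-base-uncased",
    "num_labels": 3,
    "learning_rate": 5e-5,
}
config_kwargs, model_kwargs = split_hparams(hparams)
```

Note that num_labels reaches only the config dictionary, and learning_rate reaches neither because it belongs to the optimizer.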

model_hub.huggingface.default_parse_optimizer_lr_scheduler_kwargs(hparams: Union[Dict, attrdict.dictionary.AttrDict]) → Tuple[model_hub.huggingface._config_parser.OptimizerKwargs, model_hub.huggingface._config_parser.LRSchedulerKwargs]#

Parses the hparams relevant to the optimizer and lr_scheduler and fills in the same defaults as those used by the transformers Trainer. See the defined dataclasses OptimizerKwargs and LRSchedulerKwargs for expected fields and defaults.

Parameters: hparams – hparams to parse.
Returns: Configuration for the optimizer and lr scheduler.
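
The default-filling behavior can be illustrated with a plain dataclass whose defaults are copied from the OptimizerKwargs signature documented on this page. OptimizerDefaults and parse_optimizer_hparams are hypothetical names, and how the real parser treats unrelated keys is an assumption; this sketch simply ignores them.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class OptimizerDefaults:
    """Defaults copied from the documented OptimizerKwargs signature."""

    weight_decay: float = 0
    adafactor: bool = False
    learning_rate: float = 5e-05
    max_grad_norm: float = 1.0
    adam_beta1: float = 0.9
    adam_beta2: float = 0.999
    adam_epsilon: float = 1e-08
    scale_parameter: bool = False
    relative_step: bool = False


def parse_optimizer_hparams(hparams: Dict[str, Any]) -> OptimizerDefaults:
    """Take optimizer fields from hparams; unset fields keep their defaults."""
    known = {f.name for f in fields(OptimizerDefaults)}
    return OptimizerDefaults(**{k: v for k, v in hparams.items() if k in known})


parsed = parse_optimizer_hparams(
    {"learning_rate": 3e-5, "weight_decay": 0.01, "model_mode": "masked-lm"}
)
```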

Structured Dataclasses#

Structured dataclasses are used to ensure that Determined parses the experiment config correctly. See the classes below for details on which fields can be used in the experiment config to configure the dataset; the transformers config, model, and tokenizer; and the optimizer and learning rate scheduler for use with the functions above.

class model_hub.huggingface.DatasetKwargs(**kwargs: Dict[str, Any])#

Config parser for dataset fields.

Either dataset_name needs to be provided or train_file and validation_file need to be provided.

Parameters

dataset_name (optional, defaults to None) – Path argument to pass to HuggingFace datasets.load_dataset. Can be a dataset identifier in the HuggingFace Datasets Hub or a local path to a processing script.
dataset_config_name (optional, defaults to None) – The name of the dataset configuration to pass to HuggingFace datasets.load_dataset.
validation_split_percentage (optional, defaults to None) – This is used to create a validation split from the training data when a dataset does not have a predefined validation split.
train_file (optional, defaults to None) – Path to training data. This will be used if a dataset_name is not provided.
validation_file (optional, defaults to None) – Path to validation data. This will be used if a dataset_name is not provided.

Returns

dataclass with the above fields populated according to provided config.
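
One way a validation_split_percentage can be turned into train and validation splits is with the percent-slicing syntax that HuggingFace datasets accepts for the split argument. The sketch below only builds the two split strings. Whether model_hub constructs the split this way is an assumption, and split_expressions is a hypothetical helper.

```python
from typing import Tuple


def split_expressions(validation_split_percentage: int) -> Tuple[str, str]:
    """Build (train_split, validation_split) slice strings over "train"."""
    if not 0 < validation_split_percentage < 100:
        raise ValueError("validation_split_percentage must be between 1 and 99.")
    pct = validation_split_percentage
    # The first pct percent becomes validation data; the rest stays training data.
    return f"train[{pct}%:]", f"train[:{pct}%]"


train_split, validation_split = split_expressions(5)
```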

class model_hub.huggingface.ConfigKwargs(**kwargs: Dict[str, Any])#

Config parser for transformers config fields.

Parameters

pretrained_model_name_or_path – Path to pretrained model or model identifier from huggingface.co/models.
cache_dir (optional, defaults to None) – Directory in which to store the pretrained models downloaded from huggingface.co.
revision (optional, defaults to None) – The specific model version to use (can be a branch name, tag name or commit id).
use_auth_token (optional, defaults to None) – Will use the token generated when running transformers-cli login (necessary to use this script with private models).
num_labels (optional, excluded if not provided) – Number of labels to use in the last layer added to the model, typically for a classification task.
finetuning_task (optional, excluded if not provided) – Name of the task used to fine-tune the model. This can be used when converting from an original PyTorch checkpoint.

Returns

dataclass with the above fields populated according to provided config.

class model_hub.huggingface.ModelKwargs(pretrained_model_name_or_path: str, cache_dir: Optional[str] = None, revision: Optional[str] = 'main', use_auth_token: Optional[bool] = False)#

Config parser for transformers model fields.

Parameters

pretrained_model_name_or_path – Path to pretrained model or model identifier from huggingface.co/models.
cache_dir (optional, defaults to None) – Directory in which to store the pretrained models downloaded from huggingface.co.
revision (optional, defaults to 'main') – The specific model version to use (can be a branch name, tag name or commit id).
use_auth_token (optional, defaults to False) – Will use the token generated when running transformers-cli login (necessary to use this script with private models).

Returns

dataclass with the above fields populated according to provided config.

class model_hub.huggingface.OptimizerKwargs(weight_decay: Optional[float] = 0, adafactor: Optional[bool] = False, learning_rate: Optional[float] = 5e-05, max_grad_norm: Optional[float] = 1.0, adam_beta1: Optional[float] = 0.9, adam_beta2: Optional[float] = 0.999, adam_epsilon: Optional[float] = 1e-08, scale_parameter: Optional[bool] = False, relative_step: Optional[bool] = False)#

Config parser for transformers optimizer fields.

class model_hub.huggingface.LRSchedulerKwargs(num_training_steps: int, lr_scheduler_type: Optional[str] = 'linear', num_warmup_steps: Optional[int] = 0)#

Config parser for transformers lr scheduler fields.
