Transformers API#

model_hub.huggingface#

class model_hub.huggingface.BaseTransformerTrial(context: determined.pytorch._pytorch_context.PyTorchTrialContext)#

This is the base PyTorchTrial for transformers that implements the __init__ and train_batch methods.

You can subclass BaseTransformerTrial to customize a trial for your own usage by filling in the expected methods for data loading and evaluation.

The __init__ method replicated below makes heavy use of the helper functions in the next section.

    def __init__(self, context: det_torch.PyTorchTrialContext) -> None:
        self.context = context
        # A subclass of BaseTransformerTrial may have already set hparams and data_config
        # attributes so we only reset them if they do not exist.
        if not hasattr(self, "hparams"):
            self.hparams = attrdict.AttrDict(context.get_hparams())
        if not hasattr(self, "data_config"):
            self.data_config = attrdict.AttrDict(context.get_data_config())
        if not hasattr(self, "exp_config"):
            self.exp_config = attrdict.AttrDict(context.get_experiment_config())
        # Check to make sure all expected hyperparameters are set.
        self.check_hparams()

        # Parse hparams and data_config.
        (
            self.config_kwargs,
            self.tokenizer_kwargs,
            self.model_kwargs,
        ) = hf_parse.default_parse_config_tokenizer_model_kwargs(self.hparams)
        optimizer_kwargs, scheduler_kwargs = hf_parse.default_parse_optimizer_lr_scheduler_kwargs(
            self.hparams
        )

        self.config, self.tokenizer, self.model = build_using_auto(
            self.config_kwargs,
            self.tokenizer_kwargs,
            self.hparams.model_mode,
            self.model_kwargs,
            use_pretrained_weights=self.hparams.use_pretrained_weights,
        )
        self.model = self.context.wrap_model(self.model)

        self.optimizer = self.context.wrap_optimizer(
            build_default_optimizer(self.model, optimizer_kwargs)
        )

        if self.hparams.use_apex_amp:
            self.model, self.optimizer = self.context.configure_apex_amp(
                models=self.model,
                optimizers=self.optimizer,
            )

        self.lr_scheduler = self.context.wrap_lr_scheduler(
            build_default_lr_scheduler(self.optimizer, scheduler_kwargs),
            det_torch.LRScheduler.StepMode.STEP_EVERY_BATCH,
        )

        self.grad_clip_fn = None

        if optimizer_kwargs.max_grad_norm > 0:  # type: ignore
            self.grad_clip_fn = lambda x: torch.nn.utils.clip_grad_norm_(
                x, optimizer_kwargs.max_grad_norm
            )

The train_batch method replicated below should work for most models and tasks but can be overridden for more custom behavior in a subclass.

    def train_batch(self, batch: Any, epoch_idx: int, batch_idx: int) -> Any:
        # By default, all HF models return the loss in the first element.
        # We do not automatically apply a label smoother for the user.
        # If this is something you want to use, please see how it's
        # applied by transformers.Trainer:
        # https://github.com/huggingface/transformers/blob/v4.3.3/src/transformers/trainer.py#L1324
        outputs = self.model(**batch)
        loss = outputs["loss"] if isinstance(outputs, dict) else outputs[0]
        self.context.backward(loss)
        self.context.step_optimizer(self.optimizer, self.grad_clip_fn)
        return loss
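
A minimal subclass might look like the sketch below. This class is not part of model_hub; the dataset handling is intentionally schematic (a real trial typically tokenizes the dataset with self.tokenizer and sets tensor formats or a data collator), and only the PyTorchTrial methods for data loading and evaluation are filled in.

    from typing import Any, Dict

    import determined.pytorch as det_torch
    import model_hub.huggingface as hf


    class MyTransformerTrial(hf.BaseTransformerTrial):
        def __init__(self, context: det_torch.PyTorchTrialContext) -> None:
            # BaseTransformerTrial.__init__ builds the config, tokenizer, model,
            # optimizer, and lr_scheduler from the hyperparameters.
            super().__init__(context)
            # Hypothetical data preparation: load the raw dataset; tokenization with
            # self.tokenizer and tensor formatting are omitted from this sketch.
            raw_datasets = hf.default_load_dataset(self.data_config)
            self.train_dataset = raw_datasets["train"]
            self.eval_dataset = raw_datasets["validation"]

        def build_training_data_loader(self) -> det_torch.DataLoader:
            return det_torch.DataLoader(
                self.train_dataset, batch_size=self.context.get_per_slot_batch_size()
            )

        def build_validation_data_loader(self) -> det_torch.DataLoader:
            return det_torch.DataLoader(
                self.eval_dataset, batch_size=self.context.get_per_slot_batch_size()
            )

        def evaluate_batch(self, batch: Any) -> Dict:
            # Mirrors train_batch: HF models return the loss as the first output.
            outputs = self.model(**batch)
            loss = outputs["loss"] if isinstance(outputs, dict) else outputs[0]
            return {"validation_loss": loss}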

Helper Functions#

The BaseTransformerTrial calls many of the helper functions below, which are also useful when subclassing BaseTransformerTrial or writing custom transformers trials for use with Determined.

model_hub.huggingface.build_default_lr_scheduler(optimizer: torch.optim.optimizer.Optimizer, scheduler_kwargs: model_hub.huggingface._config_parser.LRSchedulerKwargs) → Any#

This follows the function used by the transformers Trainer to construct the lr_scheduler.

Parameters
  • optimizer – optimizer to apply lr_scheduler to

  • scheduler_kwargs – see LRSchedulerKwargs in _config_parser.py for expected fields.

Returns

lr_scheduler configured accordingly
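
A sketch of direct use inside a trial's __init__, assuming self.optimizer has already been built and wrapped; the step counts below are illustrative and would normally be derived from the experiment's length:

    import determined.pytorch as det_torch
    import model_hub.huggingface as hf

    scheduler_kwargs = hf.LRSchedulerKwargs(
        num_training_steps=10000,   # illustrative; normally derived from the experiment length
        lr_scheduler_type="linear",
        num_warmup_steps=500,
    )
    self.lr_scheduler = self.context.wrap_lr_scheduler(
        hf.build_default_lr_scheduler(self.optimizer, scheduler_kwargs),
        det_torch.LRScheduler.StepMode.STEP_EVERY_BATCH,
    )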

model_hub.huggingface.build_default_optimizer(model: torch.nn.modules.module.Module, optimizer_kwargs: model_hub.huggingface._config_parser.OptimizerKwargs) → Union[transformers.optimization.Adafactor, transformers.optimization.AdamW]#

This follows the function used by the transformers Trainer to construct the optimizer.

Parameters
  • model – model whose parameters will be updated by the optimizer

  • optimizer_kwargs – see OptimizerKwargs in _config_parser.py for expected fields, including the weight_decay factor applied to the model's weights

Returns

optimizer configured accordingly
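
A sketch of direct use inside a trial's __init__, assuming self.model has already been built and wrapped; the values are illustrative:

    import model_hub.huggingface as hf

    optimizer_kwargs = hf.OptimizerKwargs(
        learning_rate=5e-5,
        weight_decay=0.01,
        adafactor=False,  # False selects AdamW, True selects Adafactor
    )
    self.optimizer = self.context.wrap_optimizer(
        hf.build_default_optimizer(self.model, optimizer_kwargs)
    )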

model_hub.huggingface.build_using_auto(config_kwargs: Union[Dict, attrdict.dictionary.AttrDict], tokenizer_kwargs: Union[Dict, attrdict.dictionary.AttrDict], model_mode: str, model_kwargs: Union[Dict, attrdict.dictionary.AttrDict], use_pretrained_weights: bool = True) → Tuple[transformers.PretrainedConfig, transformers.PreTrainedTokenizer, transformers.PreTrainedModel]#

Build the config, tokenizer, and model using transformers' Auto classes.

Parameters
  • config_kwargs – arguments for transformers configuration classes

  • tokenizer_kwargs – arguments for transformers tokenizer classes

  • model_mode – one of (pretraining, causal-lm, masked-lm, seq2seq-lm, sequence-classification, multiple-choice, next-sentence, token-classification, question-answering)

  • model_kwargs – arguments for transformers model classes

Returns

transformer config, tokenizer, and model
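
A sketch of direct use inside a trial's __init__; the model name and label count are illustrative, and the kwargs dictionaries accept the fields documented by the dataclasses below:

    import model_hub.huggingface as hf

    model_name = "bert-base-uncased"  # illustrative
    self.config, self.tokenizer, self.model = hf.build_using_auto(
        config_kwargs={"pretrained_model_name_or_path": model_name, "num_labels": 2},
        tokenizer_kwargs={"pretrained_model_name_or_path": model_name},
        model_mode="sequence-classification",
        model_kwargs={"pretrained_model_name_or_path": model_name},
        use_pretrained_weights=True,
    )
    self.model = self.context.wrap_model(self.model)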

model_hub.huggingface.default_load_dataset(data_config: Union[Dict, attrdict.dictionary.AttrDict]) → Union[datasets.Dataset, datasets.IterableDataset, datasets.DatasetDict, datasets.IterableDatasetDict]#

Creates the dataset using HuggingFace datasets' load_dataset method. If a dataset_name is provided, we will use it along with the dataset_config_name. Otherwise, we will create the dataset using the provided train_file and validation_file.

Parameters

data_config – arguments for load_dataset. See DatasetKwargs for expected fields.

Returns

Dataset returned from hf_datasets.load_dataset.
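
A sketch of direct use with a dataset from the HuggingFace Hub; the dataset name and configuration are illustrative:

    import attrdict
    import model_hub.huggingface as hf

    data_config = attrdict.AttrDict(
        {"dataset_name": "glue", "dataset_config_name": "mrpc"}
    )
    raw_datasets = hf.default_load_dataset(data_config)
    train_split = raw_datasets["train"]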

model_hub.huggingface.default_parse_config_tokenizer_model_kwargs(hparams: Union[Dict, attrdict.dictionary.AttrDict]) → Tuple[Dict, Dict, Dict]#

This function parses the provided hparams into separate kwargs for the transformers config, tokenizer, and model. See the defined dataclasses ConfigKwargs, TokenizerKwargs, and ModelKwargs for expected fields and defaults.

Parameters

hparams – hyperparameters to parse.

Returns

One dictionary each for the config, tokenizer, and model.
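
A sketch assuming a flat hyperparameter dictionary like those used in the model-hub examples; the field names follow the dataclasses below and the values are illustrative:

    import model_hub.huggingface as hf

    hparams = {
        "pretrained_model_name_or_path": "bert-base-uncased",
        "model_mode": "sequence-classification",
        "use_pretrained_weights": True,
        "num_labels": 2,
    }
    config_kwargs, tokenizer_kwargs, model_kwargs = (
        hf.default_parse_config_tokenizer_model_kwargs(hparams)
    )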

model_hub.huggingface.default_parse_optimizer_lr_scheduler_kwargs(hparams: Union[Dict, attrdict.dictionary.AttrDict]) → Tuple[model_hub.huggingface._config_parser.OptimizerKwargs, model_hub.huggingface._config_parser.LRSchedulerKwargs]#

Parses the hparams relevant to the optimizer and lr_scheduler and fills in the same defaults as those used by the transformers Trainer. See the defined dataclasses OptimizerKwargs and LRSchedulerKwargs for expected fields and defaults.

Parameters

hparams – hparams to parse.

Returns

Configuration for the optimizer and lr scheduler.
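
A sketch with a flat hyperparameter dictionary; the values are illustrative, omitted fields fall back to the transformers Trainer defaults, and num_training_steps must be supplied because it has no default:

    import model_hub.huggingface as hf

    hparams = {
        "learning_rate": 2e-5,
        "weight_decay": 0.01,
        "num_training_steps": 10000,
        "num_warmup_steps": 500,
    }
    optimizer_kwargs, scheduler_kwargs = hf.default_parse_optimizer_lr_scheduler_kwargs(
        hparams
    )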

Structured Dataclasses#

Structured dataclasses are used to ensure that Determined parses the experiment config correctly. See the classes below for details on which fields can be used in the experiment config to configure the dataset; the transformers config, model, and tokenizer; and the optimizer and learning rate scheduler for use with the functions above.

class model_hub.huggingface.DatasetKwargs(**kwargs: Dict[str, Any])#

Config parser for dataset fields.

Either dataset_name needs to be provided or train_file and validation_file need to be provided.

Parameters
  • dataset_name (optional, defaults to None) – Path argument to pass to HuggingFace datasets.load_dataset. Can be a dataset identifier in the HuggingFace Datasets Hub or a local path to a processing script.

  • dataset_config_name (optional, defaults to None) – The name of the dataset configuration to pass to HuggingFace datasets.load_dataset.

  • validation_split_percentage (optional, defaults to None) – This is used to create a validation split from the training data when a dataset does not have a predefined validation split.

  • train_file (optional, defaults to None) – Path to training data. This will be used if a dataset_name is not provided.

  • validation_file (optional, defaults to None) – Path to validation data. This will be used if a dataset_name is not provided.

Returns

dataclass with the above fields populated according to provided config.
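
For illustration, the two accepted ways to specify the data (dataset and file names are illustrative):

    import model_hub.huggingface as hf

    # Option 1: a dataset from the HuggingFace Datasets Hub.
    hub_data = hf.DatasetKwargs(dataset_name="glue", dataset_config_name="mrpc")

    # Option 2: local training and validation files.
    file_data = hf.DatasetKwargs(
        train_file="train.csv",
        validation_file="validation.csv",
    )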

class model_hub.huggingface.ConfigKwargs(**kwargs: Dict[str, Any])#

Config parser for transformers config fields.

Parameters
  • pretrained_model_name_or_path – Path to pretrained model or model identifier from huggingface.co/models.

  • cache_dir (optional, defaults to None) – Directory in which to cache the pretrained models downloaded from huggingface.co.

  • revision (optional, defaults to None) – The specific model version to use (can be a branch name, tag name or commit id).

  • use_auth_token (optional, defaults to None) – Will use the token generated when running transformers-cli login (necessary to use this script with private models).

  • num_labels (optional, excluded if not provided) – Number of labels to use in the last layer added to the model, typically for a classification task.

  • finetuning_task (optional, excluded if not provided) – Name of the task used to fine-tune the model. This can be used when converting from an original PyTorch checkpoint.

Returns

dataclass with the above fields populated according to provided config.

class model_hub.huggingface.ModelKwargs(pretrained_model_name_or_path: str, cache_dir: Optional[str] = None, revision: Optional[str] = 'main', use_auth_token: Optional[bool] = False)#

Config parser for transformers model fields.

Parameters
  • pretrained_model_name_or_path – Path to pretrained model or model identifier from huggingface.co/models.

  • cache_dir (optional, defaults to None) – Directory in which to cache the pretrained models downloaded from huggingface.co.

  • revision (optional, defaults to "main") – The specific model version to use (can be a branch name, tag name or commit id).

  • use_auth_token (optional, defaults to False) – Will use the token generated when running transformers-cli login (necessary to use this script with private models).

Returns

dataclass with the above fields populated according to provided config.

class model_hub.huggingface.OptimizerKwargs(weight_decay: Optional[float] = 0, adafactor: Optional[bool] = False, learning_rate: Optional[float] = 5e-05, max_grad_norm: Optional[float] = 1.0, adam_beta1: Optional[float] = 0.9, adam_beta2: Optional[float] = 0.999, adam_epsilon: Optional[float] = 1e-08, scale_parameter: Optional[bool] = False, relative_step: Optional[bool] = False)#

Config parser for transformers optimizer fields.

class model_hub.huggingface.LRSchedulerKwargs(num_training_steps: int, lr_scheduler_type: Optional[str] = 'linear', num_warmup_steps: Optional[int] = 0)#

Config parser for transformers lr scheduler fields.