Buckets:
| # DistilBERT | |
| [DistilBERT](https://huggingface.co/papers/1910.01108) is pretrained by knowledge distillation to create a smaller model with faster inference and requires less compute to train. Through a triple loss objective during pretraining, language modeling loss, distillation loss, cosine-distance loss, DistilBERT demonstrates similar performance to a larger transformer language model. | |
| You can find all the original DistilBERT checkpoints under the [DistilBERT](https://huggingface.co/distilbert) organization. | |
| > [!TIP] | |
| > Click on the DistilBERT models in the right sidebar for more examples of how to apply DistilBERT to different language tasks. | |
| The example below demonstrates how to classify text with [Pipeline](/docs/transformers/pr_33892/en/main_classes/pipelines#transformers.Pipeline), [AutoModel](/docs/transformers/pr_33892/en/model_doc/auto#transformers.AutoModel), and from the command line. | |
| <hfoptions id="usage"> | |
| <hfoption id="Pipeline"> | |
| ```py | |
| from transformers import pipeline | |
| classifier = pipeline( | |
| task="text-classification", | |
| model="distilbert-base-uncased-finetuned-sst-2-english", | |
| dtype=torch.float16, | |
| device=0 | |
| ) | |
| result = classifier("I love using Hugging Face Transformers!") | |
| print(result) | |
| # Output: [{'label': 'POSITIVE', 'score': 0.9998}] | |
| ``` | |
| </hfoption> | |
| <hfoption id="AutoModel"> | |
| ```py | |
| import torch | |
| from transformers import AutoModelForSequenceClassification, AutoTokenizer | |
| tokenizer = AutoTokenizer.from_pretrained( | |
| "distilbert/distilbert-base-uncased-finetuned-sst-2-english", | |
| ) | |
| model = AutoModelForSequenceClassification.from_pretrained( | |
| "distilbert/distilbert-base-uncased-finetuned-sst-2-english", | |
| dtype=torch.float16, | |
| device_map="auto", | |
| attn_implementation="sdpa" | |
| ) | |
| inputs = tokenizer("I love using Hugging Face Transformers!", return_tensors="pt").to(model.device) | |
| with torch.no_grad(): | |
| outputs = model(**inputs) | |
| predicted_class_id = torch.argmax(outputs.logits, dim=-1).item() | |
| predicted_label = model.config.id2label[predicted_class_id] | |
| print(f"Predicted label: {predicted_label}") | |
| ``` | |
| </hfoption> | |
| <hfoption id="transformers CLI"> | |
| ```bash | |
| echo -e "I love using Hugging Face Transformers!" | transformers run --task text-classification --model distilbert-base-uncased-finetuned-sst-2-english | |
| ``` | |
| </hfoption> | |
| </hfoptions> | |
| ## Notes | |
| - DistilBERT doesn't have `token_type_ids`, you don't need to indicate which token belongs to which segment. Just | |
| separate your segments with the separation token `tokenizer.sep_token` (or `[SEP]`). | |
| - DistilBERT doesn't have options to select the input positions (`position_ids` input). This could be added if | |
| necessary though, just let us know if you need this option. | |
| ## DistilBertConfig[[transformers.DistilBertConfig]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.DistilBertConfig</name><anchor>transformers.DistilBertConfig</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/distilbert/configuration_distilbert.py#L28</source><parameters>[{"name": "vocab_size", "val": " = 30522"}, {"name": "max_position_embeddings", "val": " = 512"}, {"name": "sinusoidal_pos_embds", "val": " = False"}, {"name": "n_layers", "val": " = 6"}, {"name": "n_heads", "val": " = 12"}, {"name": "dim", "val": " = 768"}, {"name": "hidden_dim", "val": " = 3072"}, {"name": "dropout", "val": " = 0.1"}, {"name": "attention_dropout", "val": " = 0.1"}, {"name": "activation", "val": " = 'gelu'"}, {"name": "initializer_range", "val": " = 0.02"}, {"name": "qa_dropout", "val": " = 0.1"}, {"name": "seq_classif_dropout", "val": " = 0.2"}, {"name": "pad_token_id", "val": " = 0"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **vocab_size** (`int`, *optional*, defaults to 30522) -- | |
| Vocabulary size of the DistilBERT model. Defines the number of different tokens that can be represented by | |
| the `inputs_ids` passed when calling [DistilBertModel](/docs/transformers/pr_33892/en/model_doc/distilbert#transformers.DistilBertModel) or `TFDistilBertModel`. | |
| - **max_position_embeddings** (`int`, *optional*, defaults to 512) -- | |
| The maximum sequence length that this model might ever be used with. Typically set this to something large | |
| just in case (e.g., 512 or 1024 or 2048). | |
| - **sinusoidal_pos_embds** (`boolean`, *optional*, defaults to `False`) -- | |
| Whether to use sinusoidal positional embeddings. | |
| - **n_layers** (`int`, *optional*, defaults to 6) -- | |
| Number of hidden layers in the Transformer encoder. | |
| - **n_heads** (`int`, *optional*, defaults to 12) -- | |
| Number of attention heads for each attention layer in the Transformer encoder. | |
| - **dim** (`int`, *optional*, defaults to 768) -- | |
| Dimensionality of the encoder layers and the pooler layer. | |
| - **hidden_dim** (`int`, *optional*, defaults to 3072) -- | |
| The size of the "intermediate" (often named feed-forward) layer in the Transformer encoder. | |
| - **dropout** (`float`, *optional*, defaults to 0.1) -- | |
| The dropout probability for all fully connected layers in the embeddings, encoder, and pooler. | |
| - **attention_dropout** (`float`, *optional*, defaults to 0.1) -- | |
| The dropout ratio for the attention probabilities. | |
| - **activation** (`str` or `Callable`, *optional*, defaults to `"gelu"`) -- | |
| The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`, | |
| `"relu"`, `"silu"` and `"gelu_new"` are supported. | |
| - **initializer_range** (`float`, *optional*, defaults to 0.02) -- | |
| The standard deviation of the truncated_normal_initializer for initializing all weight matrices. | |
| - **qa_dropout** (`float`, *optional*, defaults to 0.1) -- | |
| The dropout probabilities used in the question answering model [DistilBertForQuestionAnswering](/docs/transformers/pr_33892/en/model_doc/distilbert#transformers.DistilBertForQuestionAnswering). | |
| - **seq_classif_dropout** (`float`, *optional*, defaults to 0.2) -- | |
| The dropout probabilities used in the sequence classification and the multiple choice model | |
| [DistilBertForSequenceClassification](/docs/transformers/pr_33892/en/model_doc/distilbert#transformers.DistilBertForSequenceClassification).</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| This is the configuration class to store the configuration of a [DistilBertModel](/docs/transformers/pr_33892/en/model_doc/distilbert#transformers.DistilBertModel) or a `TFDistilBertModel`. It | |
| is used to instantiate a DistilBERT model according to the specified arguments, defining the model architecture. | |
| Instantiating a configuration with the defaults will yield a similar configuration to that of the DistilBERT | |
| [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) architecture. | |
| Configuration objects inherit from [PreTrainedConfig](/docs/transformers/pr_33892/en/main_classes/configuration#transformers.PreTrainedConfig) and can be used to control the model outputs. Read the | |
| documentation from [PreTrainedConfig](/docs/transformers/pr_33892/en/main_classes/configuration#transformers.PreTrainedConfig) for more information. | |
| <ExampleCodeBlock anchor="transformers.DistilBertConfig.example"> | |
| Examples: | |
| ```python | |
| >>> from transformers import DistilBertConfig, DistilBertModel | |
| >>> # Initializing a DistilBERT configuration | |
| >>> configuration = DistilBertConfig() | |
| >>> # Initializing a model (with random weights) from the configuration | |
| >>> model = DistilBertModel(configuration) | |
| >>> # Accessing the model configuration | |
| >>> configuration = model.config | |
| ``` | |
| </ExampleCodeBlock> | |
| </div> | |
| ## DistilBertTokenizer[[transformers.DistilBertTokenizer]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.DistilBertTokenizer</name><anchor>transformers.DistilBertTokenizer</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/distilbert/tokenization_distilbert.py#L53</source><parameters>[{"name": "vocab_file", "val": ""}, {"name": "do_lower_case", "val": " = True"}, {"name": "do_basic_tokenize", "val": " = True"}, {"name": "never_split", "val": " = None"}, {"name": "unk_token", "val": " = '[UNK]'"}, {"name": "sep_token", "val": " = '[SEP]'"}, {"name": "pad_token", "val": " = '[PAD]'"}, {"name": "cls_token", "val": " = '[CLS]'"}, {"name": "mask_token", "val": " = '[MASK]'"}, {"name": "tokenize_chinese_chars", "val": " = True"}, {"name": "strip_accents", "val": " = None"}, {"name": "clean_up_tokenization_spaces", "val": " = True"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **vocab_file** (`str`) -- | |
| File containing the vocabulary. | |
| - **do_lower_case** (`bool`, *optional*, defaults to `True`) -- | |
| Whether or not to lowercase the input when tokenizing. | |
| - **do_basic_tokenize** (`bool`, *optional*, defaults to `True`) -- | |
| Whether or not to do basic tokenization before WordPiece. | |
| - **never_split** (`Iterable`, *optional*) -- | |
| Collection of tokens which will never be split during tokenization. Only has an effect when | |
| `do_basic_tokenize=True` | |
| - **unk_token** (`str`, *optional*, defaults to `"[UNK]"`) -- | |
| The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this | |
| token instead. | |
| - **sep_token** (`str`, *optional*, defaults to `"[SEP]"`) -- | |
| The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for | |
| sequence classification or for a text and a question for question answering. It is also used as the last | |
| token of a sequence built with special tokens. | |
| - **pad_token** (`str`, *optional*, defaults to `"[PAD]"`) -- | |
| The token used for padding, for example when batching sequences of different lengths. | |
| - **cls_token** (`str`, *optional*, defaults to `"[CLS]"`) -- | |
| The classifier token which is used when doing sequence classification (classification of the whole sequence | |
| instead of per-token classification). It is the first token of the sequence when built with special tokens. | |
| - **mask_token** (`str`, *optional*, defaults to `"[MASK]"`) -- | |
| The token used for masking values. This is the token used when training this model with masked language | |
| modeling. This is the token which the model will try to predict. | |
| - **tokenize_chinese_chars** (`bool`, *optional*, defaults to `True`) -- | |
| Whether or not to tokenize Chinese characters. | |
| This should likely be deactivated for Japanese (see this | |
| [issue](https://github.com/huggingface/transformers/issues/328)). | |
| - **strip_accents** (`bool`, *optional*) -- | |
| Whether or not to strip all accents. If this option is not specified, then it will be determined by the | |
| value for `lowercase` (as in the original BERT). | |
| - **clean_up_tokenization_spaces** (`bool`, *optional*, defaults to `True`) -- | |
| Whether or not to cleanup spaces after decoding, cleanup consists in removing potential artifacts like | |
| extra spaces.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Construct a DistilBERT tokenizer. Based on WordPiece. | |
| This tokenizer inherits from [PreTrainedTokenizer](/docs/transformers/pr_33892/en/main_classes/tokenizer#transformers.PreTrainedTokenizer) which contains most of the main methods. Users should refer to | |
| this superclass for more information regarding those methods. | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>build_inputs_with_special_tokens</name><anchor>transformers.DistilBertTokenizer.build_inputs_with_special_tokens</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/distilbert/tokenization_distilbert.py#L196</source><parameters>[{"name": "token_ids_0", "val": ": list"}, {"name": "token_ids_1", "val": ": typing.Optional[list[int]] = None"}]</parameters><paramsdesc>- **token_ids_0** (`List[int]`) -- | |
| List of IDs to which the special tokens will be added. | |
| - **token_ids_1** (`List[int]`, *optional*) -- | |
| Optional second list of IDs for sequence pairs.</paramsdesc><paramgroups>0</paramgroups><rettype>`List[int]`</rettype><retdesc>List of [input IDs](../glossary#input-ids) with the appropriate special tokens.</retdesc></docstring> | |
| Build model inputs from a sequence or a pair of sequence for sequence classification tasks by concatenating and | |
| adding special tokens. A BERT sequence has the following format: | |
| - single sequence: `[CLS] X [SEP]` | |
| - pair of sequences: `[CLS] A [SEP] B [SEP]` | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>convert_tokens_to_string</name><anchor>transformers.DistilBertTokenizer.convert_tokens_to_string</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/distilbert/tokenization_distilbert.py#L190</source><parameters>[{"name": "tokens", "val": ""}]</parameters></docstring> | |
| Converts a sequence of tokens (string) in a single string. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>get_special_tokens_mask</name><anchor>transformers.DistilBertTokenizer.get_special_tokens_mask</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/distilbert/tokenization_distilbert.py#L222</source><parameters>[{"name": "token_ids_0", "val": ": list"}, {"name": "token_ids_1", "val": ": typing.Optional[list[int]] = None"}, {"name": "already_has_special_tokens", "val": ": bool = False"}]</parameters><paramsdesc>- **token_ids_0** (`List[int]`) -- | |
| List of IDs. | |
| - **token_ids_1** (`List[int]`, *optional*) -- | |
| Optional second list of IDs for sequence pairs. | |
| - **already_has_special_tokens** (`bool`, *optional*, defaults to `False`) -- | |
| Whether or not the token list is already formatted with special tokens for the model.</paramsdesc><paramgroups>0</paramgroups><rettype>`List[int]`</rettype><retdesc>A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.</retdesc></docstring> | |
| Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding | |
| special tokens using the tokenizer `prepare_for_model` method. | |
| </div></div> | |
| ## DistilBertTokenizerFast[[transformers.DistilBertTokenizerFast]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.DistilBertTokenizerFast</name><anchor>transformers.DistilBertTokenizerFast</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/distilbert/tokenization_distilbert_fast.py#L32</source><parameters>[{"name": "vocab_file", "val": " = None"}, {"name": "tokenizer_file", "val": " = None"}, {"name": "do_lower_case", "val": " = True"}, {"name": "unk_token", "val": " = '[UNK]'"}, {"name": "sep_token", "val": " = '[SEP]'"}, {"name": "pad_token", "val": " = '[PAD]'"}, {"name": "cls_token", "val": " = '[CLS]'"}, {"name": "mask_token", "val": " = '[MASK]'"}, {"name": "tokenize_chinese_chars", "val": " = True"}, {"name": "strip_accents", "val": " = None"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **vocab_file** (`str`) -- | |
| File containing the vocabulary. | |
| - **do_lower_case** (`bool`, *optional*, defaults to `True`) -- | |
| Whether or not to lowercase the input when tokenizing. | |
| - **unk_token** (`str`, *optional*, defaults to `"[UNK]"`) -- | |
| The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this | |
| token instead. | |
| - **sep_token** (`str`, *optional*, defaults to `"[SEP]"`) -- | |
| The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for | |
| sequence classification or for a text and a question for question answering. It is also used as the last | |
| token of a sequence built with special tokens. | |
| - **pad_token** (`str`, *optional*, defaults to `"[PAD]"`) -- | |
| The token used for padding, for example when batching sequences of different lengths. | |
| - **cls_token** (`str`, *optional*, defaults to `"[CLS]"`) -- | |
| The classifier token which is used when doing sequence classification (classification of the whole sequence | |
| instead of per-token classification). It is the first token of the sequence when built with special tokens. | |
| - **mask_token** (`str`, *optional*, defaults to `"[MASK]"`) -- | |
| The token used for masking values. This is the token used when training this model with masked language | |
| modeling. This is the token which the model will try to predict. | |
| - **clean_text** (`bool`, *optional*, defaults to `True`) -- | |
| Whether or not to clean the text before tokenization by removing any control characters and replacing all | |
| whitespaces by the classic one. | |
| - **tokenize_chinese_chars** (`bool`, *optional*, defaults to `True`) -- | |
| Whether or not to tokenize Chinese characters. This should likely be deactivated for Japanese (see [this | |
| issue](https://github.com/huggingface/transformers/issues/328)). | |
| - **strip_accents** (`bool`, *optional*) -- | |
| Whether or not to strip all accents. If this option is not specified, then it will be determined by the | |
| value for `lowercase` (as in the original BERT). | |
| - **wordpieces_prefix** (`str`, *optional*, defaults to `"##"`) -- | |
| The prefix for subwords.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Construct a "fast" DistilBERT tokenizer (backed by HuggingFace's *tokenizers* library). Based on WordPiece. | |
| This tokenizer inherits from [PreTrainedTokenizerFast](/docs/transformers/pr_33892/en/main_classes/tokenizer#transformers.PreTrainedTokenizerFast) which contains most of the main methods. Users should | |
| refer to this superclass for more information regarding those methods. | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>build_inputs_with_special_tokens</name><anchor>transformers.DistilBertTokenizerFast.build_inputs_with_special_tokens</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/distilbert/tokenization_distilbert_fast.py#L119</source><parameters>[{"name": "token_ids_0", "val": ""}, {"name": "token_ids_1", "val": " = None"}]</parameters><paramsdesc>- **token_ids_0** (`List[int]`) -- | |
| List of IDs to which the special tokens will be added. | |
| - **token_ids_1** (`List[int]`, *optional*) -- | |
| Optional second list of IDs for sequence pairs.</paramsdesc><paramgroups>0</paramgroups><rettype>`List[int]`</rettype><retdesc>List of [input IDs](../glossary#input-ids) with the appropriate special tokens.</retdesc></docstring> | |
| Build model inputs from a sequence or a pair of sequence for sequence classification tasks by concatenating and | |
| adding special tokens. A BERT sequence has the following format: | |
| - single sequence: `[CLS] X [SEP]` | |
| - pair of sequences: `[CLS] A [SEP] B [SEP]` | |
| </div></div> | |
| ## DistilBertModel[[transformers.DistilBertModel]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.DistilBertModel</name><anchor>transformers.DistilBertModel</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/distilbert/modeling_distilbert.py#L322</source><parameters>[{"name": "config", "val": ": PreTrainedConfig"}]</parameters><paramsdesc>- **config** ([PreTrainedConfig](/docs/transformers/pr_33892/en/main_classes/configuration#transformers.PreTrainedConfig)) -- | |
| Model configuration class with all the parameters of the model. Initializing with a config file does not | |
| load the weights associated with the model, only the configuration. Check out the | |
| [from_pretrained()](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| The bare Distilbert Model outputting raw hidden-states without any specific head on top. | |
| This model inherits from [PreTrainedModel](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads | |
| etc.) | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage | |
| and behavior. | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>forward</name><anchor>transformers.DistilBertModel.forward</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/distilbert/modeling_distilbert.py#L386</source><parameters>[{"name": "input_ids", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "attention_mask", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "inputs_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "position_ids", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "**kwargs", "val": ": typing_extensions.Unpack[transformers.utils.generic.TransformersKwargs]"}]</parameters><paramsdesc>- **input_ids** (`torch.LongTensor` of shape `(batch_size, num_choices)`) -- | |
| Indices of input sequence tokens in the vocabulary. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/pr_33892/en/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| [What are input IDs?](../glossary#input-ids) | |
| - **attention_mask** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: | |
| - 1 for tokens that are **not masked**, | |
| - 0 for tokens that are **masked**. | |
| [What are attention masks?](../glossary#attention-mask) | |
| - **inputs_embeds** (`torch.FloatTensor` of shape `(batch_size, num_choices, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This | |
| is useful if you want more control over how to convert `input_ids` indices into associated vectors than the | |
| model's internal embedding lookup matrix. | |
| - **position_ids** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0, config.n_positions - 1]`. | |
| [What are position IDs?](../glossary#position-ids)</paramsdesc><paramgroups>0</paramgroups><rettype>[transformers.modeling_outputs.BaseModelOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.BaseModelOutput) or `tuple(torch.FloatTensor)`</rettype><retdesc>A [transformers.modeling_outputs.BaseModelOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.BaseModelOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([DistilBertConfig](/docs/transformers/pr_33892/en/model_doc/distilbert#transformers.DistilBertConfig)) and inputs. | |
| - **last_hidden_state** (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`) -- Sequence of hidden-states at the output of the last layer of the model. | |
| - **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. | |
| - **attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attentions weights after the attention softmax, used to compute the weighted average in the self-attention | |
| heads.</retdesc></docstring> | |
| The [DistilBertModel](/docs/transformers/pr_33892/en/model_doc/distilbert#transformers.DistilBertModel) forward method, overrides the `__call__` special method. | |
| <Tip> | |
| Although the recipe for forward pass needs to be defined within this function, one should call the `Module` | |
| instance afterwards instead of this since the former takes care of running the pre and post processing steps while | |
| the latter silently ignores them. | |
| </Tip> | |
| </div></div> | |
| ## DistilBertForMaskedLM[[transformers.DistilBertForMaskedLM]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.DistilBertForMaskedLM</name><anchor>transformers.DistilBertForMaskedLM</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/distilbert/modeling_distilbert.py#L432</source><parameters>[{"name": "config", "val": ": PreTrainedConfig"}]</parameters><paramsdesc>- **config** ([PreTrainedConfig](/docs/transformers/pr_33892/en/main_classes/configuration#transformers.PreTrainedConfig)) -- | |
| Model configuration class with all the parameters of the model. Initializing with a config file does not | |
| load the weights associated with the model, only the configuration. Check out the | |
| [from_pretrained()](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| DistilBert Model with a `masked language modeling` head on top. | |
| This model inherits from [PreTrainedModel](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads | |
| etc.) | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage | |
| and behavior. | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>forward</name><anchor>transformers.DistilBertForMaskedLM.forward</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/distilbert/modeling_distilbert.py#L476</source><parameters>[{"name": "input_ids", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "attention_mask", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "inputs_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "labels", "val": ": typing.Optional[torch.LongTensor] = None"}, {"name": "position_ids", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "**kwargs", "val": ": typing_extensions.Unpack[transformers.utils.generic.TransformersKwargs]"}]</parameters><paramsdesc>- **input_ids** (`torch.LongTensor` of shape `(batch_size, num_choices)`) -- | |
| Indices of input sequence tokens in the vocabulary. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/pr_33892/en/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| [What are input IDs?](../glossary#input-ids) | |
| - **attention_mask** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: | |
| - 1 for tokens that are **not masked**, | |
| - 0 for tokens that are **masked**. | |
| [What are attention masks?](../glossary#attention-mask) | |
| - **inputs_embeds** (`torch.FloatTensor` of shape `(batch_size, num_choices, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This | |
| is useful if you want more control over how to convert `input_ids` indices into associated vectors than the | |
| model's internal embedding lookup matrix. | |
| - **labels** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Labels for computing the masked language modeling loss. Indices should be in `[-100, 0, ..., | |
| config.vocab_size]` (see `input_ids` docstring) Tokens with indices set to `-100` are ignored (masked), the | |
| loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`. | |
| - **position_ids** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0, config.n_positions - 1]`. | |
| [What are position IDs?](../glossary#position-ids)</paramsdesc><paramgroups>0</paramgroups><rettype>[transformers.modeling_outputs.MaskedLMOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.MaskedLMOutput) or `tuple(torch.FloatTensor)`</rettype><retdesc>A [transformers.modeling_outputs.MaskedLMOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.MaskedLMOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([DistilBertConfig](/docs/transformers/pr_33892/en/model_doc/distilbert#transformers.DistilBertConfig)) and inputs. | |
| - **loss** (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided) -- Masked language modeling (MLM) loss. | |
| - **logits** (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.vocab_size)`) -- Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). | |
| - **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. | |
| - **attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attentions weights after the attention softmax, used to compute the weighted average in the self-attention | |
| heads.</retdesc></docstring> | |
| The [DistilBertForMaskedLM](/docs/transformers/pr_33892/en/model_doc/distilbert#transformers.DistilBertForMaskedLM) forward method, overrides the `__call__` special method. | |
| <Tip> | |
| Although the recipe for forward pass needs to be defined within this function, one should call the `Module` | |
| instance afterwards instead of this since the former takes care of running the pre and post processing steps while | |
| the latter silently ignores them. | |
| </Tip> | |
| <ExampleCodeBlock anchor="transformers.DistilBertForMaskedLM.forward.example"> | |
| Example: | |
| ```python | |
| >>> from transformers import AutoTokenizer, DistilBertForMaskedLM | |
| >>> import torch | |
| >>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased") | |
| >>> model = DistilBertForMaskedLM.from_pretrained("distilbert-base-uncased") | |
| >>> inputs = tokenizer("The capital of France is <mask>.", return_tensors="pt") | |
| >>> with torch.no_grad(): | |
| ... logits = model(**inputs).logits | |
| >>> # retrieve index of <mask> | |
| >>> mask_token_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0] | |
| >>> predicted_token_id = logits[0, mask_token_index].argmax(axis=-1) | |
| >>> tokenizer.decode(predicted_token_id) | |
| ... | |
| >>> labels = tokenizer("The capital of France is Paris.", return_tensors="pt")["input_ids"] | |
| >>> # mask labels of non-<mask> tokens | |
| >>> labels = torch.where(inputs.input_ids == tokenizer.mask_token_id, labels, -100) | |
| >>> outputs = model(**inputs, labels=labels) | |
| >>> round(outputs.loss.item(), 2) | |
| ... | |
| ``` | |
| </ExampleCodeBlock> | |
| </div></div> | |
| ## DistilBertForSequenceClassification[[transformers.DistilBertForSequenceClassification]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.DistilBertForSequenceClassification</name><anchor>transformers.DistilBertForSequenceClassification</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/distilbert/modeling_distilbert.py#L536</source><parameters>[{"name": "config", "val": ": PreTrainedConfig"}]</parameters><paramsdesc>- **config** ([PreTrainedConfig](/docs/transformers/pr_33892/en/main_classes/configuration#transformers.PreTrainedConfig)) -- | |
| Model configuration class with all the parameters of the model. Initializing with a config file does not | |
| load the weights associated with the model, only the configuration. Check out the | |
| [from_pretrained()](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| DistilBert Model transformer with a sequence classification/regression head on top (a linear layer on top of the | |
| pooled output) e.g. for GLUE tasks. | |
| This model inherits from [PreTrainedModel](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads | |
| etc.) | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage | |
| and behavior. | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>forward</name><anchor>transformers.DistilBertForSequenceClassification.forward</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/distilbert/modeling_distilbert.py#L570</source><parameters>[{"name": "input_ids", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "attention_mask", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "inputs_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "labels", "val": ": typing.Optional[torch.LongTensor] = None"}, {"name": "position_ids", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "**kwargs", "val": ": typing_extensions.Unpack[transformers.utils.generic.TransformersKwargs]"}]</parameters><paramsdesc>- **input_ids** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/pr_33892/en/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| [What are input IDs?](../glossary#input-ids) | |
| - **attention_mask** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: | |
| - 1 for tokens that are **not masked**, | |
| - 0 for tokens that are **masked**. | |
| [What are attention masks?](../glossary#attention-mask) | |
| - **inputs_embeds** (`torch.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This | |
| is useful if you want more control over how to convert `input_ids` indices into associated vectors than the | |
| model's internal embedding lookup matrix. | |
| - **labels** (`torch.LongTensor` of shape `(batch_size,)`, *optional*) -- | |
| Labels for computing the sequence classification/regression loss. Indices should be in `[0, ..., | |
| config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If | |
| `config.num_labels > 1` a classification loss is computed (Cross-Entropy). | |
| - **position_ids** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0, config.n_positions - 1]`. | |
| [What are position IDs?](../glossary#position-ids)</paramsdesc><paramgroups>0</paramgroups><rettype>[transformers.modeling_outputs.SequenceClassifierOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.SequenceClassifierOutput) or `tuple(torch.FloatTensor)`</rettype><retdesc>A [transformers.modeling_outputs.SequenceClassifierOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.SequenceClassifierOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([DistilBertConfig](/docs/transformers/pr_33892/en/model_doc/distilbert#transformers.DistilBertConfig)) and inputs. | |
| - **loss** (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided) -- Classification (or regression if config.num_labels==1) loss. | |
| - **logits** (`torch.FloatTensor` of shape `(batch_size, config.num_labels)`) -- Classification (or regression if config.num_labels==1) scores (before SoftMax). | |
| - **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. | |
| - **attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attentions weights after the attention softmax, used to compute the weighted average in the self-attention | |
| heads.</retdesc></docstring> | |
| The [DistilBertForSequenceClassification](/docs/transformers/pr_33892/en/model_doc/distilbert#transformers.DistilBertForSequenceClassification) forward method, overrides the `__call__` special method. | |
| <Tip> | |
| Although the recipe for forward pass needs to be defined within this function, one should call the `Module` | |
| instance afterwards instead of this since the former takes care of running the pre and post processing steps while | |
| the latter silently ignores them. | |
| </Tip> | |
| <ExampleCodeBlock anchor="transformers.DistilBertForSequenceClassification.forward.example"> | |
| Example of single-label classification: | |
| ```python | |
| >>> import torch | |
| >>> from transformers import AutoTokenizer, DistilBertForSequenceClassification | |
| >>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased") | |
| >>> model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased") | |
| >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt") | |
| >>> with torch.no_grad(): | |
| ... logits = model(**inputs).logits | |
| >>> predicted_class_id = logits.argmax().item() | |
| >>> model.config.id2label[predicted_class_id] | |
| ... | |
| >>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)` | |
| >>> num_labels = len(model.config.id2label) | |
| >>> model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=num_labels) | |
| >>> labels = torch.tensor([1]) | |
| >>> loss = model(**inputs, labels=labels).loss | |
| >>> round(loss.item(), 2) | |
| ... | |
| ``` | |
| </ExampleCodeBlock> | |
| <ExampleCodeBlock anchor="transformers.DistilBertForSequenceClassification.forward.example-2"> | |
| Example of multi-label classification: | |
| ```python | |
| >>> import torch | |
| >>> from transformers import AutoTokenizer, DistilBertForSequenceClassification | |
| >>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased") | |
| >>> model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased", problem_type="multi_label_classification") | |
| >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt") | |
| >>> with torch.no_grad(): | |
| ... logits = model(**inputs).logits | |
| >>> predicted_class_ids = torch.arange(0, logits.shape[-1])[torch.sigmoid(logits).squeeze(dim=0) > 0.5] | |
| >>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)` | |
| >>> num_labels = len(model.config.id2label) | |
| >>> model = DistilBertForSequenceClassification.from_pretrained( | |
| ... "distilbert-base-uncased", num_labels=num_labels, problem_type="multi_label_classification" | |
| ... ) | |
| >>> labels = torch.sum( | |
| ... torch.nn.functional.one_hot(predicted_class_ids[None, :].clone(), num_classes=num_labels), dim=1 | |
| ... ).to(torch.float) | |
| >>> loss = model(**inputs, labels=labels).loss | |
| ``` | |
| </ExampleCodeBlock> | |
| </div></div> | |
| ## DistilBertForMultipleChoice[[transformers.DistilBertForMultipleChoice]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.DistilBertForMultipleChoice</name><anchor>transformers.DistilBertForMultipleChoice</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/distilbert/modeling_distilbert.py#L811</source><parameters>[{"name": "config", "val": ": PreTrainedConfig"}]</parameters><paramsdesc>- **config** ([PreTrainedConfig](/docs/transformers/pr_33892/en/main_classes/configuration#transformers.PreTrainedConfig)) -- | |
| Model configuration class with all the parameters of the model. Initializing with a config file does not | |
| load the weights associated with the model, only the configuration. Check out the | |
| [from_pretrained()](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| The Distilbert Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a | |
| softmax) e.g. for RocStories/SWAG tasks. | |
| This model inherits from [PreTrainedModel](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads | |
| etc.) | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage | |
| and behavior. | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>forward</name><anchor>transformers.DistilBertForMultipleChoice.forward</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/distilbert/modeling_distilbert.py#L843</source><parameters>[{"name": "input_ids", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "attention_mask", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "inputs_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "labels", "val": ": typing.Optional[torch.LongTensor] = None"}, {"name": "position_ids", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "**kwargs", "val": ": typing_extensions.Unpack[transformers.utils.generic.TransformersKwargs]"}]</parameters><paramsdesc>- **input_ids** (`torch.LongTensor` of shape `(batch_size, num_choices, sequence_length)`) -- | |
| Indices of input sequence tokens in the vocabulary. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/pr_33892/en/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| [What are input IDs?](../glossary#input-ids) | |
| - **attention_mask** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: | |
| - 1 for tokens that are **not masked**, | |
| - 0 for tokens that are **masked**. | |
| [What are attention masks?](../glossary#attention-mask) | |
| - **inputs_embeds** (`torch.FloatTensor` of shape `(batch_size, num_choices, sequence_length, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This | |
| is useful if you want more control over how to convert `input_ids` indices into associated vectors than the | |
| model's internal embedding lookup matrix. | |
| - **labels** (`torch.LongTensor` of shape `(batch_size,)`, *optional*) -- | |
| Labels for computing the multiple choice classification loss. Indices should be in `[0, ..., | |
| num_choices-1]` where `num_choices` is the size of the second dimension of the input tensors. (See | |
| `input_ids` above) | |
| - **position_ids** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0, config.n_positions - 1]`. | |
| [What are position IDs?](../glossary#position-ids)</paramsdesc><paramgroups>0</paramgroups><rettype>[transformers.modeling_outputs.MultipleChoiceModelOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.MultipleChoiceModelOutput) or `tuple(torch.FloatTensor)`</rettype><retdesc>A [transformers.modeling_outputs.MultipleChoiceModelOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.MultipleChoiceModelOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([DistilBertConfig](/docs/transformers/pr_33892/en/model_doc/distilbert#transformers.DistilBertConfig)) and inputs. | |
| - **loss** (`torch.FloatTensor` of shape *(1,)*, *optional*, returned when `labels` is provided) -- Classification loss. | |
| - **logits** (`torch.FloatTensor` of shape `(batch_size, num_choices)`) -- *num_choices* is the second dimension of the input tensors. (see *input_ids* above). | |
| Classification scores (before SoftMax). | |
| - **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. | |
| - **attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attentions weights after the attention softmax, used to compute the weighted average in the self-attention | |
| heads.</retdesc></docstring> | |
| The [DistilBertForMultipleChoice](/docs/transformers/pr_33892/en/model_doc/distilbert#transformers.DistilBertForMultipleChoice) forward method, overrides the `__call__` special method. | |
| <Tip> | |
| Although the recipe for forward pass needs to be defined within this function, one should call the `Module` | |
| instance afterwards instead of this since the former takes care of running the pre and post processing steps while | |
| the latter silently ignores them. | |
| </Tip> | |
| <ExampleCodeBlock anchor="transformers.DistilBertForMultipleChoice.forward.example"> | |
| Examples: | |
| ```python | |
| >>> from transformers import AutoTokenizer, DistilBertForMultipleChoice | |
| >>> import torch | |
| >>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased") | |
| >>> model = DistilBertForMultipleChoice.from_pretrained("distilbert-base-cased") | |
| >>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced." | |
| >>> choice0 = "It is eaten with a fork and a knife." | |
| >>> choice1 = "It is eaten while held in the hand." | |
| >>> labels = torch.tensor(0).unsqueeze(0) # choice0 is correct (according to Wikipedia ;)), batch size 1 | |
| >>> encoding = tokenizer([[prompt, choice0], [prompt, choice1]], return_tensors="pt", padding=True) | |
| >>> outputs = model(**{k: v.unsqueeze(0) for k, v in encoding.items()}, labels=labels) # batch size is 1 | |
| >>> # the linear classifier still needs to be trained | |
| >>> loss = outputs.loss | |
| >>> logits = outputs.logits | |
| ``` | |
| </ExampleCodeBlock> | |
| </div></div> | |
| ## DistilBertForTokenClassification[[transformers.DistilBertForTokenClassification]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.DistilBertForTokenClassification</name><anchor>transformers.DistilBertForTokenClassification</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/distilbert/modeling_distilbert.py#L736</source><parameters>[{"name": "config", "val": ": PreTrainedConfig"}]</parameters><paramsdesc>- **config** ([PreTrainedConfig](/docs/transformers/pr_33892/en/main_classes/configuration#transformers.PreTrainedConfig)) -- | |
| Model configuration class with all the parameters of the model. Initializing with a config file does not | |
| load the weights associated with the model, only the configuration. Check out the | |
| [from_pretrained()](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| The Distilbert transformer with a token classification head on top (a linear layer on top of the hidden-states | |
| output) e.g. for Named-Entity-Recognition (NER) tasks. | |
| This model inherits from [PreTrainedModel](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads | |
| etc.) | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage | |
| and behavior. | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>forward</name><anchor>transformers.DistilBertForTokenClassification.forward</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/distilbert/modeling_distilbert.py#L768</source><parameters>[{"name": "input_ids", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "attention_mask", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "inputs_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "labels", "val": ": typing.Optional[torch.LongTensor] = None"}, {"name": "position_ids", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "**kwargs", "val": ": typing_extensions.Unpack[transformers.utils.generic.TransformersKwargs]"}]</parameters><paramsdesc>- **input_ids** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/pr_33892/en/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| [What are input IDs?](../glossary#input-ids) | |
| - **attention_mask** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: | |
| - 1 for tokens that are **not masked**, | |
| - 0 for tokens that are **masked**. | |
| [What are attention masks?](../glossary#attention-mask) | |
| - **inputs_embeds** (`torch.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This | |
| is useful if you want more control over how to convert `input_ids` indices into associated vectors than the | |
| model's internal embedding lookup matrix. | |
| - **labels** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Labels for computing the token classification loss. Indices should be in `[0, ..., config.num_labels - 1]`. | |
| - **position_ids** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0, config.n_positions - 1]`. | |
| [What are position IDs?](../glossary#position-ids)</paramsdesc><paramgroups>0</paramgroups><rettype>[transformers.modeling_outputs.TokenClassifierOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.TokenClassifierOutput) or `tuple(torch.FloatTensor)`</rettype><retdesc>A [transformers.modeling_outputs.TokenClassifierOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.TokenClassifierOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([DistilBertConfig](/docs/transformers/pr_33892/en/model_doc/distilbert#transformers.DistilBertConfig)) and inputs. | |
| - **loss** (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided) -- Classification loss. | |
| - **logits** (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.num_labels)`) -- Classification scores (before SoftMax). | |
| - **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. | |
| - **attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attentions weights after the attention softmax, used to compute the weighted average in the self-attention | |
| heads.</retdesc></docstring> | |
| The [DistilBertForTokenClassification](/docs/transformers/pr_33892/en/model_doc/distilbert#transformers.DistilBertForTokenClassification) forward method, overrides the `__call__` special method. | |
| <Tip> | |
| Although the recipe for forward pass needs to be defined within this function, one should call the `Module` | |
| instance afterwards instead of this since the former takes care of running the pre and post processing steps while | |
| the latter silently ignores them. | |
| </Tip> | |
| <ExampleCodeBlock anchor="transformers.DistilBertForTokenClassification.forward.example"> | |
| Example: | |
| ```python | |
| >>> from transformers import AutoTokenizer, DistilBertForTokenClassification | |
| >>> import torch | |
| >>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased") | |
| >>> model = DistilBertForTokenClassification.from_pretrained("distilbert-base-uncased") | |
| >>> inputs = tokenizer( | |
| ... "HuggingFace is a company based in Paris and New York", add_special_tokens=False, return_tensors="pt" | |
| ... ) | |
| >>> with torch.no_grad(): | |
| ... logits = model(**inputs).logits | |
| >>> predicted_token_class_ids = logits.argmax(-1) | |
| >>> # Note that tokens are classified rather then input words which means that | |
| >>> # there might be more predicted token classes than words. | |
| >>> # Multiple token classes might account for the same word | |
| >>> predicted_tokens_classes = [model.config.id2label[t.item()] for t in predicted_token_class_ids[0]] | |
| >>> predicted_tokens_classes | |
| ... | |
| >>> labels = predicted_token_class_ids | |
| >>> loss = model(**inputs, labels=labels).loss | |
| >>> round(loss.item(), 2) | |
| ... | |
| ``` | |
| </ExampleCodeBlock> | |
| </div></div> | |
| ## DistilBertForQuestionAnswering[[transformers.DistilBertForQuestionAnswering]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.DistilBertForQuestionAnswering</name><anchor>transformers.DistilBertForQuestionAnswering</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/distilbert/modeling_distilbert.py#L634</source><parameters>[{"name": "config", "val": ": PreTrainedConfig"}]</parameters><paramsdesc>- **config** ([PreTrainedConfig](/docs/transformers/pr_33892/en/main_classes/configuration#transformers.PreTrainedConfig)) -- | |
| Model configuration class with all the parameters of the model. Initializing with a config file does not | |
| load the weights associated with the model, only the configuration. Check out the | |
| [from_pretrained()](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| The Distilbert transformer with a span classification head on top for extractive question-answering tasks like | |
| SQuAD (a linear layer on top of the hidden-states output to compute `span start logits` and `span end logits`). | |
| This model inherits from [PreTrainedModel](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads | |
| etc.) | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage | |
| and behavior. | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>forward</name><anchor>transformers.DistilBertForQuestionAnswering.forward</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/distilbert/modeling_distilbert.py#L668</source><parameters>[{"name": "input_ids", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "attention_mask", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "inputs_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "start_positions", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "end_positions", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "position_ids", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "**kwargs", "val": ": typing_extensions.Unpack[transformers.utils.generic.TransformersKwargs]"}]</parameters><paramsdesc>- **input_ids** (`torch.LongTensor` of shape `(batch_size, num_choices)`) -- | |
| Indices of input sequence tokens in the vocabulary. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/pr_33892/en/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| [What are input IDs?](../glossary#input-ids) | |
| - **attention_mask** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: | |
| - 1 for tokens that are **not masked**, | |
| - 0 for tokens that are **masked**. | |
| [What are attention masks?](../glossary#attention-mask) | |
| - **inputs_embeds** (`torch.FloatTensor` of shape `(batch_size, num_choices, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This | |
| is useful if you want more control over how to convert `input_ids` indices into associated vectors than the | |
| model's internal embedding lookup matrix. | |
| - **start_positions** (`torch.Tensor` of shape `(batch_size,)`, *optional*) -- | |
| Labels for position (index) of the start of the labelled span for computing the token classification loss. | |
| Positions are clamped to the length of the sequence (`sequence_length`). Position outside of the sequence | |
| are not taken into account for computing the loss. | |
| - **end_positions** (`torch.Tensor` of shape `(batch_size,)`, *optional*) -- | |
| Labels for position (index) of the end of the labelled span for computing the token classification loss. | |
| Positions are clamped to the length of the sequence (`sequence_length`). Position outside of the sequence | |
| are not taken into account for computing the loss. | |
| - **position_ids** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0, config.n_positions - 1]`. | |
| [What are position IDs?](../glossary#position-ids)</paramsdesc><paramgroups>0</paramgroups><rettype>[transformers.modeling_outputs.QuestionAnsweringModelOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.QuestionAnsweringModelOutput) or `tuple(torch.FloatTensor)`</rettype><retdesc>A [transformers.modeling_outputs.QuestionAnsweringModelOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.QuestionAnsweringModelOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([DistilBertConfig](/docs/transformers/pr_33892/en/model_doc/distilbert#transformers.DistilBertConfig)) and inputs. | |
| - **loss** (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided) -- Total span extraction loss is the sum of a Cross-Entropy for the start and end positions. | |
| - **start_logits** (`torch.FloatTensor` of shape `(batch_size, sequence_length)`) -- Span-start scores (before SoftMax). | |
| - **end_logits** (`torch.FloatTensor` of shape `(batch_size, sequence_length)`) -- Span-end scores (before SoftMax). | |
| - **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. | |
| - **attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attentions weights after the attention softmax, used to compute the weighted average in the self-attention | |
| heads.</retdesc></docstring> | |
| The [DistilBertForQuestionAnswering](/docs/transformers/pr_33892/en/model_doc/distilbert#transformers.DistilBertForQuestionAnswering) forward method, overrides the `__call__` special method. | |
| <Tip> | |
| Although the recipe for forward pass needs to be defined within this function, one should call the `Module` | |
| instance afterwards instead of this since the former takes care of running the pre and post processing steps while | |
| the latter silently ignores them. | |
| </Tip> | |
| <ExampleCodeBlock anchor="transformers.DistilBertForQuestionAnswering.forward.example"> | |
| Example: | |
| ```python | |
| >>> from transformers import AutoTokenizer, DistilBertForQuestionAnswering | |
| >>> import torch | |
| >>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased") | |
| >>> model = DistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased") | |
| >>> question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet" | |
| >>> inputs = tokenizer(question, text, return_tensors="pt") | |
| >>> with torch.no_grad(): | |
| ... outputs = model(**inputs) | |
| >>> answer_start_index = outputs.start_logits.argmax() | |
| >>> answer_end_index = outputs.end_logits.argmax() | |
| >>> predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1] | |
| >>> tokenizer.decode(predict_answer_tokens, skip_special_tokens=True) | |
| ... | |
| >>> # target is "nice puppet" | |
| >>> target_start_index = torch.tensor([14]) | |
| >>> target_end_index = torch.tensor([15]) | |
| >>> outputs = model(**inputs, start_positions=target_start_index, end_positions=target_end_index) | |
| >>> loss = outputs.loss | |
| >>> round(loss.item(), 2) | |
| ... | |
| ``` | |
| </ExampleCodeBlock> | |
| </div></div> | |
| <EditOnGithub source="https://github.com/huggingface/transformers/blob/main/docs/source/en/model_doc/distilbert.md" /> |
Xet Storage Details
- Size:
- 66.9 kB
- Xet hash:
- 6401113c27e4309050bd1d7bd863cd213618690d294e02f5df7a7fe1a929418e
·
Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.