| # T5 | |
| [T5](https://huggingface.co/papers/1910.10683) is an encoder-decoder transformer available in a range of sizes from 60M to 11B parameters. It handles a wide range of NLP tasks by casting each one as a text-to-text problem, which removes the need for task-specific architectures. | |
| To formulate every task as text generation, each task is prepended with a task-specific prefix (e.g., translate English to German: ..., summarize: ...). This enables T5 to handle tasks like translation, summarization, question answering, and more. | |
| You can find all official T5 checkpoints under the [T5](https://huggingface.co/collections/google/t5-release-65005e7c520f8d7b4d037918) collection. | |
| > [!TIP] | |
| > Click on the T5 models in the right sidebar for more examples of how to apply T5 to different language tasks. | |
| The example below demonstrates how to generate text with [Pipeline](/docs/transformers/pr_33892/en/main_classes/pipelines#transformers.Pipeline), [AutoModel](/docs/transformers/pr_33892/en/model_doc/auto#transformers.AutoModel), and how to translate with T5 from the command line. | |
| <hfoptions id="usage"> | |
| <hfoption id="Pipeline"> | |
| ```py | |
| import torch | |
| from transformers import pipeline | |
| pipeline = pipeline( | |
| task="text2text-generation", | |
| model="google-t5/t5-base", | |
| dtype=torch.float16, | |
| device=0 | |
| ) | |
| pipeline("translate English to French: The weather is nice today.") | |
| ``` | |
| </hfoption> | |
| <hfoption id="AutoModel"> | |
| ```py | |
| import torch | |
| from transformers import AutoModelForSeq2SeqLM, AutoTokenizer | |
| tokenizer = AutoTokenizer.from_pretrained( | |
| "google-t5/t5-base" | |
| ) | |
| model = AutoModelForSeq2SeqLM.from_pretrained( | |
| "google-t5/t5-base", | |
| dtype=torch.float16, | |
| device_map="auto" | |
| ) | |
| input_ids = tokenizer("translate English to French: The weather is nice today.", return_tensors="pt").to(model.device) | |
| output = model.generate(**input_ids, cache_implementation="static") | |
| print(tokenizer.decode(output[0], skip_special_tokens=True)) | |
| ``` | |
| </hfoption> | |
| <hfoption id="transformers CLI"> | |
| ```bash | |
| echo -e "translate English to French: The weather is nice today." | transformers run --task text2text-generation --model google-t5/t5-base --device 0 | |
| ``` | |
| </hfoption> | |
| </hfoptions> | |
| Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [Quantization](../quantization/overview) overview for more available quantization backends. | |
| The example below uses [torchao](../quantization/torchao) to only quantize the weights to int4. | |
| ```py | |
| # pip install torchao | |
| import torch | |
| from transformers import TorchAoConfig, AutoModelForSeq2SeqLM, AutoTokenizer | |
| quantization_config = TorchAoConfig("int4_weight_only", group_size=128) | |
| model = AutoModelForSeq2SeqLM.from_pretrained( | |
| "google/t5-v1_1-xl", | |
| dtype=torch.bfloat16, | |
| device_map="auto", | |
| quantization_config=quantization_config | |
| ) | |
| tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-xl") | |
| input_ids = tokenizer("translate English to French: The weather is nice today.", return_tensors="pt").to(model.device) | |
| output = model.generate(**input_ids, cache_implementation="static") | |
| print(tokenizer.decode(output[0], skip_special_tokens=True)) | |
| ``` | |
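As a rough illustration of why lower precision reduces memory, the sketch below estimates the storage needed for the weights alone at two bit widths. The parameter count is an assumed round figure, not a measured value for any checkpoint, and the estimate ignores the per-group scales that a grouped int4 scheme stores alongside the weights, as well as activations and the cache.

```python
def weight_memory_gib(num_params: int, bits_per_weight: int) -> float:
    """Approximate memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


# Assumption: a round figure for a roughly 3B-parameter model.
# Substitute the real parameter count of the checkpoint you load.
NUM_PARAMS = 3_000_000_000

bf16_gib = weight_memory_gib(NUM_PARAMS, 16)  # bfloat16 stores 16 bits per weight
int4_gib = weight_memory_gib(NUM_PARAMS, 4)   # int4 stores 4 bits per weight

print(f"bfloat16: {bf16_gib:.2f} GiB, int4: {int4_gib:.2f} GiB")
```

Under these assumptions the int4 weights take a quarter of the bfloat16 footprint; real savings are somewhat smaller once quantization metadata is included.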
| ## Notes | |
| - You can pad the encoder inputs on the left or right because T5 uses relative scalar embeddings. | |
| - T5 models need a slightly higher learning rate than the default used in [Trainer](/docs/transformers/pr_33892/en/main_classes/trainer#transformers.Trainer). Typically, values of `1e-4` and `3e-4` work well for most tasks. | |
| ## T5Config[[transformers.T5Config]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.T5Config</name><anchor>transformers.T5Config</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/t5/configuration_t5.py#L27</source><parameters>[{"name": "vocab_size", "val": " = 32128"}, {"name": "d_model", "val": " = 512"}, {"name": "d_kv", "val": " = 64"}, {"name": "d_ff", "val": " = 2048"}, {"name": "num_layers", "val": " = 6"}, {"name": "num_decoder_layers", "val": " = None"}, {"name": "num_heads", "val": " = 8"}, {"name": "relative_attention_num_buckets", "val": " = 32"}, {"name": "relative_attention_max_distance", "val": " = 128"}, {"name": "dropout_rate", "val": " = 0.1"}, {"name": "layer_norm_epsilon", "val": " = 1e-06"}, {"name": "initializer_factor", "val": " = 1.0"}, {"name": "feed_forward_proj", "val": " = 'relu'"}, {"name": "is_encoder_decoder", "val": " = True"}, {"name": "use_cache", "val": " = True"}, {"name": "pad_token_id", "val": " = 0"}, {"name": "eos_token_id", "val": " = 1"}, {"name": "classifier_dropout", "val": " = 0.0"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **vocab_size** (`int`, *optional*, defaults to 32128) -- | |
| Vocabulary size of the T5 model. Defines the number of different tokens that can be represented by the | |
| `input_ids` passed when calling [T5Model](/docs/transformers/pr_33892/en/model_doc/t5#transformers.T5Model) or `TFT5Model`. | |
| - **d_model** (`int`, *optional*, defaults to 512) -- | |
| Size of the encoder layers and the pooler layer. | |
| - **d_kv** (`int`, *optional*, defaults to 64) -- | |
| Size of the key, query, value projections per attention head. The `inner_dim` of the projection layer will | |
| be defined as `num_heads * d_kv`. | |
| - **d_ff** (`int`, *optional*, defaults to 2048) -- | |
| Size of the intermediate feed forward layer in each `T5Block`. | |
| - **num_layers** (`int`, *optional*, defaults to 6) -- | |
| Number of hidden layers in the Transformer encoder. | |
| - **num_decoder_layers** (`int`, *optional*) -- | |
| Number of hidden layers in the Transformer decoder. Will use the same value as `num_layers` if not set. | |
| - **num_heads** (`int`, *optional*, defaults to 8) -- | |
| Number of attention heads for each attention layer in the Transformer encoder. | |
| - **relative_attention_num_buckets** (`int`, *optional*, defaults to 32) -- | |
| The number of buckets to use for each attention layer. | |
| - **relative_attention_max_distance** (`int`, *optional*, defaults to 128) -- | |
| The maximum distance of the longer sequences for the bucket separation. | |
| - **dropout_rate** (`float`, *optional*, defaults to 0.1) -- | |
| The ratio for all dropout layers. | |
| - **classifier_dropout** (`float`, *optional*, defaults to 0.0) -- | |
| The dropout ratio for classifier. | |
| - **layer_norm_epsilon** (`float`, *optional*, defaults to 1e-6) -- | |
| The epsilon used by the layer normalization layers. | |
| - **initializer_factor** (`float`, *optional*, defaults to 1) -- | |
| A factor for initializing all weight matrices (should be kept to 1, used internally for initialization | |
| testing). | |
| - **feed_forward_proj** (`str`, *optional*, defaults to `"relu"`) -- | |
| Type of feed forward layer to be used. Should be one of `"relu"` or `"gated-gelu"`. T5v1.1 uses the | |
| `"gated-gelu"` feed forward projection. Original T5 uses `"relu"`. | |
| - **use_cache** (`bool`, *optional*, defaults to `True`) -- | |
| Whether or not the model should return the last key/values attentions (not used by all models).</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| This is the configuration class to store the configuration of a [T5Model](/docs/transformers/pr_33892/en/model_doc/t5#transformers.T5Model) or a `TFT5Model`. It is used to | |
| instantiate a T5 model according to the specified arguments, defining the model architecture. Instantiating a | |
| configuration with the defaults will yield a similar configuration to that of the T5 | |
| [google-t5/t5-small](https://huggingface.co/google-t5/t5-small) architecture. | |
| Configuration objects inherit from [PreTrainedConfig](/docs/transformers/pr_33892/en/main_classes/configuration#transformers.PreTrainedConfig) and can be used to control the model outputs. Read the | |
| documentation from [PreTrainedConfig](/docs/transformers/pr_33892/en/main_classes/configuration#transformers.PreTrainedConfig) for more information. | |
| </div> | |
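To make `relative_attention_num_buckets` and `relative_attention_max_distance` more concrete, here is a minimal pure-Python sketch of the bucketing scheme described in the T5 paper: small offsets each get their own bucket, larger offsets share logarithmically sized buckets, and everything beyond the maximum distance falls into the last bucket. The function name and scalar interface are illustrative assumptions; this is not the library's tensor implementation and may differ from it at bucket boundaries because of floating-point precision.

```python
import math


def relative_position_bucket(
    relative_position: int,
    bidirectional: bool = True,
    num_buckets: int = 32,
    max_distance: int = 128,
) -> int:
    """Map a signed offset (key position - query position) to a bucket index."""
    bucket = 0
    if bidirectional:
        # Half of the buckets serve keys after the query, half serve keys before it.
        num_buckets //= 2
        if relative_position > 0:
            bucket += num_buckets
        distance = abs(relative_position)
    else:
        # Causal attention: only keys at or before the query are distinguished.
        distance = max(-relative_position, 0)

    max_exact = num_buckets // 2
    if distance < max_exact:
        # Small distances each get their own bucket.
        return bucket + distance

    # Larger distances share logarithmically sized buckets up to max_distance.
    log_ratio = math.log(distance / max_exact) / math.log(max_distance / max_exact)
    large = max_exact + int(log_ratio * (num_buckets - max_exact))
    return bucket + min(large, num_buckets - 1)


for offset in (0, -1, 1, -1000, 1000):
    print(offset, relative_position_bucket(offset))
```

With the default values, every offset maps to one of 32 buckets, so the model learns one scalar bias per bucket and attention head rather than one per absolute position.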
| ## T5Tokenizer[[transformers.T5Tokenizer]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.T5Tokenizer</name><anchor>transformers.T5Tokenizer</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/t5/tokenization_t5.py#L45</source><parameters>[{"name": "vocab_file", "val": ""}, {"name": "eos_token", "val": " = '</s>'"}, {"name": "unk_token", "val": " = '<unk>'"}, {"name": "pad_token", "val": " = '<pad>'"}, {"name": "extra_ids", "val": " = 100"}, {"name": "additional_special_tokens", "val": " = None"}, {"name": "sp_model_kwargs", "val": ": typing.Optional[dict[str, typing.Any]] = None"}, {"name": "legacy", "val": " = None"}, {"name": "add_prefix_space", "val": " = True"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **vocab_file** (`str`) -- | |
| [SentencePiece](https://github.com/google/sentencepiece) file (generally has a *.spm* extension) that | |
| contains the vocabulary necessary to instantiate a tokenizer. | |
| - **eos_token** (`str`, *optional*, defaults to `"</s>"`) -- | |
| The end of sequence token. | |
| <Tip> | |
| When building a sequence using special tokens, this is not the token that is used for the end of sequence. | |
| The token used is the `sep_token`. | |
| </Tip> | |
| - **unk_token** (`str`, *optional*, defaults to `"<unk>"`) -- | |
| The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this | |
| token instead. | |
| - **pad_token** (`str`, *optional*, defaults to `"<pad>"`) -- | |
| The token used for padding, for example when batching sequences of different lengths. | |
| - **extra_ids** (`int`, *optional*, defaults to 100) -- | |
| Number of extra ids added to the vocabulary for use as sentinels. These tokens are | |
| accessible as "<extra_id_{%d}>" where "{%d}" is a number between 0 and extra_ids-1. These tokens can be | |
| retrieved by calling the `get_sentinel_tokens` method, and their token ids by calling the | |
| `get_sentinel_token_ids` method. | |
| - **additional_special_tokens** (`list[str]`, *optional*) -- | |
| Additional special tokens used by the tokenizer. | |
| - **sp_model_kwargs** (`dict`, *optional*) -- | |
| Will be passed to the `SentencePieceProcessor.__init__()` method. The [Python wrapper for | |
| SentencePiece](https://github.com/google/sentencepiece/tree/master/python) can be used, among other things, | |
| to set: | |
| - `enable_sampling`: Enable subword regularization. | |
| - `nbest_size`: Sampling parameters for unigram. Invalid for BPE-Dropout. | |
| - `nbest_size = {0,1}`: No sampling is performed. | |
| - `nbest_size > 1`: samples from the nbest_size results. | |
| - `nbest_size < 0`: assumes that nbest_size is infinite and samples from all hypotheses (lattice) | |
| using the forward-filtering-and-backward-sampling algorithm. | |
| - `alpha`: Smoothing parameter for unigram sampling, and dropout probability of merge operations for | |
| BPE-dropout. | |
| - **legacy** (`bool`, *optional*) -- | |
| Whether or not the `legacy` behaviour of the tokenizer should be used. Legacy is before the merge of #24622 | |
| and #25224 which includes fixes to properly handle tokens that appear after special tokens. A simple | |
| example: | |
| - `legacy=True`: | |
| ```python | |
| >>> from transformers import T5Tokenizer | |
| >>> tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-base", legacy=True) | |
| >>> tokenizer.encode("Hello <extra_id_0>.") | |
| [8774, 32099, 3, 5, 1] | |
| ``` | |
| - `legacy=False`: | |
| ```python | |
| >>> from transformers import T5Tokenizer | |
| >>> tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-base", legacy=False) | |
| >>> tokenizer.encode("Hello <extra_id_0>.") # the extra space `[3]` is no longer here | |
| [8774, 32099, 5, 1] | |
| ``` | |
| Check out the [pull request](https://github.com/huggingface/transformers/pull/24565) for more details. | |
| - **add_prefix_space** (`bool`, *optional*, defaults to `True`) -- | |
| Whether or not to add an initial space to the input. This lets the tokenizer treat the leading word just | |
| like any other word. | |
| - **sp_model** (`SentencePieceProcessor`) -- | |
| The *SentencePiece* processor that is used for every conversion (string, tokens and IDs).</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Construct a T5 tokenizer. Based on [SentencePiece](https://github.com/google/sentencepiece). | |
| This tokenizer inherits from [PreTrainedTokenizer](/docs/transformers/pr_33892/en/main_classes/tokenizer#transformers.PreTrainedTokenizer) which contains most of the main methods. Users should refer to | |
| this superclass for more information regarding those methods. | |
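The sentinel naming scheme from `extra_ids` can be sketched in plain Python. The example below builds the `<extra_id_N>` strings and uses them to mask word spans, producing an input/target pair in the span-corruption format from the T5 paper. Both helper names are hypothetical and operate on whitespace-split words rather than real token ids, so treat this as an illustration of the format, not as the tokenizer's behavior.

```python
def sentinel_tokens(extra_ids: int = 100) -> list[str]:
    """Build the sentinel strings <extra_id_0> ... <extra_id_{extra_ids-1}>."""
    return [f"<extra_id_{i}>" for i in range(extra_ids)]


def mask_spans(words: list[str], spans: list[tuple[int, int]], extra_ids: int = 100):
    """Replace each (start, end) word span with a sentinel; return (input, target)."""
    if len(spans) >= extra_ids:
        raise ValueError("not enough sentinel tokens for the requested spans")
    sentinels = sentinel_tokens(extra_ids)
    inputs, targets, cursor = [], [], 0
    for i, (start, end) in enumerate(sorted(spans)):
        inputs.extend(words[cursor:start])  # keep the text before the span
        inputs.append(sentinels[i])         # the span becomes one sentinel
        targets.append(sentinels[i])        # the target repeats the sentinel...
        targets.extend(words[start:end])    # ...followed by the masked words
        cursor = end
    inputs.extend(words[cursor:])
    targets.append(sentinels[len(spans)])   # a final sentinel closes the target
    return " ".join(inputs), " ".join(targets)


words = "The cute dog walks in the park".split()
source, target = mask_spans(words, [(1, 3), (5, 6)])
print(source)
print(target)
```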
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>build_inputs_with_special_tokens</name><anchor>transformers.T5Tokenizer.build_inputs_with_special_tokens</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/t5/tokenization_t5.py#L317</source><parameters>[{"name": "token_ids_0", "val": ": list"}, {"name": "token_ids_1", "val": ": typing.Optional[list[int]] = None"}]</parameters><paramsdesc>- **token_ids_0** (`list[int]`) -- | |
| List of IDs to which the special tokens will be added. | |
| - **token_ids_1** (`list[int]`, *optional*) -- | |
| Optional second list of IDs for sequence pairs.</paramsdesc><paramgroups>0</paramgroups><rettype>`list[int]`</rettype><retdesc>List of [input IDs](../glossary#input-ids) with the appropriate special tokens.</retdesc></docstring> | |
| Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and | |
| adding special tokens. A sequence has the following format: | |
| - single sequence: `X </s>` | |
| - pair of sequences: `A </s> B </s>` | |
| </div> | |
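The two formats above can be sketched on plain lists of ids. This is a minimal illustration, assuming the end-of-sequence id is `1` (the `eos_token_id` default listed for `T5Config`); the helper name is hypothetical, and the sketch does not handle inputs that already end with the end-of-sequence token.

```python
EOS_TOKEN_ID = 1  # assumption: matches the eos_token_id default listed for T5Config


def build_inputs(token_ids_0, token_ids_1=None):
    """Return `X </s>` for one sequence or `A </s> B </s>` for a pair."""
    output = list(token_ids_0) + [EOS_TOKEN_ID]
    if token_ids_1 is not None:
        output += list(token_ids_1) + [EOS_TOKEN_ID]
    return output


single = build_inputs([8774, 32099, 5])
pair = build_inputs([8774, 32099, 5], [8774])
print(single)
print(pair)
```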
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>get_special_tokens_mask</name><anchor>transformers.T5Tokenizer.get_special_tokens_mask</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/t5/tokenization_t5.py#L248</source><parameters>[{"name": "token_ids_0", "val": ": list"}, {"name": "token_ids_1", "val": ": typing.Optional[list[int]] = None"}, {"name": "already_has_special_tokens", "val": ": bool = False"}]</parameters><paramsdesc>- **token_ids_0** (`list[int]`) -- | |
| List of IDs. | |
| - **token_ids_1** (`list[int]`, *optional*) -- | |
| Optional second list of IDs for sequence pairs. | |
| - **already_has_special_tokens** (`bool`, *optional*, defaults to `False`) -- | |
| Whether or not the token list is already formatted with special tokens for the model.</paramsdesc><paramgroups>0</paramgroups><rettype>`list[int]`</rettype><retdesc>A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.</retdesc></docstring> | |
| Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding | |
| special tokens using the tokenizer `prepare_for_model` method. | |
| </div> | |
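For inputs that do not yet contain special tokens, the returned mask follows directly from the `X </s>` and `A </s> B </s>` formats: a `0` per sequence token and a `1` where each end-of-sequence token will be placed. A minimal sketch with a hypothetical helper name:

```python
def special_tokens_mask(token_ids_0, token_ids_1=None):
    """Return 1 at positions of special tokens and 0 at sequence tokens."""
    mask = [0] * len(token_ids_0) + [1]  # trailing 1 marks the </s> after the first sequence
    if token_ids_1 is not None:
        mask += [0] * len(token_ids_1) + [1]  # and the </s> after the second sequence
    return mask


single_mask = special_tokens_mask([8774, 32099, 5])
pair_mask = special_tokens_mask([8774, 32099, 5], [8774])
print(single_mask)
print(pair_mask)
```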
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>create_token_type_ids_from_sequences</name><anchor>transformers.T5Tokenizer.create_token_type_ids_from_sequences</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/t5/tokenization_t5.py#L295</source><parameters>[{"name": "token_ids_0", "val": ": list"}, {"name": "token_ids_1", "val": ": typing.Optional[list[int]] = None"}]</parameters><paramsdesc>- **token_ids_0** (`list[int]`) -- | |
| List of IDs. | |
| - **token_ids_1** (`list[int]`, *optional*) -- | |
| Optional second list of IDs for sequence pairs.</paramsdesc><paramgroups>0</paramgroups><rettype>`list[int]`</rettype><retdesc>List of zeros.</retdesc></docstring> | |
| Create a mask from the two sequences passed to be used in a sequence-pair classification task. T5 does not make | |
| use of token type ids, therefore a list of zeros is returned. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>save_vocabulary</name><anchor>transformers.T5Tokenizer.save_vocabulary</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/t5/tokenization_t5.py#L430</source><parameters>[{"name": "save_directory", "val": ": str"}, {"name": "filename_prefix", "val": ": typing.Optional[str] = None"}]</parameters></docstring> | |
| </div></div> | |
| ## T5TokenizerFast[[transformers.T5TokenizerFast]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.T5TokenizerFast</name><anchor>transformers.T5TokenizerFast</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/t5/tokenization_t5_fast.py#L41</source><parameters>[{"name": "vocab_file", "val": " = None"}, {"name": "tokenizer_file", "val": " = None"}, {"name": "eos_token", "val": " = '</s>'"}, {"name": "unk_token", "val": " = '<unk>'"}, {"name": "pad_token", "val": " = '<pad>'"}, {"name": "extra_ids", "val": " = 100"}, {"name": "additional_special_tokens", "val": " = None"}, {"name": "add_prefix_space", "val": " = None"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **vocab_file** (`str`) -- | |
| [SentencePiece](https://github.com/google/sentencepiece) file (generally has a *.spm* extension) that | |
| contains the vocabulary necessary to instantiate a tokenizer. | |
| - **eos_token** (`str`, *optional*, defaults to `"</s>"`) -- | |
| The end of sequence token. | |
| <Tip> | |
| When building a sequence using special tokens, this is not the token that is used for the end of sequence. | |
| The token used is the `sep_token`. | |
| </Tip> | |
| - **unk_token** (`str`, *optional*, defaults to `"<unk>"`) -- | |
| The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this | |
| token instead. | |
| - **pad_token** (`str`, *optional*, defaults to `"<pad>"`) -- | |
| The token used for padding, for example when batching sequences of different lengths. | |
| - **extra_ids** (`int`, *optional*, defaults to 100) -- | |
| Number of extra ids added to the vocabulary for use as sentinels. These tokens are accessible as | |
| "<extra_id_{%d}>" where "{%d}" is a number between 0 and extra_ids-1. These tokens can be retrieved by | |
| calling the `get_sentinel_tokens` method, and their token ids by calling the `get_sentinel_token_ids` method. | |
| - **additional_special_tokens** (`list[str]`, *optional*) -- | |
| Additional special tokens used by the tokenizer. | |
| - **add_prefix_space** (`bool`, *optional*) -- | |
| Whether or not the tokenizer should automatically add a prefix space. | |
| - **from_slow** (`bool`, *optional*, defaults to `False`) -- | |
| Whether or not the tokenizer should be converted from a slow one. If `add_prefix_space` is set, this will be set to `True`.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Construct a "fast" T5 tokenizer (backed by HuggingFace's *tokenizers* library). Based on | |
| [Unigram](https://huggingface.co/docs/tokenizers/python/latest/components.html?highlight=unigram#models). | |
| This tokenizer inherits from [PreTrainedTokenizerFast](/docs/transformers/pr_33892/en/main_classes/tokenizer#transformers.PreTrainedTokenizerFast) which contains most of the main methods. Users should | |
| refer to this superclass for more information regarding those methods. | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>build_inputs_with_special_tokens</name><anchor>transformers.T5TokenizerFast.build_inputs_with_special_tokens</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/t5/tokenization_t5_fast.py#L176</source><parameters>[{"name": "token_ids_0", "val": ": list"}, {"name": "token_ids_1", "val": ": typing.Optional[list[int]] = None"}]</parameters><paramsdesc>- **token_ids_0** (`list[int]`) -- | |
| List of IDs to which the special tokens will be added. | |
| - **token_ids_1** (`list[int]`, *optional*) -- | |
| Optional second list of IDs for sequence pairs.</paramsdesc><paramgroups>0</paramgroups><rettype>`list[int]`</rettype><retdesc>List of [input IDs](../glossary#input-ids) with the appropriate special tokens.</retdesc></docstring> | |
| Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and | |
| adding special tokens. A sequence has the following format: | |
| - single sequence: `X </s>` | |
| - pair of sequences: `A </s> B </s>` | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>create_token_type_ids_from_sequences</name><anchor>transformers.T5TokenizerFast.create_token_type_ids_from_sequences</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/t5/tokenization_t5_fast.py#L202</source><parameters>[{"name": "token_ids_0", "val": ": list"}, {"name": "token_ids_1", "val": ": typing.Optional[list[int]] = None"}]</parameters><paramsdesc>- **token_ids_0** (`list[int]`) -- | |
| List of IDs. | |
| - **token_ids_1** (`list[int]`, *optional*) -- | |
| Optional second list of IDs for sequence pairs.</paramsdesc><paramgroups>0</paramgroups><rettype>`list[int]`</rettype><retdesc>List of zeros.</retdesc></docstring> | |
| Create a mask from the two sequences passed to be used in a sequence-pair classification task. T5 does not make | |
| use of token type ids, therefore a list of zeros is returned. | |
| </div></div> | |
| ## T5Model[[transformers.T5Model]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.T5Model</name><anchor>transformers.T5Model</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/t5/modeling_t5.py#L971</source><parameters>[{"name": "config", "val": ": T5Config"}]</parameters><paramsdesc>- **config** ([T5Config](/docs/transformers/pr_33892/en/model_doc/t5#transformers.T5Config)) -- | |
| Model configuration class with all the parameters of the model. Initializing with a config file does not | |
| load the weights associated with the model, only the configuration. Check out the | |
| [from_pretrained()](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| The bare T5 Model outputting raw hidden-states without any specific head on top. | |
| This model inherits from [PreTrainedModel](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads | |
| etc.) | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage | |
| and behavior. | |
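The `decoder_input_ids` parameter of `forward` notes that T5 uses `pad_token_id` as the decoder start token. A common way to derive decoder inputs from target ids during training is to shift them one position to the right, as in the sketch below. It works on plain Python lists, assumes `pad_token_id = 0` (the `T5Config` default) and `-100` as the label value ignored by the loss, and uses a hypothetical helper name; it is an illustration, not the model's own code.

```python
PAD_TOKEN_ID = 0     # assumption: the pad_token_id default listed for T5Config
IGNORE_INDEX = -100  # assumption: label value conventionally ignored by the loss


def shift_right(labels, decoder_start_token_id=PAD_TOKEN_ID, pad_token_id=PAD_TOKEN_ID):
    """Prepend the start token, drop the last label, and replace ignored labels with padding."""
    shifted = [decoder_start_token_id] + list(labels[:-1])
    return [pad_token_id if token == IGNORE_INDEX else token for token in shifted]


decoder_input_ids = shift_right([8774, 32099, 5, 1])
padded_decoder_input_ids = shift_right([10, 1, IGNORE_INDEX, IGNORE_INDEX])
print(decoder_input_ids)
print(padded_decoder_input_ids)
```

At each step the decoder then sees the start token plus all earlier target tokens and is trained to predict the next one.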
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>forward</name><anchor>transformers.T5Model.forward</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/t5/modeling_t5.py#L1012</source><parameters>[{"name": "input_ids", "val": ": typing.Optional[torch.LongTensor] = None"}, {"name": "attention_mask", "val": ": typing.Optional[torch.FloatTensor] = None"}, {"name": "decoder_input_ids", "val": ": typing.Optional[torch.LongTensor] = None"}, {"name": "decoder_attention_mask", "val": ": typing.Optional[torch.BoolTensor] = None"}, {"name": "encoder_outputs", "val": ": typing.Optional[tuple[tuple[torch.FloatTensor]]] = None"}, {"name": "past_key_values", "val": ": typing.Optional[transformers.cache_utils.Cache] = None"}, {"name": "inputs_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "decoder_inputs_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "use_cache", "val": ": typing.Optional[bool] = None"}, {"name": "output_attentions", "val": ": typing.Optional[bool] = None"}, {"name": "output_hidden_states", "val": ": typing.Optional[bool] = None"}, {"name": "return_dict", "val": ": typing.Optional[bool] = None"}, {"name": "cache_position", "val": ": typing.Optional[torch.LongTensor] = None"}]</parameters><paramsdesc>- **input_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`) -- | |
| Indices of input sequence tokens in the vocabulary. T5 is a model with relative position embeddings so you | |
| should be able to pad the inputs on both the right and the left. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/pr_33892/en/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| [What are input IDs?](../glossary#input-ids) | |
| To know more on how to prepare `input_ids` for pretraining take a look at [T5 Training](./t5#training). | |
| - **attention_mask** (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: | |
| - 1 for tokens that are **not masked**, | |
| - 0 for tokens that are **masked**. | |
| [What are attention masks?](../glossary#attention-mask) | |
| - **decoder_input_ids** (`torch.LongTensor` of shape `(batch_size, target_sequence_length)`, *optional*) -- | |
| Indices of decoder input sequence tokens in the vocabulary. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/pr_33892/en/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| [What are decoder input IDs?](../glossary#decoder-input-ids) | |
| T5 uses the `pad_token_id` as the starting token for `decoder_input_ids` generation. If `past_key_values` | |
| is used, optionally only the last `decoder_input_ids` have to be input (see `past_key_values`). | |
| To know more on how to prepare `decoder_input_ids` for pretraining take a look at [T5 | |
| Training](./t5#training). | |
| - **decoder_attention_mask** (`torch.BoolTensor` of shape `(batch_size, target_sequence_length)`, *optional*) -- | |
| Default behavior: generate a tensor that ignores pad tokens in `decoder_input_ids`. Causal mask will also | |
| be used by default. | |
| - **encoder_outputs** (`tuple[tuple[torch.FloatTensor]]`, *optional*) -- | |
| Tuple consisting of (`last_hidden_state`, *optional*: `hidden_states`, *optional*: `attentions`). | |
| `last_hidden_state`, of shape `(batch_size, sequence_length, hidden_size)`, is a sequence of | |
| hidden-states at the output of the last layer of the encoder. Used in the cross-attention of the decoder. | |
| - **past_key_values** (`~cache_utils.Cache`, *optional*) -- | |
| Pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention | |
| blocks) that can be used to speed up sequential decoding. This typically consists of the `past_key_values` | |
| returned by the model at a previous stage of decoding, when `use_cache=True` or `config.use_cache=True`. | |
| Only a [Cache](/docs/transformers/pr_33892/en/internal/generation_utils#transformers.Cache) instance is allowed as input, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache). | |
| If no `past_key_values` are passed, [DynamicCache](/docs/transformers/pr_33892/en/internal/generation_utils#transformers.DynamicCache) will be initialized by default. | |
| The model will output the same cache format that is fed as input. | |
| If `past_key_values` are used, the user is expected to input only unprocessed `input_ids` (those that don't | |
| have their past key value states given to this model) of shape `(batch_size, unprocessed_length)` instead of all `input_ids` | |
| of shape `(batch_size, sequence_length)`. | |
| - **inputs_embeds** (`torch.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This | |
| is useful if you want more control over how to convert `input_ids` indices into associated vectors than the | |
| model's internal embedding lookup matrix. | |
| - **decoder_inputs_embeds** (`torch.Tensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `decoder_input_ids` you can choose to directly pass an embedded | |
| representation. If `past_key_values` is used, optionally only the last `decoder_inputs_embeds` have to be | |
| input (see `past_key_values`). This is useful if you want more control over how to convert | |
| `decoder_input_ids` indices into associated vectors than the model's internal embedding lookup matrix. | |
| If `decoder_input_ids` and `decoder_inputs_embeds` are both unset, `decoder_inputs_embeds` takes the value | |
| of `inputs_embeds`. | |
| - **use_cache** (`bool`, *optional*) -- | |
| If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see | |
| `past_key_values`). | |
| - **output_attentions** (`bool`, *optional*) -- | |
| Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned | |
| tensors for more detail. | |
| - **output_hidden_states** (`bool`, *optional*) -- | |
| Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for | |
| more detail. | |
| - **return_dict** (`bool`, *optional*) -- | |
| Whether or not to return a [ModelOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.utils.ModelOutput) instead of a plain tuple. | |
| - **cache_position** (`torch.LongTensor` of shape `(sequence_length)`, *optional*) -- | |
| Indices depicting the position of the input sequence tokens in the sequence. Contrary to `position_ids`, | |
| this tensor is not affected by padding. It is used to update the cache in the correct position and to infer | |
| the complete sequence length.</paramsdesc><paramgroups>0</paramgroups><rettype>[transformers.modeling_outputs.Seq2SeqModelOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.Seq2SeqModelOutput) or `tuple(torch.FloatTensor)`</rettype><retdesc>A [transformers.modeling_outputs.Seq2SeqModelOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.Seq2SeqModelOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([T5Config](/docs/transformers/pr_33892/en/model_doc/t5#transformers.T5Config)) and inputs. | |
| - **last_hidden_state** (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`) -- Sequence of hidden-states at the output of the last layer of the decoder of the model. | |
| If `past_key_values` is used only the last hidden-state of the sequences of shape `(batch_size, 1, | |
| hidden_size)` is output. | |
| - **past_key_values** (`EncoderDecoderCache`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`) -- It is an [EncoderDecoderCache](/docs/transformers/pr_33892/en/internal/generation_utils#transformers.EncoderDecoderCache) instance. For more details, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache). | |
| Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention | |
| blocks) that can be used (see `past_key_values` input) to speed up sequential decoding. | |
| - **decoder_hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the decoder at the output of each layer plus the optional initial embedding outputs. | |
| - **decoder_attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the | |
| self-attention heads. | |
| - **cross_attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the | |
| weighted average in the cross-attention heads. | |
| - **encoder_last_hidden_state** (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) -- Sequence of hidden-states at the output of the last layer of the encoder of the model. | |
| - **encoder_hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the encoder at the output of each layer plus the optional initial embedding outputs. | |
| - **encoder_attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the | |
| self-attention heads.</retdesc></docstring> | |
| The [T5Model](/docs/transformers/pr_33892/en/model_doc/t5#transformers.T5Model) forward method overrides the `__call__` special method. | |
| <Tip> | |
| Although the forward pass is defined within this function, call the `Module` instance rather than | |
| `forward()` directly: the instance call runs the pre- and post-processing steps, while a direct `forward()` | |
| call silently skips them. | |
| </Tip> | |
| <ExampleCodeBlock anchor="transformers.T5Model.forward.example"> | |
| Example: | |
| ```python | |
| >>> from transformers import AutoTokenizer, T5Model | |
| >>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small") | |
| >>> model = T5Model.from_pretrained("google-t5/t5-small") | |
| >>> input_ids = tokenizer( | |
| ... "Studies have shown that owning a dog is good for you", return_tensors="pt" | |
| ... ).input_ids # Batch size 1 | |
| >>> decoder_input_ids = tokenizer("Studies show that", return_tensors="pt").input_ids # Batch size 1 | |
| >>> # preprocess: Prepend decoder_input_ids with start token which is pad token for T5Model. | |
| >>> # This is not needed for torch's T5ForConditionalGeneration as it does this internally using labels arg. | |
| >>> decoder_input_ids = model._shift_right(decoder_input_ids) | |
| >>> # forward pass | |
| >>> outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids) | |
| >>> last_hidden_states = outputs.last_hidden_state | |
| ``` | |
| </ExampleCodeBlock> | |
| </div></div> | |
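The example above shifts `decoder_input_ids` to the right so that the sequence starts with the pad token. The following is a minimal sketch of that idea on plain Python lists, not the library's implementation. The function name `shift_right`, the token IDs, and the default IDs of `0` for both the start and pad token are assumptions made for illustration.

```python
def shift_right(input_ids, decoder_start_token_id=0, pad_token_id=0):
    """Shift each sequence one position to the right (illustrative helper).

    Works on nested Python lists instead of tensors.
    """
    shifted = []
    for seq in input_ids:
        # Prepend the start token and drop the last token so the length is unchanged.
        new_seq = [decoder_start_token_id] + list(seq[:-1])
        # Positions that carry the ignore value -100 are replaced by the pad token.
        new_seq = [pad_token_id if tok == -100 else tok for tok in new_seq]
        shifted.append(new_seq)
    return shifted


# Made-up token IDs; the second row contains ignored (-100) positions.
target_ids = [[8, 15, 27, 1], [42, 1, -100, -100]]
print(shift_right(target_ids))  # [[0, 8, 15, 27], [0, 42, 1, 0]]
```

Each row keeps its length: the first position becomes the start token and the final token of the original row is dropped.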
| ## T5ForConditionalGeneration[[transformers.T5ForConditionalGeneration]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.T5ForConditionalGeneration</name><anchor>transformers.T5ForConditionalGeneration</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/t5/modeling_t5.py#L1135</source><parameters>[{"name": "config", "val": ": T5Config"}]</parameters><paramsdesc>- **config** ([T5Config](/docs/transformers/pr_33892/en/model_doc/t5#transformers.T5Config)) -- | |
| Model configuration class with all the parameters of the model. Initializing with a config file does not | |
| load the weights associated with the model, only the configuration. Check out the | |
| [from_pretrained()](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| T5 Model with a `language modeling` head on top. | |
| This model inherits from [PreTrainedModel](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads | |
| etc.) | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage | |
| and behavior. | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>forward</name><anchor>transformers.T5ForConditionalGeneration.forward</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/t5/modeling_t5.py#L1180</source><parameters>[{"name": "input_ids", "val": ": typing.Optional[torch.LongTensor] = None"}, {"name": "attention_mask", "val": ": typing.Optional[torch.FloatTensor] = None"}, {"name": "decoder_input_ids", "val": ": typing.Optional[torch.LongTensor] = None"}, {"name": "decoder_attention_mask", "val": ": typing.Optional[torch.BoolTensor] = None"}, {"name": "encoder_outputs", "val": ": typing.Optional[tuple[tuple[torch.Tensor]]] = None"}, {"name": "past_key_values", "val": ": typing.Optional[transformers.cache_utils.Cache] = None"}, {"name": "inputs_embeds", "val": ": typing.Optional[torch.FloatTensor] = None"}, {"name": "decoder_inputs_embeds", "val": ": typing.Optional[torch.FloatTensor] = None"}, {"name": "labels", "val": ": typing.Optional[torch.LongTensor] = None"}, {"name": "use_cache", "val": ": typing.Optional[bool] = None"}, {"name": "output_attentions", "val": ": typing.Optional[bool] = None"}, {"name": "output_hidden_states", "val": ": typing.Optional[bool] = None"}, {"name": "return_dict", "val": ": typing.Optional[bool] = None"}, {"name": "cache_position", "val": ": typing.Optional[torch.LongTensor] = None"}]</parameters><paramsdesc>- **input_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`) -- | |
| Indices of input sequence tokens in the vocabulary. T5 is a model with relative position embeddings so you | |
| should be able to pad the inputs on both the right and the left. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/pr_33892/en/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| [What are input IDs?](../glossary#input-ids) | |
| To learn more about how to prepare `input_ids` for pretraining, take a look at [T5 Training](./t5#training). | |
| - **attention_mask** (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: | |
| - 1 for tokens that are **not masked**, | |
| - 0 for tokens that are **masked**. | |
| [What are attention masks?](../glossary#attention-mask) | |
| - **decoder_input_ids** (`torch.LongTensor` of shape `(batch_size, target_sequence_length)`, *optional*) -- | |
| Indices of decoder input sequence tokens in the vocabulary. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/pr_33892/en/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| [What are decoder input IDs?](../glossary#decoder-input-ids) | |
| T5 uses the `pad_token_id` as the starting token for `decoder_input_ids` generation. If `past_key_values` | |
| is used, optionally only the last `decoder_input_ids` have to be input (see `past_key_values`). | |
| To know more on how to prepare `decoder_input_ids` for pretraining take a look at [T5 | |
| Training](./t5#training). | |
| - **decoder_attention_mask** (`torch.BoolTensor` of shape `(batch_size, target_sequence_length)`, *optional*) -- | |
| Default behavior: generate a tensor that ignores pad tokens in `decoder_input_ids`. Causal mask will also | |
| be used by default. | |
| - **encoder_outputs** (`tuple[tuple[torch.Tensor]]`, *optional*) -- | |
| Tuple consists of (`last_hidden_state`, *optional*: `hidden_states`, *optional*: `attentions`) | |
| `last_hidden_state` of shape `(batch_size, sequence_length, hidden_size)` is a sequence of | |
| hidden-states at the output of the last layer of the encoder. Used in the cross-attention of the decoder. | |
| - **past_key_values** (`~cache_utils.Cache`, *optional*) -- | |
| Pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention | |
| blocks) that can be used to speed up sequential decoding. This typically consists of the `past_key_values` | |
| returned by the model at a previous stage of decoding, when `use_cache=True` or `config.use_cache=True`. | |
| Only [Cache](/docs/transformers/pr_33892/en/internal/generation_utils#transformers.Cache) instance is allowed as input, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache). | |
| If no `past_key_values` are passed, [DynamicCache](/docs/transformers/pr_33892/en/internal/generation_utils#transformers.DynamicCache) will be initialized by default. | |
| The model will output the same cache format that is fed as input. | |
| If `past_key_values` are used, the user is expected to input only unprocessed `input_ids` (those that don't | |
| have their past key value states given to this model) of shape `(batch_size, unprocessed_length)` instead of all `input_ids` | |
| of shape `(batch_size, sequence_length)`. | |
| - **inputs_embeds** (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This | |
| is useful if you want more control over how to convert `input_ids` indices into associated vectors than the | |
| model's internal embedding lookup matrix. | |
| - **decoder_inputs_embeds** (`torch.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `decoder_input_ids` you can choose to directly pass an embedded | |
| representation. If `past_key_values` is used, optionally only the last `decoder_inputs_embeds` have to be | |
| input (see `past_key_values`). This is useful if you want more control over how to convert | |
| `decoder_input_ids` indices into associated vectors than the model's internal embedding lookup matrix. | |
| If `decoder_input_ids` and `decoder_inputs_embeds` are both unset, `decoder_inputs_embeds` takes the value | |
| of `inputs_embeds`. | |
| - **labels** (`torch.LongTensor` of shape `(batch_size, target_sequence_length)`, *optional*) -- | |
| Labels for computing the language modeling loss. Indices should be in `[-100, 0, ..., | |
| config.vocab_size - 1]`. All labels set to `-100` are ignored (masked); the loss is only computed for | |
| labels in `[0, ..., config.vocab_size - 1]`. | |
| - **use_cache** (`bool`, *optional*) -- | |
| If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see | |
| `past_key_values`). | |
| - **output_attentions** (`bool`, *optional*) -- | |
| Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned | |
| tensors for more detail. | |
| - **output_hidden_states** (`bool`, *optional*) -- | |
| Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for | |
| more detail. | |
| - **return_dict** (`bool`, *optional*) -- | |
| Whether or not to return a [ModelOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.utils.ModelOutput) instead of a plain tuple. | |
| - **cache_position** (`torch.LongTensor` of shape `(sequence_length)`, *optional*) -- | |
| Indices depicting the position of the input sequence tokens in the sequence. Contrary to `position_ids`, | |
| this tensor is not affected by padding. It is used to update the cache in the correct position and to infer | |
| the complete sequence length.</paramsdesc><paramgroups>0</paramgroups><rettype>[transformers.modeling_outputs.Seq2SeqLMOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.Seq2SeqLMOutput) or `tuple(torch.FloatTensor)`</rettype><retdesc>A [transformers.modeling_outputs.Seq2SeqLMOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.Seq2SeqLMOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([T5Config](/docs/transformers/pr_33892/en/model_doc/t5#transformers.T5Config)) and inputs. | |
| - **loss** (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided) -- Language modeling loss. | |
| - **logits** (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.vocab_size)`) -- Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). | |
| - **past_key_values** (`EncoderDecoderCache`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`) -- It is an [EncoderDecoderCache](/docs/transformers/pr_33892/en/internal/generation_utils#transformers.EncoderDecoderCache) instance. For more details, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache). | |
| Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention | |
| blocks) that can be used (see `past_key_values` input) to speed up sequential decoding. | |
| - **decoder_hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the decoder at the output of each layer plus the initial embedding outputs. | |
| - **decoder_attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the | |
| self-attention heads. | |
| - **cross_attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the | |
| weighted average in the cross-attention heads. | |
| - **encoder_last_hidden_state** (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) -- Sequence of hidden-states at the output of the last layer of the encoder of the model. | |
| - **encoder_hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the encoder at the output of each layer plus the initial embedding outputs. | |
| - **encoder_attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the | |
| self-attention heads.</retdesc></docstring> | |
| The [T5ForConditionalGeneration](/docs/transformers/pr_33892/en/model_doc/t5#transformers.T5ForConditionalGeneration) forward method overrides the `__call__` special method. | |
| <Tip> | |
| Although the forward pass is defined within this function, call the `Module` instance rather than | |
| `forward()` directly: the instance call runs the pre- and post-processing steps, while a direct `forward()` | |
| call silently skips them. | |
| </Tip> | |
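The `labels` description above states that positions set to `-100` are ignored when the loss is computed. The sketch below illustrates that behavior with a plain-Python cross-entropy over one sequence. It is a simplified illustration rather than the library's loss code; the function name `masked_cross_entropy` and the toy logits are assumptions.

```python
import math

IGNORE_INDEX = -100  # label value that is excluded from the loss


def masked_cross_entropy(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean negative log-likelihood over positions whose label is not ignored.

    logits: list of per-position score lists (one score per vocabulary entry).
    labels: list of target indices, with ignore_index marking skipped positions.
    """
    total, count = 0.0, 0
    for scores, label in zip(logits, labels):
        if label == ignore_index:
            continue  # masked position: contributes nothing to the loss
        # Numerically stable log-sum-exp for the softmax normalizer.
        peak = max(scores)
        log_z = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_z - scores[label]
        count += 1
    # With no valid positions the mean is undefined.
    return total / count if count else float("nan")


# Toy scores over a 3-token vocabulary; the last position is masked.
logits = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [1.0, 1.0, 1.0]]
labels = [0, 1, IGNORE_INDEX]
print(masked_cross_entropy(logits, labels))
```

Changing the scores at a masked position leaves the result unchanged, which is why padded label positions are typically set to `-100`.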
| <ExampleCodeBlock anchor="transformers.T5ForConditionalGeneration.forward.example"> | |
| Examples: | |
| ```python | |
| >>> from transformers import AutoTokenizer, T5ForConditionalGeneration | |
| >>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small") | |
| >>> model = T5ForConditionalGeneration.from_pretrained("google-t5/t5-small") | |
| >>> # training | |
| >>> input_ids = tokenizer("The <extra_id_0> walks in <extra_id_1> park", return_tensors="pt").input_ids | |
| >>> labels = tokenizer("<extra_id_0> cute dog <extra_id_1> the <extra_id_2>", return_tensors="pt").input_ids | |
| >>> outputs = model(input_ids=input_ids, labels=labels) | |
| >>> loss = outputs.loss | |
| >>> logits = outputs.logits | |
| >>> # inference | |
| >>> input_ids = tokenizer( | |
| ... "summarize: studies have shown that owning a dog is good for you", return_tensors="pt" | |
| ... ).input_ids # Batch size 1 | |
| >>> outputs = model.generate(input_ids) | |
| >>> print(tokenizer.decode(outputs[0], skip_special_tokens=True)) | |
| >>> # studies have shown that owning a dog is good for you. | |
| ``` | |
| </ExampleCodeBlock> | |
| </div></div> | |
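The training example above pairs an input containing sentinel tokens (`<extra_id_0>`, `<extra_id_1>`) with labels that list the removed spans. The following sketch shows how such a pair could be built from a sentence. It is an illustration only: it splits on whitespace rather than using the real tokenizer, the helper name `corrupt_spans` is hypothetical, and the original sentence is inferred from the example.

```python
def corrupt_spans(tokens, spans):
    """Replace each (start, end) span with a sentinel token.

    spans must be sorted and non-overlapping; end is exclusive.
    Returns the corrupted input string and the matching target string.
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        # Keep the text before the span, then mark the gap with a sentinel.
        inputs.extend(tokens[cursor:start])
        inputs.append(sentinel)
        # The target repeats the sentinel followed by the removed tokens.
        targets.append(sentinel)
        targets.extend(tokens[start:end])
        cursor = end
    inputs.extend(tokens[cursor:])
    # A final sentinel closes the target sequence.
    targets.append(f"<extra_id_{len(spans)}>")
    return " ".join(inputs), " ".join(targets)


words = "The cute dog walks in the park".split()
source, target = corrupt_spans(words, [(1, 3), (5, 6)])
print(source)  # The <extra_id_0> walks in <extra_id_1> park
print(target)  # <extra_id_0> cute dog <extra_id_1> the <extra_id_2>
```

The two printed strings match the `input_ids` and `labels` texts used in the training example.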
| ## T5EncoderModel[[transformers.T5EncoderModel]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.T5EncoderModel</name><anchor>transformers.T5EncoderModel</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/t5/modeling_t5.py#L1330</source><parameters>[{"name": "config", "val": ": T5Config"}]</parameters><paramsdesc>- **config** ([T5Config](/docs/transformers/pr_33892/en/model_doc/t5#transformers.T5Config)) -- | |
| Model configuration class with all the parameters of the model. Initializing with a config file does not | |
| load the weights associated with the model, only the configuration. Check out the | |
| [from_pretrained()](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| The bare T5 Model outputting raw hidden-states without any specific head on top. | |
| This model inherits from [PreTrainedModel](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads | |
| etc.) | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage | |
| and behavior. | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>forward</name><anchor>transformers.T5EncoderModel.forward</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/t5/modeling_t5.py#L1360</source><parameters>[{"name": "input_ids", "val": ": typing.Optional[torch.LongTensor] = None"}, {"name": "attention_mask", "val": ": typing.Optional[torch.FloatTensor] = None"}, {"name": "inputs_embeds", "val": ": typing.Optional[torch.FloatTensor] = None"}, {"name": "output_attentions", "val": ": typing.Optional[bool] = None"}, {"name": "output_hidden_states", "val": ": typing.Optional[bool] = None"}, {"name": "return_dict", "val": ": typing.Optional[bool] = None"}]</parameters><paramsdesc>- **input_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`) -- | |
| Indices of input sequence tokens in the vocabulary. T5 is a model with relative position embeddings so you | |
| should be able to pad the inputs on both the right and the left. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/pr_33892/en/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| To learn more about how to prepare `input_ids` for pretraining, take a look at [T5 Training](./t5#training). | |
| - **attention_mask** (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: | |
| - 1 for tokens that are **not masked**, | |
| - 0 for tokens that are **masked**. | |
| [What are attention masks?](../glossary#attention-mask) | |
| - **inputs_embeds** (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This | |
| is useful if you want more control over how to convert `input_ids` indices into associated vectors than the | |
| model's internal embedding lookup matrix. | |
| - **output_attentions** (`bool`, *optional*) -- | |
| Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned | |
| tensors for more detail. | |
| - **output_hidden_states** (`bool`, *optional*) -- | |
| Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for | |
| more detail. | |
| - **return_dict** (`bool`, *optional*) -- | |
| Whether or not to return a [ModelOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.utils.ModelOutput) instead of a plain tuple.</paramsdesc><paramgroups>0</paramgroups><rettype>[transformers.modeling_outputs.BaseModelOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.BaseModelOutput) or `tuple(torch.FloatTensor)`</rettype><retdesc>A [transformers.modeling_outputs.BaseModelOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.BaseModelOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([T5Config](/docs/transformers/pr_33892/en/model_doc/t5#transformers.T5Config)) and inputs. | |
| - **last_hidden_state** (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`) -- Sequence of hidden-states at the output of the last layer of the model. | |
| - **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. | |
| - **attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attention weights after the attention softmax, used to compute the weighted average in the self-attention | |
| heads.</retdesc></docstring> | |
| The [T5EncoderModel](/docs/transformers/pr_33892/en/model_doc/t5#transformers.T5EncoderModel) forward method overrides the `__call__` special method. | |
| <Tip> | |
| Although the forward pass is defined within this function, call the `Module` instance rather than | |
| `forward()` directly: the instance call runs the pre- and post-processing steps, while a direct `forward()` | |
| call silently skips them. | |
| </Tip> | |
| <ExampleCodeBlock anchor="transformers.T5EncoderModel.forward.example"> | |
| Example: | |
| ```python | |
| >>> from transformers import AutoTokenizer, T5EncoderModel | |
| >>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small") | |
| >>> model = T5EncoderModel.from_pretrained("google-t5/t5-small") | |
| >>> input_ids = tokenizer( | |
| ... "Studies have shown that owning a dog is good for you", return_tensors="pt" | |
| ... ).input_ids # Batch size 1 | |
| >>> outputs = model(input_ids=input_ids) | |
| >>> last_hidden_states = outputs.last_hidden_state | |
| ``` | |
| </ExampleCodeBlock> | |
| </div></div> | |
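The `attention_mask` parameter above uses 1 for tokens that are not masked and 0 for padding. As a rough sketch of how a padded batch and its mask relate, the snippet below right-pads variable-length sequences by hand. In practice the tokenizer produces both; the helper name `pad_batch`, the token IDs, and the pad ID of `0` are assumptions for illustration.

```python
def pad_batch(sequences, pad_token_id=0):
    """Right-pad sequences to a common length and build the attention mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        # Real tokens are kept as-is; padding fills the remaining positions.
        input_ids.append(list(seq) + [pad_token_id] * n_pad)
        # 1 marks tokens to attend to, 0 marks padding.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


# Made-up token IDs for two sequences of different lengths.
ids, mask = pad_batch([[11, 12, 13, 1], [21, 1]])
print(ids)   # [[11, 12, 13, 1], [21, 1, 0, 0]]
print(mask)  # [[1, 1, 1, 1], [1, 1, 0, 0]]
```

Both lists have shape `(batch_size, sequence_length)`, matching the shapes documented for `input_ids` and `attention_mask`.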
| ## T5ForSequenceClassification[[transformers.T5ForSequenceClassification]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.T5ForSequenceClassification</name><anchor>transformers.T5ForSequenceClassification</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/t5/modeling_t5.py#L1413</source><parameters>[{"name": "config", "val": ": T5Config"}]</parameters><paramsdesc>- **config** ([T5Config](/docs/transformers/pr_33892/en/model_doc/t5#transformers.T5Config)) -- | |
| Model configuration class with all the parameters of the model. Initializing with a config file does not | |
| load the weights associated with the model, only the configuration. Check out the | |
| [from_pretrained()](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| T5 model with a sequence classification head on top (a linear layer on top of the pooled output), e.g. for GLUE | |
| tasks. | |
| This model inherits from [PreTrainedModel](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads | |
| etc.) | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage | |
| and behavior. | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>forward</name><anchor>transformers.T5ForSequenceClassification.forward</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/t5/modeling_t5.py#L1425</source><parameters>[{"name": "input_ids", "val": ": typing.Optional[torch.LongTensor] = None"}, {"name": "attention_mask", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "decoder_input_ids", "val": ": typing.Optional[torch.LongTensor] = None"}, {"name": "decoder_attention_mask", "val": ": typing.Optional[torch.LongTensor] = None"}, {"name": "encoder_outputs", "val": ": typing.Optional[list[torch.FloatTensor]] = None"}, {"name": "inputs_embeds", "val": ": typing.Optional[torch.FloatTensor] = None"}, {"name": "decoder_inputs_embeds", "val": ": typing.Optional[torch.FloatTensor] = None"}, {"name": "labels", "val": ": typing.Optional[torch.LongTensor] = None"}, {"name": "use_cache", "val": ": typing.Optional[bool] = None"}, {"name": "output_attentions", "val": ": typing.Optional[bool] = None"}, {"name": "output_hidden_states", "val": ": typing.Optional[bool] = None"}, {"name": "return_dict", "val": ": typing.Optional[bool] = None"}]</parameters><paramsdesc>- **input_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`) -- | |
| Indices of input sequence tokens in the vocabulary. T5 is a model with relative position embeddings so you | |
| should be able to pad the inputs on both the right and the left. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/pr_33892/en/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| [What are input IDs?](../glossary#input-ids) | |
| To learn more about how to prepare `input_ids` for pretraining, take a look at [T5 Training](./t5#training). | |
| - **attention_mask** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: | |
| - 1 for tokens that are **not masked**, | |
| - 0 for tokens that are **masked**. | |
| [What are attention masks?](../glossary#attention-mask) | |
| - **decoder_input_ids** (`torch.LongTensor` of shape `(batch_size, target_sequence_length)`, *optional*) -- | |
| Indices of decoder input sequence tokens in the vocabulary. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/pr_33892/en/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| [What are decoder input IDs?](../glossary#decoder-input-ids) | |
| T5 uses the `pad_token_id` as the starting token for `decoder_input_ids` generation. If `past_key_values` | |
| is used, optionally only the last `decoder_input_ids` have to be input (see `past_key_values`). | |
| To know more on how to prepare `decoder_input_ids` for pretraining take a look at [T5 | |
| Training](./t5#training). | |
| - **decoder_attention_mask** (`torch.BoolTensor` of shape `(batch_size, target_sequence_length)`, *optional*) -- | |
| Default behavior: generate a tensor that ignores pad tokens in `decoder_input_ids`. Causal mask will also | |
| be used by default. | |
| - **encoder_outputs** (`list[torch.FloatTensor]`, *optional*) -- | |
| A tuple consisting of (`last_hidden_state`, *optional*: `hidden_states`, *optional*: `attentions`). | |
| `last_hidden_state`, of shape `(batch_size, sequence_length, hidden_size)`, is the sequence of | |
| hidden-states at the output of the last layer of the encoder. It is used in the cross-attention of the decoder. | |
| - **inputs_embeds** (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This | |
| is useful if you want more control over how to convert `input_ids` indices into associated vectors than the | |
| model's internal embedding lookup matrix. | |
| - **decoder_inputs_embeds** (`torch.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `decoder_input_ids` you can choose to directly pass an embedded | |
| representation. If `past_key_values` is used, optionally only the last `decoder_inputs_embeds` have to be | |
| input (see `past_key_values`). This is useful if you want more control over how to convert | |
| `decoder_input_ids` indices into associated vectors than the model's internal embedding lookup matrix. | |
| If `decoder_input_ids` and `decoder_inputs_embeds` are both unset, `decoder_inputs_embeds` takes the value | |
| of `inputs_embeds`. | |
| - **labels** (`torch.LongTensor` of shape `(batch_size,)`, *optional*) -- | |
| Labels for computing the sequence classification/regression loss. Indices should be in `[0, ..., | |
| config.num_labels - 1]`. If `config.num_labels > 1` a classification loss is computed (Cross-Entropy). | |
| - **use_cache** (`bool`, *optional*) -- | |
| If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see | |
| `past_key_values`). | |
| - **output_attentions** (`bool`, *optional*) -- | |
| Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned | |
| tensors for more detail. | |
| - **output_hidden_states** (`bool`, *optional*) -- | |
| Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for | |
| more detail. | |
| - **return_dict** (`bool`, *optional*) -- | |
| Whether or not to return a [ModelOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.utils.ModelOutput) instead of a plain tuple.</paramsdesc><paramgroups>0</paramgroups><rettype>[transformers.modeling_outputs.Seq2SeqSequenceClassifierOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.Seq2SeqSequenceClassifierOutput) or `tuple(torch.FloatTensor)`</rettype><retdesc>A [transformers.modeling_outputs.Seq2SeqSequenceClassifierOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.Seq2SeqSequenceClassifierOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([T5Config](/docs/transformers/pr_33892/en/model_doc/t5#transformers.T5Config)) and inputs. | |
| - **loss** (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided) -- Classification (or regression if `config.num_labels==1`) loss. | |
| - **logits** (`torch.FloatTensor` of shape `(batch_size, config.num_labels)`) -- Classification (or regression if config.num_labels==1) scores (before SoftMax). | |
| - **past_key_values** (`EncoderDecoderCache`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`) -- It is an [EncoderDecoderCache](/docs/transformers/pr_33892/en/internal/generation_utils#transformers.EncoderDecoderCache) instance. For more details, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache). | |
| Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention | |
| blocks) that can be used (see `past_key_values` input) to speed up sequential decoding. | |
| - **decoder_hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the decoder at the output of each layer plus the initial embedding outputs. | |
| - **decoder_attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the | |
| self-attention heads. | |
| - **cross_attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the | |
| weighted average in the cross-attention heads. | |
| - **encoder_last_hidden_state** (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) -- Sequence of hidden-states at the output of the last layer of the encoder of the model. | |
| - **encoder_hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the encoder at the output of each layer plus the initial embedding outputs. | |
| - **encoder_attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the | |
| self-attention heads.</retdesc></docstring> | |
| The [T5ForSequenceClassification](/docs/transformers/pr_33892/en/model_doc/t5#transformers.T5ForSequenceClassification) forward method overrides the `__call__` special method. | |
| <Tip> | |
| Although the forward pass is defined within this method, call the `Module` instance rather than `forward()` | |
| directly: calling the instance runs the pre- and post-processing steps, while a direct `forward()` call | |
| silently skips them. | |
| </Tip> | |
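The note above can be illustrated with a toy module. This is a minimal sketch that uses a plain `torch.nn.Linear` layer rather than a T5 model; the hook function and the `calls` list are hypothetical names introduced only for illustration. It shows that calling the module instance runs a registered forward hook, while calling `forward()` directly skips it.

```python
import torch
from torch import nn

layer = nn.Linear(4, 2)
calls = []  # hypothetical log of hook invocations


def record_output_shape(module, inputs, output):
    # Forward hooks are one kind of post-processing step run by `__call__`.
    calls.append(tuple(output.shape))


handle = layer.register_forward_hook(record_output_shape)
x = torch.randn(3, 4)

layer(x)          # goes through `__call__`, so the hook fires
layer.forward(x)  # bypasses `__call__`, so the hook is skipped

handle.remove()
print(calls)  # only the call through the instance was recorded
```

The same reasoning applies to any `torch.nn.Module` subclass, which is why the examples in this page call `model(**inputs)` rather than `model.forward(**inputs)`.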
| <ExampleCodeBlock anchor="transformers.T5ForSequenceClassification.forward.example"> | |
| Example of single-label classification: | |
| ```python | |
| >>> import torch | |
| >>> from transformers import AutoTokenizer, T5ForSequenceClassification | |
| >>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small") | |
| >>> model = T5ForSequenceClassification.from_pretrained("google-t5/t5-small") | |
| >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt") | |
| >>> with torch.no_grad(): | |
| ... logits = model(**inputs).logits | |
| >>> predicted_class_id = logits.argmax().item() | |
| >>> model.config.id2label[predicted_class_id] | |
| ... | |
| >>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)` | |
| >>> num_labels = len(model.config.id2label) | |
| >>> model = T5ForSequenceClassification.from_pretrained("google-t5/t5-small", num_labels=num_labels) | |
| >>> labels = torch.tensor([1]) | |
| >>> loss = model(**inputs, labels=labels).loss | |
| >>> round(loss.item(), 2) | |
| ... | |
| ``` | |
| </ExampleCodeBlock> | |
| <ExampleCodeBlock anchor="transformers.T5ForSequenceClassification.forward.example-2"> | |
| Example of multi-label classification: | |
| ```python | |
| >>> import torch | |
| >>> from transformers import AutoTokenizer, T5ForSequenceClassification | |
| >>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small") | |
| >>> model = T5ForSequenceClassification.from_pretrained("google-t5/t5-small", problem_type="multi_label_classification") | |
| >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt") | |
| >>> with torch.no_grad(): | |
| ... logits = model(**inputs).logits | |
| >>> predicted_class_ids = torch.arange(0, logits.shape[-1])[torch.sigmoid(logits).squeeze(dim=0) > 0.5] | |
| >>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)` | |
| >>> num_labels = len(model.config.id2label) | |
| >>> model = T5ForSequenceClassification.from_pretrained( | |
| ... "google-t5/t5-small", num_labels=num_labels, problem_type="multi_label_classification" | |
| ... ) | |
| >>> labels = torch.sum( | |
| ... torch.nn.functional.one_hot(predicted_class_ids[None, :].clone(), num_classes=num_labels), dim=1 | |
| ... ).to(torch.float) | |
| >>> loss = model(**inputs, labels=labels).loss | |
| ``` | |
| </ExampleCodeBlock> | |
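The label construction in the multi-label example above can be followed step by step without loading a checkpoint. The sketch below uses made-up logits and a hypothetical `num_labels` of 4; the use of `binary_cross_entropy_with_logits` is an assumption about a typical multi-label loss, not a statement of what the model computes internally.

```python
import torch
import torch.nn.functional as F

num_labels = 4  # hypothetical number of classes
logits = torch.tensor([[2.0, -1.0, 0.5, -3.0]])  # made-up scores, shape (1, num_labels)

# Every class whose sigmoid probability exceeds 0.5 counts as predicted.
probs = torch.sigmoid(logits)
predicted_class_ids = torch.arange(num_labels)[probs.squeeze(dim=0) > 0.5]

# One-hot encode each predicted id, then sum over the ids to get a multi-hot row.
labels = F.one_hot(predicted_class_ids[None, :], num_classes=num_labels).sum(dim=1).to(torch.float)

# Assumed multi-label loss: an independent binary cross-entropy per class.
loss = F.binary_cross_entropy_with_logits(logits, labels)

print(predicted_class_ids.tolist())
print(labels.tolist())
```

Multi-label targets are floating-point tensors of shape `(batch_size, num_labels)`, unlike the integer tensor of shape `(batch_size,)` used for single-label classification.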
| </div></div> | |
| ## T5ForTokenClassification[[transformers.T5ForTokenClassification]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.T5ForTokenClassification</name><anchor>transformers.T5ForTokenClassification</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/t5/modeling_t5.py#L1556</source><parameters>[{"name": "config", "val": ": T5Config"}]</parameters><paramsdesc>- **config** ([T5Config](/docs/transformers/pr_33892/en/model_doc/t5#transformers.T5Config)) -- | |
| Model configuration class with all the parameters of the model. Initializing with a config file does not | |
| load the weights associated with the model, only the configuration. Check out the | |
| [from_pretrained()](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| The T5 transformer with a token classification head on top (a linear layer on top of the hidden-states | |
| output) e.g. for Named-Entity-Recognition (NER) tasks. | |
| This model inherits from [PreTrainedModel](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its models (such as downloading or saving, resizing the input embeddings, and pruning heads). | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage | |
| and behavior. | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>forward</name><anchor>transformers.T5ForTokenClassification.forward</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/t5/modeling_t5.py#L1570</source><parameters>[{"name": "input_ids", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "attention_mask", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "inputs_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "labels", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "output_attentions", "val": ": typing.Optional[bool] = None"}, {"name": "output_hidden_states", "val": ": typing.Optional[bool] = None"}, {"name": "return_dict", "val": ": typing.Optional[bool] = None"}]</parameters><paramsdesc>- **input_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`) -- | |
| Indices of input sequence tokens in the vocabulary. T5 is a model with relative position embeddings so you | |
| should be able to pad the inputs on both the right and the left. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/pr_33892/en/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| [What are input IDs?](../glossary#input-ids) | |
| To learn more about preparing `input_ids` for pretraining, take a look at [T5 Training](./t5#training). | |
| - **attention_mask** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: | |
| - 1 for tokens that are **not masked**, | |
| - 0 for tokens that are **masked**. | |
| [What are attention masks?](../glossary#attention-mask) | |
| - **inputs_embeds** (`torch.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This | |
| is useful if you want more control over how to convert `input_ids` indices into associated vectors than the | |
| model's internal embedding lookup matrix. | |
| - **labels** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Labels for computing the token classification loss. Indices should be in `[0, ..., config.num_labels - 1]`. | |
| - **output_attentions** (`bool`, *optional*) -- | |
| Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned | |
| tensors for more detail. | |
| - **output_hidden_states** (`bool`, *optional*) -- | |
| Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for | |
| more detail. | |
| - **return_dict** (`bool`, *optional*) -- | |
| Whether or not to return a [ModelOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.utils.ModelOutput) instead of a plain tuple.</paramsdesc><paramgroups>0</paramgroups><rettype>[transformers.modeling_outputs.TokenClassifierOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.TokenClassifierOutput) or `tuple(torch.FloatTensor)`</rettype><retdesc>A [transformers.modeling_outputs.TokenClassifierOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.TokenClassifierOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([T5Config](/docs/transformers/pr_33892/en/model_doc/t5#transformers.T5Config)) and inputs. | |
| - **loss** (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided) -- Classification loss. | |
| - **logits** (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.num_labels)`) -- Classification scores (before SoftMax). | |
| - **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. | |
| - **attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attention weights after the attention softmax, used to compute the weighted average in the self-attention | |
| heads.</retdesc></docstring> | |
| The [T5ForTokenClassification](/docs/transformers/pr_33892/en/model_doc/t5#transformers.T5ForTokenClassification) forward method overrides the `__call__` special method. | |
| <Tip> | |
| Although the forward pass is defined within this method, call the `Module` instance rather than `forward()` | |
| directly: calling the instance runs the pre- and post-processing steps, while a direct `forward()` call | |
| silently skips them. | |
| </Tip> | |
| <ExampleCodeBlock anchor="transformers.T5ForTokenClassification.forward.example"> | |
| Example: | |
| ```python | |
| >>> from transformers import AutoTokenizer, T5ForTokenClassification | |
| >>> import torch | |
| >>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small") | |
| >>> model = T5ForTokenClassification.from_pretrained("google-t5/t5-small") | |
| >>> inputs = tokenizer( | |
| ... "HuggingFace is a company based in Paris and New York", add_special_tokens=False, return_tensors="pt" | |
| ... ) | |
| >>> with torch.no_grad(): | |
| ... logits = model(**inputs).logits | |
| >>> predicted_token_class_ids = logits.argmax(-1) | |
| >>> # Note that tokens are classified rather than input words, which means that | |
| >>> # there might be more predicted token classes than words. | |
| >>> # Multiple token classes might account for the same word. | |
| >>> predicted_tokens_classes = [model.config.id2label[t.item()] for t in predicted_token_class_ids[0]] | |
| >>> predicted_tokens_classes | |
| ... | |
| >>> labels = predicted_token_class_ids | |
| >>> loss = model(**inputs, labels=labels).loss | |
| >>> round(loss.item(), 2) | |
| ... | |
| ``` | |
| </ExampleCodeBlock> | |
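Because the model labels sub-word tokens rather than words, a common follow-up step is to collapse token-level predictions into one label per word. The sketch below is one possible approach (keep the label of each word's first token) and runs on hand-written data: the tokens, the label names, and the `word_ids` list are all hypothetical. With a fast tokenizer, a comparable list can be obtained from the `word_ids()` method of the tokenizer output.

```python
# Hypothetical sub-word tokens, their predicted labels, and the index of the word each token belongs to.
tokens = ["▁Hugging", "Face", "▁is", "▁in", "▁Paris"]
token_labels = ["B-ORG", "I-ORG", "O", "O", "B-LOC"]
word_ids = [0, 0, 1, 2, 3]


def first_token_label_per_word(word_ids, token_labels):
    """Keep the label of the first token of each word and drop the rest."""
    word_labels = {}
    for word_id, label in zip(word_ids, token_labels):
        if word_id is None:  # special tokens carry no word index
            continue
        word_labels.setdefault(word_id, label)
    return [word_labels[i] for i in sorted(word_labels)]


word_labels = first_token_label_per_word(word_ids, token_labels)
print(word_labels)  # ['B-ORG', 'O', 'O', 'B-LOC']
```

Other aggregation strategies, such as a majority vote over a word's tokens, are equally valid; the first-token rule is used here only because it is the simplest to read.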
| </div></div> | |
| ## T5ForQuestionAnswering[[transformers.T5ForQuestionAnswering]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.T5ForQuestionAnswering</name><anchor>transformers.T5ForQuestionAnswering</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/t5/modeling_t5.py#L1628</source><parameters>[{"name": "config", "val": ": T5Config"}]</parameters><paramsdesc>- **config** ([T5Config](/docs/transformers/pr_33892/en/model_doc/t5#transformers.T5Config)) -- | |
| Model configuration class with all the parameters of the model. Initializing with a config file does not | |
| load the weights associated with the model, only the configuration. Check out the | |
| [from_pretrained()](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| The T5 transformer with a span classification head on top for extractive question-answering tasks like | |
| SQuAD (a linear layer on top of the hidden-states output to compute `span start logits` and `span end logits`). | |
| This model inherits from [PreTrainedModel](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its models (such as downloading or saving, resizing the input embeddings, and pruning heads). | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage | |
| and behavior. | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>forward</name><anchor>transformers.T5ForQuestionAnswering.forward</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/t5/modeling_t5.py#L1672</source><parameters>[{"name": "input_ids", "val": ": typing.Optional[torch.LongTensor] = None"}, {"name": "attention_mask", "val": ": typing.Optional[torch.FloatTensor] = None"}, {"name": "decoder_input_ids", "val": ": typing.Optional[torch.LongTensor] = None"}, {"name": "decoder_attention_mask", "val": ": typing.Optional[torch.BoolTensor] = None"}, {"name": "encoder_outputs", "val": ": typing.Optional[tuple[tuple[torch.Tensor]]] = None"}, {"name": "start_positions", "val": ": typing.Optional[torch.LongTensor] = None"}, {"name": "end_positions", "val": ": typing.Optional[torch.LongTensor] = None"}, {"name": "inputs_embeds", "val": ": typing.Optional[torch.FloatTensor] = None"}, {"name": "decoder_inputs_embeds", "val": ": typing.Optional[torch.FloatTensor] = None"}, {"name": "use_cache", "val": ": typing.Optional[bool] = None"}, {"name": "output_attentions", "val": ": typing.Optional[bool] = None"}, {"name": "output_hidden_states", "val": ": typing.Optional[bool] = None"}, {"name": "return_dict", "val": ": typing.Optional[bool] = None"}]</parameters><paramsdesc>- **input_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`) -- | |
| Indices of input sequence tokens in the vocabulary. T5 is a model with relative position embeddings so you | |
| should be able to pad the inputs on both the right and the left. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/pr_33892/en/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| [What are input IDs?](../glossary#input-ids) | |
| To learn more about preparing `input_ids` for pretraining, take a look at [T5 Training](./t5#training). | |
| - **attention_mask** (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: | |
| - 1 for tokens that are **not masked**, | |
| - 0 for tokens that are **masked**. | |
| [What are attention masks?](../glossary#attention-mask) | |
| - **decoder_input_ids** (`torch.LongTensor` of shape `(batch_size, target_sequence_length)`, *optional*) -- | |
| Indices of decoder input sequence tokens in the vocabulary. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/pr_33892/en/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/pr_33892/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| [What are decoder input IDs?](../glossary#decoder-input-ids) | |
| T5 uses the `pad_token_id` as the starting token for `decoder_input_ids` generation. If `past_key_values` | |
| is used, optionally only the last `decoder_input_ids` have to be input (see `past_key_values`). | |
| To know more on how to prepare `decoder_input_ids` for pretraining take a look at [T5 | |
| Training](./t5#training). | |
| - **decoder_attention_mask** (`torch.BoolTensor` of shape `(batch_size, target_sequence_length)`, *optional*) -- | |
| Default behavior: generate a tensor that ignores pad tokens in `decoder_input_ids`. Causal mask will also | |
| be used by default. | |
| - **encoder_outputs** (`tuple[tuple[torch.Tensor]]`, *optional*) -- | |
| A tuple consisting of (`last_hidden_state`, *optional*: `hidden_states`, *optional*: `attentions`). | |
| `last_hidden_state`, of shape `(batch_size, sequence_length, hidden_size)`, is the sequence of | |
| hidden-states at the output of the last layer of the encoder. It is used in the cross-attention of the decoder. | |
| - **start_positions** (`torch.LongTensor` of shape `(batch_size,)`, *optional*) -- | |
| Labels for the position (index) of the start of the labelled span for computing the token classification loss. | |
| Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence | |
| are not taken into account for computing the loss. | |
| - **end_positions** (`torch.LongTensor` of shape `(batch_size,)`, *optional*) -- | |
| Labels for the position (index) of the end of the labelled span for computing the token classification loss. | |
| Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence | |
| are not taken into account for computing the loss. | |
| - **inputs_embeds** (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This | |
| is useful if you want more control over how to convert `input_ids` indices into associated vectors than the | |
| model's internal embedding lookup matrix. | |
| - **decoder_inputs_embeds** (`torch.FloatTensor` of shape `(batch_size, target_sequence_length, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `decoder_input_ids` you can choose to directly pass an embedded | |
| representation. If `past_key_values` is used, optionally only the last `decoder_inputs_embeds` have to be | |
| input (see `past_key_values`). This is useful if you want more control over how to convert | |
| `decoder_input_ids` indices into associated vectors than the model's internal embedding lookup matrix. | |
| If `decoder_input_ids` and `decoder_inputs_embeds` are both unset, `decoder_inputs_embeds` takes the value | |
| of `inputs_embeds`. | |
| - **use_cache** (`bool`, *optional*) -- | |
| If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see | |
| `past_key_values`). | |
| - **output_attentions** (`bool`, *optional*) -- | |
| Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned | |
| tensors for more detail. | |
| - **output_hidden_states** (`bool`, *optional*) -- | |
| Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for | |
| more detail. | |
| - **return_dict** (`bool`, *optional*) -- | |
| Whether or not to return a [ModelOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.utils.ModelOutput) instead of a plain tuple.</paramsdesc><paramgroups>0</paramgroups><rettype>[transformers.modeling_outputs.Seq2SeqQuestionAnsweringModelOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.Seq2SeqQuestionAnsweringModelOutput) or `tuple(torch.FloatTensor)`</rettype><retdesc>A [transformers.modeling_outputs.Seq2SeqQuestionAnsweringModelOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.Seq2SeqQuestionAnsweringModelOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([T5Config](/docs/transformers/pr_33892/en/model_doc/t5#transformers.T5Config)) and inputs. | |
| - **loss** (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `start_positions` and `end_positions` are provided) -- Total span extraction loss is the sum of a Cross-Entropy for the start and end positions. | |
| - **start_logits** (`torch.FloatTensor` of shape `(batch_size, sequence_length)`) -- Span-start scores (before SoftMax). | |
| - **end_logits** (`torch.FloatTensor` of shape `(batch_size, sequence_length)`) -- Span-end scores (before SoftMax). | |
| - **past_key_values** (`EncoderDecoderCache`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`) -- It is an [EncoderDecoderCache](/docs/transformers/pr_33892/en/internal/generation_utils#transformers.EncoderDecoderCache) instance. For more details, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache). | |
| Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention | |
| blocks) that can be used (see `past_key_values` input) to speed up sequential decoding. | |
| - **decoder_hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the decoder at the output of each layer plus the initial embedding outputs. | |
| - **decoder_attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the | |
| self-attention heads. | |
| - **cross_attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the | |
| weighted average in the cross-attention heads. | |
| - **encoder_last_hidden_state** (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) -- Sequence of hidden-states at the output of the last layer of the encoder of the model. | |
| - **encoder_hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the encoder at the output of each layer plus the initial embedding outputs. | |
| - **encoder_attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the | |
| self-attention heads.</retdesc></docstring> | |
| The [T5ForQuestionAnswering](/docs/transformers/pr_33892/en/model_doc/t5#transformers.T5ForQuestionAnswering) forward method overrides the `__call__` special method. | |
| <Tip> | |
| Although the recipe for the forward pass needs to be defined within this function, call the `Module` | |
| instance instead of calling this method directly: calling the instance runs the pre- and post-processing steps, | |
| while calling `forward` directly silently skips them. | |
| </Tip> | |
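The note above can be illustrated with a minimal sketch in plain PyTorch. The `Doubler` module and the hook below are hypothetical and exist only for illustration; they are not part of T5. The sketch assumes a forward hook is a fair stand-in for the "pre and post processing steps": it runs when the module instance is called, but not when `forward` is invoked directly.

```python
import torch
from torch import nn


class Doubler(nn.Module):
    """Hypothetical toy module used only to illustrate the call path."""

    def forward(self, x):
        return x * 2


calls = []
module = Doubler()
# Forward hooks are part of the processing that `__call__` runs around `forward`.
handle = module.register_forward_hook(lambda mod, args, output: calls.append("hook"))

x = torch.ones(2)
via_call = module(x)             # runs forward() and then the registered hook
hooks_after_call = len(calls)
via_forward = module.forward(x)  # runs forward() only; the hook is skipped
hooks_after_forward = len(calls)
handle.remove()
```

Both calls return the same tensor here, but only the first one triggers the hook, which is why calling the instance is the recommended path.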
| <ExampleCodeBlock anchor="transformers.T5ForQuestionAnswering.forward.example"> | |
| Example: | |
| ```python | |
| >>> from transformers import AutoTokenizer, T5ForQuestionAnswering | |
| >>> import torch | |
| >>> tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small") | |
| >>> model = T5ForQuestionAnswering.from_pretrained("google-t5/t5-small") | |
| >>> question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet" | |
| >>> inputs = tokenizer(question, text, return_tensors="pt") | |
| >>> with torch.no_grad(): | |
| ...     outputs = model(**inputs) | |
| >>> answer_start_index = outputs.start_logits.argmax() | |
| >>> answer_end_index = outputs.end_logits.argmax() | |
| >>> predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1] | |
| >>> tokenizer.decode(predict_answer_tokens, skip_special_tokens=True) | |
| ... | |
| >>> # target is "nice puppet" | |
| >>> target_start_index = torch.tensor([14]) | |
| >>> target_end_index = torch.tensor([15]) | |
| >>> outputs = model(**inputs, start_positions=target_start_index, end_positions=target_end_index) | |
| >>> loss = outputs.loss | |
| >>> round(loss.item(), 2) | |
| ... | |
| ``` | |
| </ExampleCodeBlock> | |
| </div></div> | |