Model outputs
All models have outputs that are instances of subclasses of ModelOutput. Those are data structures containing all the information returned by the model, but that can also be used as tuples or dictionaries.
Let's see how this looks in an example:
from transformers import BertTokenizer, BertForSequenceClassification
import torch
tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("google-bert/bert-base-uncased")
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
labels = torch.tensor([1]).unsqueeze(0) # Batch size 1
outputs = model(**inputs, labels=labels)
The outputs object is a SequenceClassifierOutput. As we can see in the
documentation of that class below, it has an optional loss, a logits attribute, an optional hidden_states and
an optional attentions attribute. Here we have the loss since we passed along labels, but we don't have
hidden_states and attentions because we didn't pass output_hidden_states=True or
output_attentions=True.
When passing output_hidden_states=True you may expect outputs.hidden_states[-1] to match outputs.last_hidden_state exactly.
However, this is not always the case. Some models apply normalization or further processing to the last hidden state before returning it.
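The mismatch can be illustrated with a toy, library-free sketch. The helpers toy_layer, final_norm and toy_forward are hypothetical and only stand in for a model that normalizes the output of its last layer before returning it; real models differ in the details.

```python
import math

def toy_layer(h):
    # Stand-in for one transformer layer: any deterministic transform will do.
    return [2.0 * x + 1.0 for x in h]

def final_norm(h, eps=1e-5):
    # Layer-norm-like step applied only to the returned last hidden state.
    mean = sum(h) / len(h)
    var = sum((x - mean) ** 2 for x in h) / len(h)
    return [(x - mean) / math.sqrt(var + eps) for x in h]

def toy_forward(embedding, num_layers=2):
    hidden_states = [embedding]          # embedding output comes first
    h = embedding
    for _ in range(num_layers):
        h = toy_layer(h)
        hidden_states.append(h)          # raw output of each layer
    last_hidden_state = final_norm(h)    # extra processing after the last layer
    return last_hidden_state, tuple(hidden_states)

last_hidden_state, hidden_states = toy_forward([0.0, 1.0, 2.0])
print(hidden_states[-1])                       # [3.0, 7.0, 11.0]
print(last_hidden_state == hidden_states[-1])  # False
```

In this sketch hidden_states[-1] holds the raw output of the last layer, while last_hidden_state holds its normalized version, so the two are not equal.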
You can access each attribute as you would usually do, and if that attribute has not been returned by the model, you
will get None. Here for instance outputs.loss is the loss computed by the model, and outputs.attentions is
None.
When considering our outputs object as a tuple, it only considers the attributes that don't have None values.
Here for instance, it has two elements, loss then logits, so
outputs[:2]
will return the tuple (outputs.loss, outputs.logits).
When considering our outputs object as a dictionary, it only considers the attributes that don't have None
values. Here for instance, it has two keys, loss and logits.
We document here the generic model outputs that are used by more than one model type. Specific output types are documented on their corresponding model page.
ModelOutput[[transformers.utils.ModelOutput]]
class transformers.utils.ModelOutput
Base class for all model outputs as dataclass. Has a __getitem__ that allows indexing by integer or slice (like a
tuple) or strings (like a dictionary) that will ignore the None attributes. Otherwise behaves like a regular
python dictionary.
You can't unpack a ModelOutput directly. Use the to_tuple() method to convert it to a tuple
first.
to_tuple
Convert self to a tuple containing all the attributes/keys that are not None.
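The indexing rules can be mimicked with a small dataclass. This is a minimal sketch, not the library's implementation: ToyOutput and its _present helper are hypothetical names, and plain Python values replace tensors.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class ToyOutput:
    # Hypothetical stand-in for a ModelOutput subclass.
    loss: Optional[float] = None
    logits: Optional[list] = None
    hidden_states: Optional[tuple] = None
    attentions: Optional[tuple] = None

    def _present(self):
        # Keep only the fields that were actually returned (not None), in order.
        return {f.name: getattr(self, f.name) for f in fields(self)
                if getattr(self, f.name) is not None}

    def to_tuple(self):
        return tuple(self._present().values())

    def __getitem__(self, key):
        if isinstance(key, str):          # dictionary-style access
            return self._present()[key]
        return self.to_tuple()[key]       # tuple-style access (int or slice)

outputs = ToyOutput(loss=0.7, logits=[0.1, -0.3])
print(outputs.attentions)          # None: the attribute exists but was not returned
print(outputs[:2])                 # (0.7, [0.1, -0.3])
print(outputs["logits"])           # [0.1, -0.3]
loss, logits = outputs.to_tuple()  # convert first, then unpack
```

Because None fields are skipped, the position of an element in the tuple view depends on which optional outputs were requested, so attribute or key access is the more robust choice.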
BaseModelOutput[[transformers.modeling_outputs.BaseModelOutput]]
class transformers.modeling_outputs.BaseModelOutput
last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) -- Sequence of hidden-states at the output of the last layer of the model.
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for model's outputs, with potential hidden states and attentions.
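As a quick sanity check on the shapes listed above, the following sketch computes what each field should look like for a given configuration. The function expected_shapes and its arguments are illustrative names, and the example values are assumptions chosen to resemble a 12-layer, 12-head model.

```python
def expected_shapes(batch_size, sequence_length, hidden_size, num_heads, num_layers,
                    has_embedding_layer=True):
    # Shapes follow the parameter descriptions above; nothing here calls the library.
    hidden = (batch_size, sequence_length, hidden_size)
    attention = (batch_size, num_heads, sequence_length, sequence_length)
    num_hidden = num_layers + (1 if has_embedding_layer else 0)
    return {
        "last_hidden_state": hidden,
        "hidden_states": (hidden,) * num_hidden,   # embeddings + one per layer
        "attentions": (attention,) * num_layers,   # one per layer
    }

shapes = expected_shapes(batch_size=1, sequence_length=8, hidden_size=768,
                         num_heads=12, num_layers=12)
print(len(shapes["hidden_states"]))  # 13
print(len(shapes["attentions"]))     # 12
print(shapes["attentions"][0])       # (1, 12, 8, 8)
```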
BaseModelOutputWithPooling[[transformers.modeling_outputs.BaseModelOutputWithPooling]]
class transformers.modeling_outputs.BaseModelOutputWithPooling
last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) -- Sequence of hidden-states at the output of the last layer of the model.
pooler_output (torch.FloatTensor of shape (batch_size, hidden_size)) -- Last layer hidden-state of the first token of the sequence (classification token) after further processing through the layers used for the auxiliary pretraining task. E.g. for BERT-family of models, this returns the classification token after processing through a linear layer and a tanh activation function. The linear layer weights are trained from the next sentence prediction (classification) objective during pretraining.
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for model's outputs that also contains a pooling of the last hidden states.
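The BERT-style pooling described for pooler_output (first token, linear layer, tanh) can be sketched without the library. toy_pooler is a hypothetical helper, and the weights and inputs are made-up values for a single sequence.

```python
import math

def toy_pooler(last_hidden_state, weight, bias):
    # last_hidden_state: one sequence as a list of token vectors
    # (sequence_length x hidden_size).
    first_token = last_hidden_state[0]                 # classification token
    pooled = []
    for row, b in zip(weight, bias):                   # linear layer: W @ h + b
        pre_activation = sum(w * x for w, x in zip(row, first_token)) + b
        pooled.append(math.tanh(pre_activation))       # tanh activation
    return pooled

sequence = [[1.0, -1.0], [0.5, 0.5], [2.0, 0.0]]       # 3 tokens, hidden_size = 2
weight = [[1.0, 0.0], [0.0, 1.0]]                      # made-up identity weights
bias = [0.0, 0.0]
pooler_output = toy_pooler(sequence, weight, bias)
print(len(pooler_output))  # 2
```

Only the first token contributes, which is why the result has one vector of size hidden_size per sequence rather than one per token.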
BaseModelOutputWithCrossAttentions[[transformers.modeling_outputs.BaseModelOutputWithCrossAttentions]]
class transformers.modeling_outputs.BaseModelOutputWithCrossAttentions
last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) -- Sequence of hidden-states at the output of the last layer of the model.
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True and config.add_cross_attention=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
Base class for model's outputs, with potential hidden states and attentions.
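To make "attention weights after the attention softmax, used to compute the weighted average" concrete, here is a single-head sketch of cross-attention in plain Python. It is a simplification under stated assumptions: there are no learned projections, and softmax and toy_cross_attention are hypothetical helpers.

```python
import math

def softmax(scores):
    peak = max(scores)                                  # subtract max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def toy_cross_attention(decoder_queries, encoder_keys, encoder_values):
    # Queries come from the decoder, keys and values from the encoder.
    scale = math.sqrt(len(encoder_keys[0]))
    weights, outputs = [], []
    for q in decoder_queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in encoder_keys]
        row = softmax(scores)                           # attention weights after softmax
        weights.append(row)
        outputs.append([sum(w * v[d] for w, v in zip(row, encoder_values))
                        for d in range(len(encoder_values[0]))])  # weighted average
    return weights, outputs

queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, outputs = toy_cross_attention(queries, keys, values)
print([round(sum(row), 6) for row in weights])  # [1.0, 1.0]
```

Each row of weights is a probability distribution over encoder positions, which is what the cross_attentions tensors hold for every head and layer.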
BaseModelOutputWithPoolingAndCrossAttentions[[transformers.modeling_outputs.BaseModelOutputWithPoolingAndCrossAttentions]]
class transformers.modeling_outputs.BaseModelOutputWithPoolingAndCrossAttentions
last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) -- Sequence of hidden-states at the output of the last layer of the model.
pooler_output (torch.FloatTensor of shape (batch_size, hidden_size)) -- Last layer hidden-state of the first token of the sequence (classification token) after further processing through the layers used for the auxiliary pretraining task. E.g. for BERT-family of models, this returns the classification token after processing through a linear layer and a tanh activation function. The linear layer weights are trained from the next sentence prediction (classification) objective during pretraining.
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True and config.add_cross_attention=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
past_key_values (Cache, optional, returned when use_cache=True is passed or when config.use_cache=True) -- It is a Cache instance. For more details, see our kv cache guide. Contains pre-computed hidden-states (key and values in the self-attention blocks and optionally if config.is_encoder_decoder=True in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.
Base class for model's outputs that also contains a pooling of the last hidden states.
BaseModelOutputWithPast[[transformers.modeling_outputs.BaseModelOutputWithPast]]
class transformers.modeling_outputs.BaseModelOutputWithPast
last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) -- Sequence of hidden-states at the output of the last layer of the model. If past_key_values is used, only the last hidden-state of the sequences of shape (batch_size, 1, hidden_size) is output.
past_key_values (Cache, optional, returned when use_cache=True is passed or when config.use_cache=True) -- It is a Cache instance. For more details, see our kv cache guide. Contains pre-computed hidden-states (key and values in the self-attention blocks and optionally if config.is_encoder_decoder=True in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for model's outputs that may also contain a past key/values (to speed up sequential decoding).
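Why cached keys and values speed up sequential decoding, and why the output then covers a single position, can be seen in a toy single-head example. This is only a sketch: the dictionary cache below is a hypothetical stand-in for a Cache instance, and tokens act as their own keys and values because no projections are modeled.

```python
import math

def attend(query, keys, values):
    # Single-head attention of one query over all available positions.
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [sum(e / total * v[d] for e, v in zip(exps, values))
            for d in range(len(values[0]))]

def full_forward(tokens):
    # Recompute causal attention for every position from scratch.
    return [attend(tokens[i], tokens[: i + 1], tokens[: i + 1])
            for i in range(len(tokens))]

def cached_step(new_token, cache):
    # Process only the new token; earlier keys/values come from the cache.
    cache["keys"].append(new_token)
    cache["values"].append(new_token)
    return [attend(new_token, cache["keys"], cache["values"])]  # sequence length 1

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cache = {"keys": [], "values": []}
for token in tokens:
    step_output = cached_step(token, cache)

print(len(step_output))                            # 1
print(step_output[0] == full_forward(tokens)[-1])  # True
```

The cached step returns one position per call yet matches the last position of the full recomputation, which mirrors the (batch_size, 1, hidden_size) shape noted for last_hidden_state.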
BaseModelOutputWithPastAndCrossAttentions[[transformers.modeling_outputs.BaseModelOutputWithPastAndCrossAttentions]]
class transformers.modeling_outputs.BaseModelOutputWithPastAndCrossAttentions
last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) -- Sequence of hidden-states at the output of the last layer of the model. If past_key_values is used, only the last hidden-state of the sequences of shape (batch_size, 1, hidden_size) is output.
past_key_values (Cache, optional, returned when use_cache=True is passed or when config.use_cache=True) -- It is a Cache instance. For more details, see our kv cache guide. Contains pre-computed hidden-states (key and values in the self-attention blocks and optionally if config.is_encoder_decoder=True in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True and config.add_cross_attention=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
Base class for model's outputs that may also contain a past key/values (to speed up sequential decoding).
Seq2SeqModelOutput[[transformers.modeling_outputs.Seq2SeqModelOutput]]
class transformers.modeling_outputs.Seq2SeqModelOutput
last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) -- Sequence of hidden-states at the output of the last layer of the decoder of the model. If past_key_values is used, only the last hidden-state of the sequences of shape (batch_size, 1, hidden_size) is output.
past_key_values (EncoderDecoderCache, optional, returned when use_cache=True is passed or when config.use_cache=True) -- It is an EncoderDecoderCache instance. For more details, see our kv cache guide. Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.
decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the decoder at the output of each layer plus the optional initial embedding outputs.
decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) -- Sequence of hidden-states at the output of the last layer of the encoder of the model.
encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the encoder at the output of each layer plus the optional initial embedding outputs.
encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for model encoder's outputs that also contains pre-computed hidden states that can speed up sequential decoding.
CausalLMOutput[[transformers.modeling_outputs.CausalLMOutput]]
class transformers.modeling_outputs.CausalLMOutput
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) -- Language modeling loss (for next-token prediction).
logits (torch.FloatTensor of shape (batch_size, sequence_length, config.vocab_size)) -- Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for causal language model (or autoregressive) outputs.
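A next-token prediction loss can be sketched as a cross-entropy between each position's logits and the following token. The shifting convention below is an assumption about how such a loss is commonly computed, not a statement of the library's exact code; cross_entropy and next_token_loss are hypothetical helpers working on a single sequence.

```python
import math

def cross_entropy(logits, target):
    # Negative log-probability of the target under softmax(logits).
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[target]

def next_token_loss(logits, input_ids):
    # logits[t] scores the token at position t + 1, so drop the last logit row
    # and the first label before averaging.
    shifted_logits = logits[:-1]
    shifted_labels = input_ids[1:]
    losses = [cross_entropy(row, label)
              for row, label in zip(shifted_logits, shifted_labels)]
    return sum(losses) / len(losses)

input_ids = [0, 2, 1]                       # made-up token ids, vocab_size = 3
logits = [[0.0, 0.0, 0.0]] * 3              # uniform scores at every position
loss = next_token_loss(logits, input_ids)
print(abs(loss - math.log(3)) < 1e-12)      # True
```

Uniform logits give a loss of log(vocab_size), a handy reference value when checking an untrained model.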
CausalLMOutputWithCrossAttentions[[transformers.modeling_outputs.CausalLMOutputWithCrossAttentions]]
class transformers.modeling_outputs.CausalLMOutputWithCrossAttentions
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) -- Language modeling loss (for next-token prediction).
logits (torch.FloatTensor of shape (batch_size, sequence_length, config.vocab_size)) -- Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Cross-attention weights after the attention softmax, used to compute the weighted average in the cross-attention heads.
past_key_values (Cache, optional, returned when use_cache=True is passed or when config.use_cache=True) -- It is a Cache instance. For more details, see our kv cache guide. Contains pre-computed hidden-states (key and values in the attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.
Base class for causal language model (or autoregressive) outputs.
CausalLMOutputWithPast[[transformers.modeling_outputs.CausalLMOutputWithPast]]
class transformers.modeling_outputs.CausalLMOutputWithPast
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) -- Language modeling loss (for next-token prediction).
logits (torch.FloatTensor of shape (batch_size, sequence_length, config.vocab_size)) -- Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
past_key_values (Cache, optional, returned when use_cache=True is passed or when config.use_cache=True) -- It is a Cache instance. For more details, see our kv cache guide. Contains pre-computed hidden-states (key and values in the self-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for causal language model (or autoregressive) outputs.
MaskedLMOutput[[transformers.modeling_outputs.MaskedLMOutput]]
class transformers.modeling_outputs.MaskedLMOutput
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) -- Masked language modeling (MLM) loss.
logits (torch.FloatTensor of shape (batch_size, sequence_length, config.vocab_size)) -- Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for masked language models outputs.
Seq2SeqLMOutput[[transformers.modeling_outputs.Seq2SeqLMOutput]]
class transformers.modeling_outputs.Seq2SeqLMOutput
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) -- Language modeling loss.
logits (torch.FloatTensor of shape (batch_size, sequence_length, config.vocab_size)) -- Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
past_key_values (EncoderDecoderCache, optional, returned when use_cache=True is passed or when config.use_cache=True) -- It is an EncoderDecoderCache instance. For more details, see our kv cache guide. Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.
decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) -- Sequence of hidden-states at the output of the last layer of the encoder of the model.
encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for sequence-to-sequence language models outputs.
NextSentencePredictorOutput[[transformers.modeling_outputs.NextSentencePredictorOutput]]
class transformers.modeling_outputs.NextSentencePredictorOutput
loss (torch.FloatTensor of shape (1,), optional, returned when next_sentence_label is provided) -- Next sequence prediction (classification) loss.
logits (torch.FloatTensor of shape (batch_size, 2)) -- Prediction scores of the next sequence prediction (classification) head (scores of True/False continuation before SoftMax).
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of models predicting if two sentences are consecutive or not.
SequenceClassifierOutput[[transformers.modeling_outputs.SequenceClassifierOutput]]
class transformers.modeling_outputs.SequenceClassifierOutput
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) -- Classification (or regression if config.num_labels==1) loss.
logits (torch.FloatTensor of shape (batch_size, config.num_labels)) -- Classification (or regression if config.num_labels==1) scores (before SoftMax).
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of sentence classification models.
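The "classification (or regression if config.num_labels==1)" distinction can be illustrated as follows. This sketch assumes mean squared error for the regression case and cross-entropy for the classification case; the source does not name the loss functions, and sequence_classification_loss is a hypothetical helper.

```python
import math

def sequence_classification_loss(logits, labels, num_labels):
    if num_labels == 1:
        # Regression: mean squared error between the single score and the target.
        return sum((row[0] - y) ** 2 for row, y in zip(logits, labels)) / len(labels)
    # Classification: mean cross-entropy over the batch.
    total = 0.0
    for row, y in zip(logits, labels):
        peak = max(row)
        total += peak + math.log(sum(math.exp(x - peak) for x in row)) - row[y]
    return total / len(labels)

regression = sequence_classification_loss([[2.0], [0.0]], [1.0, 0.0], num_labels=1)
classification = sequence_classification_loss([[0.0, 0.0]], [1], num_labels=2)
print(regression)                                   # 0.5
print(abs(classification - math.log(2)) < 1e-12)    # True
```

In both cases logits has shape (batch_size, num_labels); only the interpretation of the scores and the labels changes.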
Seq2SeqSequenceClassifierOutput[[transformers.modeling_outputs.Seq2SeqSequenceClassifierOutput]]
class transformers.modeling_outputs.Seq2SeqSequenceClassifierOutput
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) -- Classification (or regression if config.num_labels==1) loss.
logits (torch.FloatTensor of shape (batch_size, config.num_labels)) -- Classification (or regression if config.num_labels==1) scores (before SoftMax).
past_key_values (EncoderDecoderCache, optional, returned when use_cache=True is passed or when config.use_cache=True) -- It is an EncoderDecoderCache instance. For more details, see our kv cache guide. Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.
decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) -- Sequence of hidden-states at the output of the last layer of the encoder of the model.
encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of sequence-to-sequence sentence classification models.
MultipleChoiceModelOutput[[transformers.modeling_outputs.MultipleChoiceModelOutput]]
class transformers.modeling_outputs.MultipleChoiceModelOutput
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) -- Classification loss.
logits (torch.FloatTensor of shape (batch_size, num_choices)) -- Classification scores (before SoftMax). num_choices is the second dimension of the input tensors (see input_ids above).
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of multiple choice models.
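As a rough illustration of how the (batch_size, num_choices) logits are typically consumed, the sketch below builds a MultipleChoiceModelOutput by hand from made-up logits (no model is loaded; the values are assumptions chosen for the example) and picks the highest-scoring choice per example.

```python
import torch
from transformers.modeling_outputs import MultipleChoiceModelOutput

# Made-up logits for 2 questions with 4 candidate answers each.
logits = torch.tensor([[0.1, 2.0, -1.0, 0.3],
                       [1.5, 0.2, 0.1, 3.0]])
outputs = MultipleChoiceModelOutput(logits=logits)

# Logits are pre-softmax scores, so normalize over the num_choices axis.
probs = outputs.logits.softmax(dim=-1)

# Index of the best choice for each example in the batch.
best_choice = outputs.logits.argmax(dim=-1)
```

Since no labels were supplied when building the object, outputs.loss is None here.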
TokenClassifierOutput[[transformers.modeling_outputs.TokenClassifierOutput]]
class transformers.modeling_outputs.TokenClassifierOutput
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) -- Classification loss.
logits (torch.FloatTensor of shape (batch_size, sequence_length, config.num_labels)) -- Classification scores (before SoftMax).
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of token classification models.
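A minimal sketch of turning per-token logits into label ids, assuming random stand-in logits and a hand-written attention_mask (both are illustrative; in practice they would come from a model and its tokenizer). Marking padded positions with -100 is a convention assumed here, not something this output class does for you.

```python
import torch
from transformers.modeling_outputs import TokenClassifierOutput

torch.manual_seed(0)
batch_size, sequence_length, num_labels = 2, 6, 3

# Stand-in logits of shape (batch_size, sequence_length, num_labels).
outputs = TokenClassifierOutput(
    logits=torch.randn(batch_size, sequence_length, num_labels)
)

# Hypothetical padding mask: the second sequence has two padded positions.
attention_mask = torch.tensor([[1, 1, 1, 1, 1, 1],
                               [1, 1, 1, 1, 0, 0]])

# One predicted label id per token, then blank out the padded positions.
predicted = outputs.logits.argmax(dim=-1)
predicted = predicted.masked_fill(attention_mask == 0, -100)
```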
QuestionAnsweringModelOutput[[transformers.modeling_outputs.QuestionAnsweringModelOutput]]
class transformers.modeling_outputs.QuestionAnsweringModelOutput
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) -- Total span extraction loss, the sum of the cross-entropy losses for the start and end positions.
start_logits (torch.FloatTensor of shape (batch_size, sequence_length)) -- Span-start scores (before SoftMax).
end_logits (torch.FloatTensor of shape (batch_size, sequence_length)) -- Span-end scores (before SoftMax).
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of question answering models.
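One common way to read an answer span out of start_logits and end_logits is to score every (start, end) pair and keep the best pair with start <= end. The sketch below does this on hand-made logits; it is only one possible decoding strategy and omits refinements such as a maximum answer length or restricting the search to context tokens.

```python
import torch
from transformers.modeling_outputs import QuestionAnsweringModelOutput

# Made-up scores for a single sequence of 5 tokens.
outputs = QuestionAnsweringModelOutput(
    start_logits=torch.tensor([[0.1, 0.2, 4.0, 0.3, 0.1]]),
    end_logits=torch.tensor([[0.0, 0.1, 0.2, 3.5, 0.4]]),
)

seq_len = outputs.start_logits.size(-1)

# scores[b, i, j] = start score of token i + end score of token j.
scores = outputs.start_logits[:, :, None] + outputs.end_logits[:, None, :]

# Forbid spans that end before they start (keep the upper triangle only).
valid = torch.ones(seq_len, seq_len).triu().bool()
scores = scores.masked_fill(~valid, float("-inf"))

# Recover (start, end) from the flattened argmax.
flat_index = scores.flatten(1).argmax(dim=-1)
start = torch.div(flat_index, seq_len, rounding_mode="floor")
end = flat_index % seq_len
```

The token indices in start and end would then be mapped back to text with the tokenizer that produced the inputs.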
Seq2SeqQuestionAnsweringModelOutput[[transformers.modeling_outputs.Seq2SeqQuestionAnsweringModelOutput]]
class transformers.modeling_outputs.Seq2SeqQuestionAnsweringModelOutput
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) -- Total span extraction loss, the sum of the cross-entropy losses for the start and end positions.
start_logits (torch.FloatTensor of shape (batch_size, sequence_length)) -- Span-start scores (before SoftMax).
end_logits (torch.FloatTensor of shape (batch_size, sequence_length)) -- Span-end scores (before SoftMax).
past_key_values (EncoderDecoderCache, optional, returned when use_cache=True is passed or when config.use_cache=True) -- An EncoderDecoderCache instance. For more details, see our kv cache guide. Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see the past_key_values input) to speed up sequential decoding.
decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) -- Sequence of hidden-states at the output of the last layer of the encoder of the model.
encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of sequence-to-sequence question answering models.
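Sequence-to-sequence outputs carry many optional fields, so it can help to check which ones are actually populated before indexing by position. The sketch below builds a Seq2SeqQuestionAnsweringModelOutput by hand with zero tensors (sizes are arbitrary assumptions) and shows that only the non-None fields appear as keys and tuple entries.

```python
import torch
from transformers.modeling_outputs import Seq2SeqQuestionAnsweringModelOutput

batch_size, sequence_length, hidden_size = 1, 5, 8

# Only three fields are set; everything else stays at its default of None.
outputs = Seq2SeqQuestionAnsweringModelOutput(
    start_logits=torch.zeros(batch_size, sequence_length),
    end_logits=torch.zeros(batch_size, sequence_length),
    encoder_last_hidden_state=torch.zeros(batch_size, sequence_length, hidden_size),
)

# Keys and tuple entries skip every field that is None.
present = list(outputs.keys())
as_tuple = outputs.to_tuple()
```

Because positions in the tuple shift depending on which fields are set, attribute or key access (outputs.encoder_last_hidden_state) is usually the more robust choice.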
Seq2SeqSpectrogramOutput[[transformers.modeling_outputs.Seq2SeqSpectrogramOutput]]
class transformers.modeling_outputs.Seq2SeqSpectrogramOutput
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) -- Spectrogram generation loss.
spectrogram (torch.FloatTensor of shape (batch_size, sequence_length, num_bins)) -- The predicted spectrogram.
past_key_values (EncoderDecoderCache, optional, returned when use_cache=True is passed or when config.use_cache=True) -- An EncoderDecoderCache instance. For more details, see our kv cache guide. Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see the past_key_values input) to speed up sequential decoding.
decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) -- Sequence of hidden-states at the output of the last layer of the encoder of the model.
encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for sequence-to-sequence spectrogram outputs.
SemanticSegmenterOutput[[transformers.modeling_outputs.SemanticSegmenterOutput]]
class transformers.modeling_outputs.SemanticSegmenterOutput
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) -- Classification (or regression if config.num_labels==1) loss.
logits (torch.FloatTensor of shape (batch_size, config.num_labels, logits_height, logits_width)) -- Classification scores for each pixel. The logits returned do not necessarily have the same size as the pixel_values passed as inputs. This avoids doing two interpolations and losing some quality when a user needs to resize the logits to the original image size as post-processing. You should always check your logits shape and resize as needed.
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, patch_size, hidden_size). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, patch_size, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of semantic segmentation models.
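Because the segmentation logits may be smaller than the input image, a typical post-processing step resizes them once to the original image size before taking the per-pixel argmax. The sketch below assumes random stand-in logits and a made-up target size of 64 x 48; bilinear interpolation is one reasonable choice, not the only one.

```python
import torch
import torch.nn.functional as F
from transformers.modeling_outputs import SemanticSegmenterOutput

torch.manual_seed(0)
num_labels = 3

# Stand-in logits at reduced resolution: (batch_size, num_labels, 16, 16).
outputs = SemanticSegmenterOutput(logits=torch.randn(1, num_labels, 16, 16))

# Assumed size of the original image (height, width).
image_height, image_width = 64, 48

# Resize the logits once, directly to the original image size.
upsampled = F.interpolate(
    outputs.logits,
    size=(image_height, image_width),
    mode="bilinear",
    align_corners=False,
)

# Per-pixel class id: argmax over the num_labels axis.
segmentation = upsampled.argmax(dim=1)
```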
ImageClassifierOutput[[transformers.modeling_outputs.ImageClassifierOutput]]
class transformers.modeling_outputs.ImageClassifierOutput
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) -- Classification (or regression if config.num_labels==1) loss.
logits (torch.FloatTensor of shape (batch_size, config.num_labels)) -- Classification (or regression if config.num_labels==1) scores (before SoftMax).
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each stage) of shape (batch_size, sequence_length, hidden_size). Hidden-states (also called feature maps) of the model at the output of each stage.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, patch_size, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of image classification models.
ImageClassifierOutputWithNoAttention[[transformers.modeling_outputs.ImageClassifierOutputWithNoAttention]]
class transformers.modeling_outputs.ImageClassifierOutputWithNoAttention
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) -- Classification (or regression if config.num_labels==1) loss.
logits (torch.FloatTensor of shape (batch_size, config.num_labels)) -- Classification (or regression if config.num_labels==1) scores (before SoftMax).
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each stage) of shape (batch_size, num_channels, height, width). Hidden-states (also called feature maps) of the model at the output of each stage.
Base class for outputs of image classification models.
DepthEstimatorOutput[[transformers.modeling_outputs.DepthEstimatorOutput]]
class transformers.modeling_outputs.DepthEstimatorOutput
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) -- Classification (or regression if config.num_labels==1) loss.
predicted_depth (torch.FloatTensor of shape (batch_size, height, width)) -- Predicted depth for each pixel.
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, num_channels, height, width). Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, patch_size, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for outputs of depth estimation models.
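Since predicted_depth has no channel axis, resizing it to an image size needs a temporary channel dimension. The sketch below assumes a random stand-in depth map and a made-up target size of 96 x 128; the bicubic mode and the min-max scaling to 0..255 for visualization are illustrative choices.

```python
import torch
import torch.nn.functional as F
from transformers.modeling_outputs import DepthEstimatorOutput

torch.manual_seed(0)

# Stand-in depth map of shape (batch_size, height, width).
outputs = DepthEstimatorOutput(predicted_depth=torch.rand(1, 24, 32))

# Assumed size of the original image (height, width).
image_height, image_width = 96, 128

# interpolate expects (batch, channels, H, W), so add and then drop a channel axis.
depth = F.interpolate(
    outputs.predicted_depth.unsqueeze(1),
    size=(image_height, image_width),
    mode="bicubic",
    align_corners=False,
).squeeze(1)

# Min-max scale to 0..255 so the map can be viewed as a grayscale image.
depth_min, depth_max = depth.amin(), depth.amax()
depth_image = ((depth - depth_min) / (depth_max - depth_min) * 255).to(torch.uint8)
```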
Wav2Vec2BaseModelOutput[[transformers.modeling_outputs.Wav2Vec2BaseModelOutput]]
class transformers.modeling_outputs.Wav2Vec2BaseModelOutput
last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) -- Sequence of hidden-states at the output of the last layer of the model.
extract_features (torch.FloatTensor of shape (batch_size, sequence_length, conv_dim[-1])) -- Sequence of extracted feature vectors of the last convolutional layer of the model.
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Base class for models that have been trained with the Wav2Vec2 loss objective.
XVectorOutput[[transformers.modeling_outputs.XVectorOutput]]
class transformers.modeling_outputs.XVectorOutput
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) -- Classification loss.
logits (torch.FloatTensor of shape (batch_size, config.xvector_output_dim)) -- Classification hidden states before AMSoftmax.
embeddings (torch.FloatTensor of shape (batch_size, config.xvector_output_dim)) -- Utterance embeddings used for vector similarity-based retrieval.
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
Output type of Wav2Vec2ForXVector.
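To illustrate "vector similarity-based retrieval" with the embeddings field, the sketch below compares hand-made utterance embeddings by cosine similarity. The embedding values and the 0.7 decision threshold are assumptions for the example; a real threshold would have to be tuned on held-out data.

```python
import torch
import torch.nn.functional as F
from transformers.modeling_outputs import XVectorOutput

# Made-up embeddings for three utterances (xvector_output_dim = 3 here).
outputs = XVectorOutput(
    embeddings=torch.tensor([[1.0, 0.0, 1.0],
                             [2.0, 0.0, 2.0],
                             [0.0, 1.0, 0.0]])
)

# L2-normalize so that a dot product equals the cosine similarity.
normalized = F.normalize(outputs.embeddings, dim=-1)
similarity = normalized @ normalized.T

# Hypothetical threshold for deciding that two utterances match.
threshold = 0.7
matches_first = similarity[0] > threshold
```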
Seq2SeqTSModelOutput[[transformers.modeling_outputs.Seq2SeqTSModelOutput]]
class transformers.modeling_outputs.Seq2SeqTSModelOutput
last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) -- Sequence of hidden-states at the output of the last layer of the decoder of the model. If past_key_values is used, only the last hidden-state of the sequences, of shape (batch_size, 1, hidden_size), is output.
past_key_values (EncoderDecoderCache, optional, returned when use_cache=True is passed or when config.use_cache=True) -- An EncoderDecoderCache instance. For more details, see our kv cache guide. Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see the past_key_values input) to speed up sequential decoding.
decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the decoder at the output of each layer plus the optional initial embedding outputs.
decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) -- Sequence of hidden-states at the output of the last layer of the encoder of the model.
encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the encoder at the output of each layer plus the optional initial embedding outputs.
encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
loc (torch.FloatTensor of shape (batch_size,) or (batch_size, input_size), optional) -- Shift values of each time series' context window, used to give the model inputs of the same magnitude and then to shift back to the original magnitude.
scale (torch.FloatTensor of shape (batch_size,) or (batch_size, input_size), optional) -- Scaling values of each time series' context window, used to give the model inputs of the same magnitude and then to rescale back to the original magnitude.
static_features (torch.FloatTensor of shape (batch_size, feature size), optional) -- Static features of each time series in a batch, which are copied to the covariates at inference time.
Base class for time series model's encoder outputs that also contain pre-computed hidden states that can speed up sequential decoding.
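The role of loc and scale can be sketched with plain tensors: the context window is shifted and scaled before it reaches the model, and values in the normalized space are mapped back with the inverse transform. The per-series mean and standard deviation used below are an assumed scaling scheme for illustration; the scaler a given model applies depends on its configuration.

```python
import torch

# Two context windows with very different magnitudes: (batch_size, context_length).
context = torch.tensor([[10.0, 12.0, 14.0],
                        [100.0, 200.0, 300.0]])

# Assumed scaler: per-series mean and standard deviation, each of shape (batch_size,).
loc = context.mean(dim=1)
scale = context.std(dim=1)

# Shift and scale so both series have the same magnitude.
normalized = (context - loc[:, None]) / scale[:, None]

# A stand-in value in the normalized space, mapped back to the original magnitude.
normalized_forecast = torch.tensor([[1.5], [1.5]])
forecast = normalized_forecast * scale[:, None] + loc[:, None]
```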
Seq2SeqTSPredictionOutput[[transformers.modeling_outputs.Seq2SeqTSPredictionOutput]]
class transformers.modeling_outputs.Seq2SeqTSPredictionOutput
loss (torch.FloatTensor of shape (1,), optional, returned when future_values is provided) -- Distributional loss.
params (torch.FloatTensor of shape (batch_size, num_samples, num_params)) -- Parameters of the chosen distribution.
past_key_values (EncoderDecoderCache, optional, returned when use_cache=True is passed or when config.use_cache=True) -- An EncoderDecoderCache instance. For more details, see our kv cache guide. Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see the past_key_values input) to speed up sequential decoding.
decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the decoder at the output of each layer plus the initial embedding outputs.
decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.
encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) -- Sequence of hidden-states at the output of the last layer of the encoder of the model.
encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size). Hidden-states of the encoder at the output of each layer plus the initial embedding outputs.
encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length). Attention weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.
loc (torch.FloatTensor of shape (batch_size,) or (batch_size, input_size), optional) -- Shift values of each time series' context window, used to give the model inputs of the same magnitude and then to shift back to the original magnitude.
scale (torch.FloatTensor of shape (batch_size,) or (batch_size, input_size), optional) -- Scaling values of each time series' context window, used to give the model inputs of the same magnitude and then to rescale back to the original magnitude.
static_features (torch.FloatTensor of shape (batch_size, feature size), optional) -- Static features of each time series in a batch, which are copied to the covariates at inference time.
Base class for time series model's decoder outputs that also contain the loss as well as the parameters of the chosen distribution.
SampleTSPredictionOutput[[transformers.modeling_outputs.SampleTSPredictionOutput]]
class transformers.modeling_outputs.SampleTSPredictionOutput
sequences (torch.FloatTensor of shape (batch_size, num_samples, prediction_length) or (batch_size, num_samples, prediction_length, input_size)) -- Sampled values from the chosen distribution.
Base class for time series model's prediction outputs that contain the sampled values from the chosen distribution.
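Sampled forecasts are usually summarized along the num_samples axis. The sketch below assumes random stand-in samples of shape (batch_size, num_samples, prediction_length) and derives a point forecast plus an 80% interval; the 0.1 and 0.9 quantile levels are arbitrary choices for illustration.

```python
import torch
from transformers.modeling_outputs import SampleTSPredictionOutput

torch.manual_seed(0)

# Stand-in samples: 2 series, 100 sampled trajectories, 5 future time steps.
outputs = SampleTSPredictionOutput(sequences=torch.randn(2, 100, 5))

# Point forecasts: reduce over the num_samples axis (dim=1).
mean_forecast = outputs.sequences.mean(dim=1)
median_forecast = outputs.sequences.median(dim=1).values

# Interval bounds; torch.quantile puts the quantile axis first.
lower, upper = torch.quantile(outputs.sequences, torch.tensor([0.1, 0.9]), dim=1)
```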