# Utilities for Generation

This page lists all the utility functions used by [`~generation.GenerationMixin.generate`].
## Generate Outputs

The output of [`~generation.GenerationMixin.generate`] is an instance of a subclass of
[`~utils.ModelOutput`]. This output is a data structure containing all the information returned
by [`~generation.GenerationMixin.generate`], but that can also be used as a tuple or a dictionary.

Here's an example:
```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2")

inputs = tokenizer("Hello, my dog is cute and ", return_tensors="pt")
generation_output = model.generate(**inputs, return_dict_in_generate=True, output_scores=True)
```
The `generation_output` object is a [`~generation.GenerateDecoderOnlyOutput`]. As we can
see in the documentation of that class below, it has the following attributes:

- `sequences`: the generated sequences of tokens
- `scores` (optional): the prediction scores of the language modelling head, for each generation step
- `hidden_states` (optional): the hidden states of the model, for each generation step
- `attentions` (optional): the attention weights of the model, for each generation step
Here we have the `scores` since we passed along `output_scores=True`, but we don't have `hidden_states` and
`attentions` because we didn't pass `output_hidden_states=True` or `output_attentions=True`.
You can access each attribute as you would usually do, and if that attribute has not been returned by the model, you
will get `None`. Here for instance `generation_output.scores` are all the generated prediction scores of the
language modeling head, and `generation_output.attentions` is `None`.
When using our `generation_output` object as a tuple, it only keeps the attributes that don't have `None` values.
Here, for instance, it has two elements, `sequences` then `scores`, so

```python
generation_output[:2]
```

will return the tuple `(generation_output.sequences, generation_output.scores)`.
When using our `generation_output` object as a dictionary, it only keeps the attributes that don't have `None`
values. Here, for instance, it has two keys, which are `sequences` and `scores`.
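The tuple and dictionary views both come from the same rule: attributes whose value is `None` are skipped. A minimal pure-Python sketch of that filtering (the `MiniOutput` class below is a hypothetical stand-in for illustration, not the Transformers implementation):

```python
# Sketch of ModelOutput-style access: None-valued attributes are skipped
# when the object is viewed as a tuple or as a dictionary.
# `MiniOutput` is a hypothetical stand-in, not the Transformers class.
class MiniOutput:
    def __init__(self, sequences=None, scores=None, hidden_states=None, attentions=None):
        self.sequences = sequences
        self.scores = scores
        self.hidden_states = hidden_states
        self.attentions = attentions

    def _present(self):
        # Keep only attributes that are not None, in declaration order
        names = ["sequences", "scores", "hidden_states", "attentions"]
        return [(n, getattr(self, n)) for n in names if getattr(self, n) is not None]

    def to_tuple(self):
        return tuple(value for _, value in self._present())

    def keys(self):
        return [name for name, _ in self._present()]

out = MiniOutput(sequences=[[15496, 11]], scores=[[0.1, 0.9]])
print(out.to_tuple())  # ([[15496, 11]], [[0.1, 0.9]]) — only non-None attributes
print(out.keys())      # ['sequences', 'scores']
print(out.attentions)  # None — the attribute itself is still accessible
```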
We document here all output types.
[[autodoc]] generation.GenerateDecoderOnlyOutput
[[autodoc]] generation.GenerateEncoderDecoderOutput
[[autodoc]] generation.GenerateBeamDecoderOnlyOutput
[[autodoc]] generation.GenerateBeamEncoderDecoderOutput
## LogitsProcessor

A [`LogitsProcessor`] can be used to modify the prediction scores of a language model head for
generation.
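The shared contract is simple: a processor receives the token ids generated so far plus the raw next-token scores, and returns adjusted scores. A minimal pure-Python sketch of a repetition-penalty-style processor (illustrative only; the library's [`RepetitionPenaltyLogitsProcessor`] operates on batched PyTorch tensors):

```python
# Sketch of the LogitsProcessor contract: given the tokens generated so far
# and the raw next-token scores, return adjusted scores. Plain lists stand
# in for the batched tensors the real processors use.
def repetition_penalty(input_ids, scores, penalty=2.0):
    adjusted = list(scores)
    for token_id in set(input_ids):
        # Penalize tokens that already appeared: shrink positive scores,
        # amplify negative ones, making reselection less likely.
        if adjusted[token_id] > 0:
            adjusted[token_id] /= penalty
        else:
            adjusted[token_id] *= penalty
    return adjusted

scores = [1.0, -2.0, 3.0, 0.5]
print(repetition_penalty([0, 2], scores))  # [0.5, -2.0, 1.5, 0.5]
```

During generation, [`~generation.GenerationMixin.generate`] applies each configured processor in sequence to the scores before sampling or taking the argmax.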
[[autodoc]] AlternatingCodebooksLogitsProcessor
    - __call__

[[autodoc]] ClassifierFreeGuidanceLogitsProcessor
    - __call__

[[autodoc]] EncoderNoRepeatNGramLogitsProcessor
    - __call__

[[autodoc]] EncoderRepetitionPenaltyLogitsProcessor
    - __call__

[[autodoc]] EpsilonLogitsWarper
    - __call__

[[autodoc]] EtaLogitsWarper
    - __call__

[[autodoc]] ExponentialDecayLengthPenalty
    - __call__

[[autodoc]] ForcedBOSTokenLogitsProcessor
    - __call__

[[autodoc]] ForcedEOSTokenLogitsProcessor
    - __call__

[[autodoc]] InfNanRemoveLogitsProcessor
    - __call__

[[autodoc]] LogitNormalization
    - __call__

[[autodoc]] LogitsProcessor
    - __call__

[[autodoc]] LogitsProcessorList
    - __call__

[[autodoc]] MinLengthLogitsProcessor
    - __call__

[[autodoc]] MinNewTokensLengthLogitsProcessor
    - __call__

[[autodoc]] MinPLogitsWarper
    - __call__

[[autodoc]] NoBadWordsLogitsProcessor
    - __call__

[[autodoc]] NoRepeatNGramLogitsProcessor
    - __call__

[[autodoc]] PrefixConstrainedLogitsProcessor
    - __call__

[[autodoc]] RepetitionPenaltyLogitsProcessor
    - __call__

[[autodoc]] SequenceBiasLogitsProcessor
    - __call__

[[autodoc]] SuppressTokensAtBeginLogitsProcessor
    - __call__

[[autodoc]] SuppressTokensLogitsProcessor
    - __call__

[[autodoc]] SynthIDTextWatermarkLogitsProcessor
    - __call__

[[autodoc]] TemperatureLogitsWarper
    - __call__

[[autodoc]] TopHLogitsWarper
    - __call__

[[autodoc]] TopKLogitsWarper
    - __call__

[[autodoc]] TopPLogitsWarper
    - __call__

[[autodoc]] TypicalLogitsWarper
    - __call__

[[autodoc]] UnbatchedClassifierFreeGuidanceLogitsProcessor
    - __call__

[[autodoc]] WhisperTimeStampLogitsProcessor
    - __call__

[[autodoc]] WatermarkLogitsProcessor
    - __call__
## StoppingCriteria

A [`StoppingCriteria`] can be used to change when to stop generation (other than an EOS token). Please note that this is exclusively available to our PyTorch implementations.
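A stopping criterion is called at each generation step with the ids produced so far and returns whether generation should halt. A minimal plain-Python sketch of a max-length criterion (the library's [`MaxLengthCriteria`] works on batched tensors; the class name below is hypothetical):

```python
# Sketch of the StoppingCriteria contract: return True once generation
# should halt. Mirrors the idea behind MaxLengthCriteria, using a plain
# list of token ids instead of a batched tensor.
class MaxLengthStop:
    def __init__(self, max_length):
        self.max_length = max_length

    def __call__(self, input_ids, scores=None):
        return len(input_ids) >= self.max_length

stop = MaxLengthStop(max_length=5)
print(stop([1, 2, 3]))         # False: keep generating
print(stop([1, 2, 3, 4, 5]))   # True: stop
```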
[[autodoc]] StoppingCriteria
    - __call__

[[autodoc]] StoppingCriteriaList
    - __call__

[[autodoc]] MaxLengthCriteria
    - __call__

[[autodoc]] MaxTimeCriteria
    - __call__

[[autodoc]] StopStringCriteria
    - __call__

[[autodoc]] EosTokenCriteria
    - __call__
## Streamers
[[autodoc]] TextStreamer
[[autodoc]] TextIteratorStreamer
[[autodoc]] AsyncTextIteratorStreamer
## Caches
[[autodoc]] CacheLayerMixin
    - update
    - get_seq_length
    - get_mask_sizes
    - get_max_cache_shape
    - reset
    - reorder_cache
    - lazy_initialization

[[autodoc]] DynamicLayer
    - update
    - lazy_initialization
    - crop
    - batch_repeat_interleave
    - batch_select_indices

[[autodoc]] StaticLayer
    - update
    - lazy_initialization

[[autodoc]] StaticSlidingWindowLayer
    - update
    - lazy_initialization

[[autodoc]] QuantoQuantizedLayer
    - update
    - lazy_initialization

[[autodoc]] HQQQuantizedLayer
    - update
    - lazy_initialization

[[autodoc]] Cache
    - update
    - early_initialization
    - get_seq_length
    - get_mask_sizes
    - get_max_cache_shape
    - reset
    - reorder_cache
    - crop
    - batch_repeat_interleave
    - batch_select_indices
[[autodoc]] DynamicCache
[[autodoc]] StaticCache
[[autodoc]] QuantizedCache
[[autodoc]] EncoderDecoderCache
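The cache classes above share one core operation: at each decoding step, `update` appends the step's new key/value states along the sequence dimension and returns everything cached so far. A minimal plain-Python sketch of that behavior for a single attention layer (`ToyDynamicLayer` is hypothetical; the library's `DynamicLayer` concatenates PyTorch tensors):

```python
# Sketch of a DynamicLayer-style KV cache for one attention layer: each
# update appends the step's key/value states along the sequence axis and
# returns the full cache seen so far. `ToyDynamicLayer` is hypothetical;
# strings stand in for the per-position key/value tensors.
class ToyDynamicLayer:
    def __init__(self):
        self.keys = []    # one entry per cached position
        self.values = []

    def update(self, key_states, value_states):
        self.keys.extend(key_states)
        self.values.extend(value_states)
        return self.keys, self.values

    def get_seq_length(self):
        return len(self.keys)

layer = ToyDynamicLayer()
layer.update(["k0", "k1"], ["v0", "v1"])  # prefill: two prompt positions
layer.update(["k2"], ["v2"])              # decode step: one new position
print(layer.get_seq_length())             # 3
```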
## Watermark Utils

[[autodoc]] WatermarkingConfig
    - __call__

[[autodoc]] WatermarkDetector
    - __call__

[[autodoc]] BayesianDetectorConfig

[[autodoc]] BayesianDetectorModel
    - forward

[[autodoc]] SynthIDTextWatermarkingConfig

[[autodoc]] SynthIDTextWatermarkDetector
    - __call__
## Compile Utils

[[autodoc]] CompileConfig
    - __call__