Merge a list of tensors into a single tensor along the first dimension. This is defined explicitly because, with expert parallelism (EP) or tensor parallelism (TP), you need to know exactly how the tensors are being merged.

transformers.SplitModulelist[[transformers.SplitModulelist]]

Inverse of MergeModulelist using explicit split sizes per group.
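The round trip between merging and splitting can be sketched as follows. This is a minimal illustration rather than the library's implementation: it assumes that merging means concatenation with torch.cat along dimension 0 (the real class may stack instead), and the helper names merge_tensors and split_tensors are hypothetical.

```python
import torch

def merge_tensors(tensors):
    # Assumption: "merge" is a concatenation along the first dimension.
    return torch.cat(tensors, dim=0)

def split_tensors(merged, split_sizes):
    # Inverse operation: explicit sizes say how many rows go back to each group.
    return list(torch.split(merged, split_sizes, dim=0))

# Two groups with a different number of rows each.
groups = [torch.full((2, 3), 1.0), torch.full((4, 3), 2.0)]
merged = merge_tensors(groups)
restored = split_tensors(merged, [t.shape[0] for t in groups])
```

Because the groups differ in size, the split sizes have to be passed explicitly; they cannot be recovered from the merged tensor alone.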

transformers.PermuteForRope[[transformers.PermuteForRope]]

Applies the permutation required to convert complex RoPE weights to the split sin/cos format.
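To make the reordering concrete, here is a small sketch. It assumes the permutation is the one commonly used when converting Llama-style checkpoints, where each head stores its rows as interleaved (real, imaginary) pairs; the function name permute_for_rope and the toy shapes are illustrative and not taken from the class itself.

```python
import torch

def permute_for_rope(w, n_heads):
    # w has shape (out_features, in_features). Within each head, rows are
    # assumed to be interleaved pairs: r0, i0, r1, i1, ...
    dim1, dim2 = w.shape
    return (
        w.view(n_heads, dim1 // n_heads // 2, 2, dim2)
        .transpose(1, 2)
        .reshape(dim1, dim2)
    )

# One head with four rows; each row is filled with its own index so the
# new row order can be read off directly.
w = torch.arange(4.0).unsqueeze(1).repeat(1, 2)
permuted = permute_for_rope(w, n_heads=1)
row_order = permuted[:, 0].tolist()
```

After the permutation, the first half of each head holds the former even rows and the second half holds the former odd rows, which is the layout a rotate-half style RoPE works with.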

Layers[[transformers.GradientCheckpointingLayer]]

transformers.GradientCheckpointingLayer[[transformers.GradientCheckpointingLayer]]

Base class for layers with gradient checkpointing.

This class enables gradient checkpointing functionality for a layer. By default, gradient checkpointing is disabled (gradient_checkpointing = False). When model.set_gradient_checkpointing() is called, gradient checkpointing is enabled by setting gradient_checkpointing = True and assigning a checkpointing function to _gradient_checkpointing_func.

Important:

When using gradient checkpointing with use_reentrant=True, inputs that require gradients (e.g. hidden states) must be passed as positional arguments (*args) rather than keyword arguments to properly propagate gradients.

Example:

>>> # Correct - hidden_states passed as positional arg
>>> out = self.layer(hidden_states, attention_mask=attention_mask)

>>> # Incorrect - hidden_states passed as keyword arg
>>> out = self.layer(hidden_states=hidden_states, attention_mask=attention_mask)
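The effect can be reproduced with torch.utils.checkpoint directly. The sketch below assumes that keyword arguments get bound into the callable (here with functools.partial) before it reaches the checkpoint function, so a reentrant checkpoint never sees them as inputs; the layer function and tensor shapes are made up for illustration.

```python
import functools
import torch
from torch.utils.checkpoint import checkpoint

linear = torch.nn.Linear(4, 4)

def layer(hidden_states, attention_mask):
    return linear(hidden_states) * attention_mask

hidden_states = torch.randn(2, 4, requires_grad=True)
attention_mask = torch.ones(2, 4)

# Correct: hidden_states is a positional input of the checkpoint call.
out_ok = checkpoint(
    functools.partial(layer, attention_mask=attention_mask),
    hidden_states,
    use_reentrant=True,
)
out_ok.sum().backward()
grad_reached_input = hidden_states.grad is not None

# Incorrect: hidden_states is hidden inside the partial, so the reentrant
# checkpoint has no input that requires grad (PyTorch warns about this).
out_bad = checkpoint(
    functools.partial(layer, hidden_states=hidden_states, attention_mask=attention_mask),
    use_reentrant=True,
)
```

In the second call the output does not require gradients at all, so nothing upstream of the layer would be trained.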

Attention Functions[[transformers.AttentionInterface]]

transformers.AttentionInterface[[transformers.AttentionInterface]]

Dict-like object keeping track of allowed attention functions. You can easily add a new attention function with a call to register(). If a model needs to locally overwrite an existing attention function, say sdpa, it needs to declare a new instance of this class inside its modeling_<model>.py file and declare the replacement on that instance.

transformers.AttentionInterface.register(key: str, value: Callable)

Source: https://github.com/huggingface/transformers/blob/vr_26617/src/transformers/utils/generic.py#L964
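The difference between a global registration and a local override can be illustrated with a toy re-creation of the pattern. This is not the actual class: ToyInterface, default_sdpa and model_specific_sdpa are hypothetical names, and the real implementation may differ in its details.

```python
from collections.abc import MutableMapping

class ToyInterface(MutableMapping):
    # Shared by every instance: the globally registered functions.
    _global_mapping = {}

    def __init__(self):
        # Per-instance overrides, consulted before the global mapping.
        self.local_mapping = {}

    def __getitem__(self, key):
        if key in self.local_mapping:
            return self.local_mapping[key]
        return self._global_mapping[key]

    def __setitem__(self, key, value):
        # Assigning on an instance only affects that instance.
        self.local_mapping[key] = value

    def __delitem__(self, key):
        del self.local_mapping[key]

    def __iter__(self):
        return iter({**self._global_mapping, **self.local_mapping})

    def __len__(self):
        return len({**self._global_mapping, **self.local_mapping})

    @classmethod
    def register(cls, key, value):
        # Registering on the class makes the function visible everywhere.
        cls._global_mapping[key] = value

def default_sdpa(query, key, value):
    return "default sdpa"

def model_specific_sdpa(query, key, value):
    return "model-specific sdpa"

ToyInterface.register("sdpa", default_sdpa)

shared = ToyInterface()
local = ToyInterface()  # imagine this instance lives in one modeling file
local["sdpa"] = model_specific_sdpa

shared_result = shared["sdpa"](None, None, None)
local_result = local["sdpa"](None, None, None)
```

Only the instance that received the override resolves sdpa to the model-specific function; every other instance still sees the globally registered one.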

Attention Mask Functions[[transformers.AttentionMaskInterface]]

transformers.AttentionMaskInterface[[transformers.AttentionMaskInterface]]

transformers.AttentionMaskInterface.register(key: str, value: Callable)

Source: https://github.com/huggingface/transformers/blob/vr_26617/src/transformers/utils/generic.py#L964

Rotary Position Embedding Functions[[transformers.dynamic_rope_update]]

transformers.dynamic_rope_update[[transformers.dynamic_rope_update]]

Decorator function to update the RoPE parameters in the forward pass, if the model is using a dynamic RoPE (i.e. a RoPE implementation that may recompute its frequencies in the forward pass).

Parameters:

rope_forward (Callable) : The forward pass of the RoPE implementation.

Returns:

The decorated forward pass.
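The decorator pattern described here can be sketched with a toy rotary embedding. Everything in this example is an assumption made for illustration: the names toy_dynamic_rope_update and ToyRotaryEmbedding are hypothetical, and the rule used to rescale the base is a placeholder, not the scaling the library applies.

```python
import functools
import torch

def toy_dynamic_rope_update(rope_forward):
    @functools.wraps(rope_forward)
    def wrapper(self, x, position_ids):
        seq_len = int(position_ids.max()) + 1
        if seq_len > self.max_seq_len_cached:
            # Placeholder rule (assumption): stretch the base in proportion
            # to how far the sequence exceeds the original limit.
            scaled_base = self.base * seq_len / self.original_max_len
            self.inv_freq = self.compute_inv_freq(scaled_base)
            self.max_seq_len_cached = seq_len
        return rope_forward(self, x, position_ids)
    return wrapper

class ToyRotaryEmbedding:
    def __init__(self, dim=8, base=10000.0, original_max_len=16):
        self.dim = dim
        self.base = base
        self.original_max_len = original_max_len
        self.max_seq_len_cached = original_max_len
        self.inv_freq = self.compute_inv_freq(base)

    def compute_inv_freq(self, base):
        exponents = torch.arange(0, self.dim, 2, dtype=torch.float32) / self.dim
        return 1.0 / (base ** exponents)

    @toy_dynamic_rope_update
    def forward(self, x, position_ids):
        freqs = position_ids[:, :, None].float() * self.inv_freq[None, None, :]
        emb = torch.cat((freqs, freqs), dim=-1)
        return emb.cos().to(x.dtype), emb.sin().to(x.dtype)

rope = ToyRotaryEmbedding()
x = torch.zeros(1, 32, 8)
inv_freq_before = rope.inv_freq.clone()

# A short sequence stays within the cached length: nothing is recomputed.
cos_short, sin_short = rope.forward(x[:, :8], torch.arange(8).unsqueeze(0))
unchanged_after_short = torch.equal(rope.inv_freq, inv_freq_before)

# A longer sequence triggers the update before the wrapped forward runs.
cos_long, sin_long = rope.forward(x, torch.arange(32).unsqueeze(0))
changed_after_long = not torch.equal(rope.inv_freq, inv_freq_before)
```

Keeping the update in a decorator leaves the forward pass itself free of any bookkeeping about cached lengths.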

PyTorch custom modules[[transformers.Conv1D]]

transformers.Conv1D[[transformers.Conv1D]]

1D-convolutional layer as defined by Radford et al. for OpenAI GPT (and also used in GPT-2).

It works like a linear layer, except that the weight matrix is stored transposed, with shape (nx, nf).

Parameters:

nf (int) : The number of output features.

nx (int) : The number of input features.
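The relationship to a linear layer can be checked numerically. The class below is a minimal re-implementation written for this example (Conv1DSketch is a hypothetical name, and the initialization is arbitrary); it only aims to show that the layer matches a linear layer applied with the transposed weight.

```python
import torch
import torch.nn.functional as F

class Conv1DSketch(torch.nn.Module):
    def __init__(self, nf, nx):
        super().__init__()
        self.nf = nf
        # Note the (nx, nf) layout: the transpose of nn.Linear's (nf, nx).
        self.weight = torch.nn.Parameter(torch.randn(nx, nf) * 0.02)
        self.bias = torch.nn.Parameter(torch.zeros(nf))

    def forward(self, x):
        out_shape = x.shape[:-1] + (self.nf,)
        # Flatten leading dims, compute bias + x @ weight, restore the shape.
        out = torch.addmm(self.bias, x.reshape(-1, x.shape[-1]), self.weight)
        return out.view(out_shape)

torch.manual_seed(0)
layer = Conv1DSketch(nf=6, nx=4)
x = torch.randn(2, 3, 4)

y = layer(x)
# The same computation expressed as a linear layer with the weight transposed.
y_linear = F.linear(x, layer.weight.T, layer.bias)
same_as_linear = torch.allclose(y, y_linear, atol=1e-6)
```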

PyTorch Helper Functions[[transformers.apply_chunking_to_forward]]

transformers.apply_chunking_to_forward[[transformers.apply_chunking_to_forward]]

This function chunks the input_tensors into smaller input tensor parts of size chunk_size over the dimension chunk_dim. It then applies a layer forward_fn to each chunk independently to save memory.

If forward_fn is independent across chunk_dim, this function yields the same result as applying forward_fn directly to input_tensors.

Examples:

# rename the usual forward() fn to forward_chunk()
def forward_chunk(self, hidden_states):
    hidden_states = self.decoder(hidden_states)
    return hidden_states

# implement a chunked forward function
def forward(self, hidden_states):
    return apply_chunking_to_forward(self.forward_chunk, self.chunk_size_lm_head, self.seq_len_dim, hidden_states)

Parameters:

forward_fn (Callable[..., torch.Tensor]) : The forward function of the model.

chunk_size (int) : The chunk size of a chunked tensor: num_chunks = len(input_tensors[0]) / chunk_size.

chunk_dim (int) : The dimension over which the input_tensors should be chunked.

input_tensors (tuple[torch.Tensor]) : The input tensors of forward_fn, which will be chunked.

Returns:

torch.Tensor

A tensor with the same shape as the one forward_fn would have returned if applied directly to input_tensors.
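The chunk-then-concatenate idea can be sketched in a few lines. This is a simplified stand-in, not the library function: chunked_forward is a hypothetical name, and treating a non-positive chunk size as "no chunking" is an assumption of this sketch.

```python
import torch

def chunked_forward(forward_fn, chunk_size, chunk_dim, *input_tensors):
    # Assumption in this sketch: a non-positive chunk size disables chunking.
    if chunk_size <= 0:
        return forward_fn(*input_tensors)
    length = input_tensors[0].shape[chunk_dim]
    if length % chunk_size != 0:
        raise ValueError("chunk_dim length must be a multiple of chunk_size")
    num_chunks = length // chunk_size
    # Split every input the same way, then walk over the chunks in lockstep.
    chunks = [t.chunk(num_chunks, dim=chunk_dim) for t in input_tensors]
    outputs = [forward_fn(*parts) for parts in zip(*chunks)]
    return torch.cat(outputs, dim=chunk_dim)

torch.manual_seed(0)
feed_forward = torch.nn.Linear(8, 8)
hidden_states = torch.randn(2, 12, 8)

# A linear layer acts on each position independently, so chunking over the
# sequence dimension (dim 1) should not change the result.
chunked = chunked_forward(feed_forward, 4, 1, hidden_states)
direct = feed_forward(hidden_states)
matches = torch.allclose(chunked, direct, atol=1e-6)
```

Only one chunk of intermediate activations is alive at a time inside forward_fn, which is where the memory saving comes from.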

transformers.pytorch_utils.prune_linear_layer[[transformers.pytorch_utils.prune_linear_layer]]

Prune a linear layer to keep only entries in index.

Used to remove heads.

Parameters:

layer (torch.nn.Linear) : The layer to prune.

index (torch.LongTensor) : The indices to keep in the layer.

dim (int, optional, defaults to 0) : The dimension on which to keep the indices.

Returns:

torch.nn.Linear

The pruned layer as a new layer with requires_grad=True.
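The following sketch shows what keeping only the entries in index amounts to. It is a simplified re-implementation for illustration (prune_linear_sketch is a hypothetical name), under the assumption that dim=0 selects output rows of the weight and dim=1 selects input columns.

```python
import torch

def prune_linear_sketch(layer, index, dim=0):
    # dim=0 selects output rows, dim=1 selects input columns of the weight.
    weight = layer.weight.index_select(dim, index).detach().clone()
    bias = None
    if layer.bias is not None:
        # The bias has one entry per output feature, so it only shrinks for dim=0.
        bias = (layer.bias[index] if dim == 0 else layer.bias).detach().clone()
    out_features, in_features = weight.shape
    pruned = torch.nn.Linear(in_features, out_features, bias=bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(weight)
        if bias is not None:
            pruned.bias.copy_(bias)
    return pruned

torch.manual_seed(0)
layer = torch.nn.Linear(4, 6)
keep = torch.tensor([0, 2, 5])
pruned = prune_linear_sketch(layer, keep, dim=0)

x = torch.randn(3, 4)
# The pruned layer reproduces exactly the kept output features.
rows_match = torch.allclose(pruned(x), layer(x)[:, keep])
```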
