
rtrm

about 1 month ago

preview code

download

raw

8.64 kB

Custom Layers and Utilities

This page lists all the custom layers used by the library, as well as the utility functions and classes it provides for modeling.

Most of those are only useful if you are studying the code of the models in the library.

Layers

class transformers.GradientCheckpointingLayer(*args, **kwargs)

Source: https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_layers.py#L35

Base class for layers with gradient checkpointing.

This class enables gradient checkpointing functionality for a layer. By default, gradient checkpointing is disabled (gradient_checkpointing = False). When model.set_gradient_checkpointing() is called, gradient checkpointing is enabled by setting gradient_checkpointing = True and assigning a checkpointing function to _gradient_checkpointing_func.

Important:

When using gradient checkpointing with use_reentrant=True, inputs that require gradients (e.g. hidden states) must be passed as positional arguments (*args) rather than keyword arguments to properly propagate gradients.

Example:

>>> # Correct - hidden_states passed as positional arg
>>> out = self.layer(hidden_states, attention_mask=attention_mask)

>>> # Incorrect - hidden_states passed as keyword arg
>>> out = self.layer(hidden_states=hidden_states, attention_mask=attention_mask)
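The mechanism described above can be sketched in plain PyTorch. The classes `CheckpointableLayerSketch` and `ToyBlock` and the helper `enable_checkpointing` below are hypothetical. They only mimic the described behaviour (a `gradient_checkpointing` flag plus a `_gradient_checkpointing_func`), and the library's own class may differ in detail. The sketch assumes `torch.utils.checkpoint.checkpoint` with `use_reentrant=True` as the checkpointing function.

```python
import functools

import torch
from torch import nn
from torch.utils.checkpoint import checkpoint


class CheckpointableLayerSketch(nn.Module):
    # Hypothetical stand-in for the behaviour described above, not the library class
    gradient_checkpointing = False  # disabled by default

    def __call__(self, *args, **kwargs):
        if self.gradient_checkpointing and self.training:
            # Keyword arguments are frozen into a partial; only the positional
            # args are passed through the checkpoint function as its inputs.
            return self._gradient_checkpointing_func(
                functools.partial(super().__call__, **kwargs), *args
            )
        return super().__call__(*args, **kwargs)


class ToyBlock(CheckpointableLayerSketch):
    def __init__(self, hidden_size):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states, scale=1.0):
        return torch.tanh(self.proj(hidden_states)) * scale


def enable_checkpointing(module):
    # Hypothetical helper playing the role of model.set_gradient_checkpointing()
    module.gradient_checkpointing = True
    module._gradient_checkpointing_func = functools.partial(checkpoint, use_reentrant=True)


torch.manual_seed(0)
block = ToyBlock(4).train()
x = torch.randn(2, 4, requires_grad=True)

# Reference gradient without checkpointing
block(x, scale=2.0).sum().backward()
reference_grad = x.grad.clone()

# Same computation with checkpointing; hidden_states is passed positionally
x.grad = None
enable_checkpointing(block)
block(x, scale=2.0).sum().backward()
```

Because keyword arguments are bound into a `functools.partial`, only positional arguments reach the checkpoint function. This is why tensors that need gradients, such as `hidden_states`, should be passed positionally.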

Attention Functions

class transformers.AttentionInterface()

Source: https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L5377

Dict-like object that keeps track of the allowed attention functions. You can add a new attention function with a call to register(). If a model needs to locally override an existing attention function, such as sdpa, it should declare a new instance of this class inside its modeling_<model>.py file and register the override on that instance.
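The global-versus-local behaviour described above can be illustrated with a small dict-like registry. `AttentionRegistrySketch` is a hypothetical class, not `AttentionInterface` itself. The split into a class-level mapping (filled by `register()`) and per-instance overrides (set with item assignment) is an assumption about how such an interface can be organised.

```python
from collections.abc import MutableMapping
from typing import Callable


class AttentionRegistrySketch(MutableMapping):
    # Hypothetical registry: a shared (global) mapping plus per-instance (local) overrides
    _global_mapping: dict = {}

    def __init__(self):
        self._local_mapping = {}

    @classmethod
    def register(cls, key: str, value: Callable):
        # Global registration: visible from every instance
        cls._global_mapping[key] = value

    def __getitem__(self, key):
        # Local overrides take precedence over global entries
        if key in self._local_mapping:
            return self._local_mapping[key]
        return self._global_mapping[key]

    def __setitem__(self, key, value):
        # Item assignment on an instance only affects that instance
        self._local_mapping[key] = value

    def __delitem__(self, key):
        del self._local_mapping[key]

    def __iter__(self):
        return iter({**self._global_mapping, **self._local_mapping})

    def __len__(self):
        return len({**self._global_mapping, **self._local_mapping})


def sdpa_attention(*args, **kwargs):
    return "global sdpa"


def my_model_sdpa(*args, **kwargs):
    return "model-specific sdpa"


AttentionRegistrySketch.register("sdpa", sdpa_attention)

# In a hypothetical modeling_<model>.py: a dedicated instance with a local override
MY_MODEL_ATTENTION_FUNCTIONS = AttentionRegistrySketch()
MY_MODEL_ATTENTION_FUNCTIONS["sdpa"] = my_model_sdpa

# Any other instance still sees the global entry
shared_functions = AttentionRegistrySketch()
```

The local override changes behaviour only for the model that owns that instance, while every other model keeps resolving `"sdpa"` to the globally registered function.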

register(key: str, value: Callable)

Source: https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/utils/generic.py#L1009

Attention Mask Functions

class transformers.AttentionMaskInterface()

Source: https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/masking_utils.py#L677

register(key: str, value: Callable)

Source: https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/utils/generic.py#L1009

Rotary Position Embedding Functions

transformers.dynamic_rope_update(rope_forward)

Source: https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_rope_utils.py#L81

Parameters:
- rope_forward (Callable) -- The forward pass of the RoPE implementation.

Returns: The decorated forward pass.

Decorator function to update the RoPE parameters in the forward pass, if the model is using a dynamic RoPE (i.e. a RoPE implementation that may recompute its frequencies in the forward pass).
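To illustrate the decorator pattern described above, here is a pure-Python sketch. `dynamic_rope_update_sketch`, `compute_inv_freq` and `ToyRotaryEmbedding` are hypothetical names. The sketch assumes dynamic NTK scaling as the recomputation rule: the RoPE base grows once the sequence exceeds the trained length, and the original frequencies are restored when it drops back inside that range. The library decorator may handle more RoPE variants and work on tensors.

```python
import functools


def compute_inv_freq(base, dim):
    # Standard RoPE inverse frequencies: 1 / base^(2i / dim) for i = 0 .. dim/2 - 1
    return [1.0 / (base ** (2 * i / dim)) for i in range(dim // 2)]


def dynamic_rope_update_sketch(rope_forward):
    # Hypothetical decorator: refresh the frequencies, then run the wrapped forward
    @functools.wraps(rope_forward)
    def wrapper(self, x, position_ids):
        seq_len = max(position_ids) + 1
        if seq_len > self.max_seq_len_cached:
            # Longer than anything seen so far: enlarge the base (dynamic NTK scaling)
            scaled_base = self.base * (
                (self.factor * seq_len / self.original_max_seq_len) - (self.factor - 1)
            ) ** (self.dim / (self.dim - 2))
            self.inv_freq = compute_inv_freq(scaled_base, self.dim)
            self.max_seq_len_cached = seq_len
        elif seq_len < self.original_max_seq_len < self.max_seq_len_cached:
            # Back inside the trained range: restore the original frequencies
            self.inv_freq = list(self.original_inv_freq)
            self.max_seq_len_cached = self.original_max_seq_len
        return rope_forward(self, x, position_ids)

    return wrapper


class ToyRotaryEmbedding:
    def __init__(self, dim=8, base=10000.0, max_position_embeddings=16, factor=2.0):
        self.dim = dim
        self.base = base
        self.factor = factor
        self.original_max_seq_len = max_position_embeddings
        self.max_seq_len_cached = max_position_embeddings
        self.original_inv_freq = compute_inv_freq(base, dim)
        self.inv_freq = list(self.original_inv_freq)

    @dynamic_rope_update_sketch
    def forward(self, x, position_ids):
        # Rotation angle for every (position, frequency) pair
        return [[p * f for f in self.inv_freq] for p in position_ids]
```

Keeping the update in a decorator lets each RoPE forward stay a plain "angles from frequencies" computation, while the recomputation policy lives in one place.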

PyTorch custom modules

class transformers.Conv1D(nf, nx)

Source: https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/pytorch_utils.py#L97

Parameters:
- nf (int) -- The number of output features.
- nx (int) -- The number of input features.

1D-convolutional layer as defined by Radford et al. for OpenAI GPT (and also used in GPT-2).

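It works like a linear layer, except that the weights are transposed: the weight has shape (nx, nf), whereas nn.Linear stores its weight as (out_features, in_features).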
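The "linear layer with transposed weights" description can be checked numerically. `Conv1DSketch` below is a hypothetical re-implementation written for illustration, not the library module. It stores its weight as `(nx, nf)` and computes `bias + x @ weight`. Copying the transposed weight into an `nn.Linear` should give the same outputs.

```python
import torch
from torch import nn


class Conv1DSketch(nn.Module):
    # Hypothetical re-implementation of the behaviour described above
    def __init__(self, nf, nx):
        super().__init__()
        self.nf = nf
        # Stored as (in_features, out_features), the transpose of nn.Linear's layout
        self.weight = nn.Parameter(torch.randn(nx, nf) * 0.02)
        self.bias = nn.Parameter(torch.zeros(nf))

    def forward(self, x):
        size_out = x.shape[:-1] + (self.nf,)
        # Flatten the leading dims, compute bias + x @ weight, then restore the shape
        x = torch.addmm(self.bias, x.reshape(-1, x.shape[-1]), self.weight)
        return x.reshape(size_out)


torch.manual_seed(0)
conv = Conv1DSketch(nf=3, nx=5)
linear = nn.Linear(5, 3)
with torch.no_grad():
    # nn.Linear expects (out_features, in_features), so copy the transpose
    linear.weight.copy_(conv.weight.T)
    linear.bias.copy_(conv.bias)

x = torch.randn(2, 4, 5)  # (batch, seq_len, nx)
```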

PyTorch Helper Functions

transformers.apply_chunking_to_forward(forward_fn: Callable[..., torch.Tensor], chunk_size: int, chunk_dim: int, *input_tensors)

Source: https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/pytorch_utils.py#L126

Parameters:
- forward_fn (Callable[..., torch.Tensor]) -- The forward function of the model.
- chunk_size (int) -- The size of each chunk. input_tensors[0].shape[chunk_dim] must be a multiple of chunk_size, and num_chunks = input_tensors[0].shape[chunk_dim] // chunk_size. A chunk_size of 0 disables chunking.
- chunk_dim (int) -- The dimension over which the input_tensors should be chunked.
- input_tensors (tuple[torch.Tensor]) -- The input tensors of forward_fn, which will be chunked.

Returns: torch.Tensor -- A tensor with the same shape as forward_fn would have produced if applied directly to input_tensors.

This function splits the input_tensors into smaller parts of size chunk_size along the dimension chunk_dim. It then applies forward_fn to each chunk independently and concatenates the results along chunk_dim, which lowers peak memory usage.

If forward_fn is independent across chunk_dim, this function yields the same result as applying forward_fn directly to input_tensors.

Examples:

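from transformers.pytorch_utils import apply_chunking_to_forward


# methods of an nn.Module subclass that defines self.decoder, self.chunk_size_lm_head and self.seq_len_dim

# rename the usual forward() fn to forward_chunk()
def forward_chunk(self, hidden_states):
    hidden_states = self.decoder(hidden_states)
    return hidden_states


# implement a chunked forward function
def forward(self, hidden_states):
    return apply_chunking_to_forward(self.forward_chunk, self.chunk_size_lm_head, self.seq_len_dim, hidden_states)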
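The equivalence claim above (chunking gives the same result when forward_fn is independent across chunk_dim) can be verified with a self-contained sketch. `apply_chunking_sketch` is a hypothetical re-implementation of the described split, apply, and concatenate steps, not the library function. The position-wise feed-forward block is an assumed example of a function that is independent across the sequence dimension.

```python
import torch
from torch import nn


def apply_chunking_sketch(forward_fn, chunk_size, chunk_dim, *input_tensors):
    # Hypothetical re-implementation of the chunking logic, not the library function
    if chunk_size > 0:
        length = input_tensors[0].shape[chunk_dim]
        if length % chunk_size != 0:
            raise ValueError(
                f"dimension {chunk_dim} has size {length}, not a multiple of chunk_size={chunk_size}"
            )
        num_chunks = length // chunk_size
        # Split every input the same way along chunk_dim
        chunked_inputs = [t.chunk(num_chunks, dim=chunk_dim) for t in input_tensors]
        # Run forward_fn on one group of matching chunks at a time
        outputs = [forward_fn(*chunks) for chunks in zip(*chunked_inputs)]
        # Re-assemble the full output along the same dimension
        return torch.cat(outputs, dim=chunk_dim)
    # chunk_size == 0: no chunking, apply forward_fn to the whole inputs
    return forward_fn(*input_tensors)


torch.manual_seed(0)
# A position-wise feed-forward block: independent across the sequence dimension
feed_forward = nn.Sequential(nn.Linear(8, 32), nn.GELU(), nn.Linear(32, 8))
hidden_states = torch.randn(2, 12, 8)  # (batch, seq_len, hidden)

full = feed_forward(hidden_states)
chunked = apply_chunking_sketch(feed_forward, 4, 1, hidden_states)  # 3 chunks of 4 positions
```

Only one chunk's intermediate activations (here the 32-wide hidden layer) need to exist at a time, which is where the memory saving comes from.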

transformers.pytorch_utils.prune_linear_layer(layer: nn.Linear, index: torch.LongTensor, dim: int = 0)

Source: https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/pytorch_utils.py#L63

Parameters:
- layer (torch.nn.Linear) -- The layer to prune.
- index (torch.LongTensor) -- The indices to keep in the layer.
- dim (int, optional, defaults to 0) -- The dimension on which to keep the indices.

Returns: torch.nn.Linear -- The pruned layer as a new layer with requires_grad=True.

Prune a linear layer to keep only the entries in index.

Used to remove attention heads.
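The pruning described above amounts to building a smaller nn.Linear from selected rows or columns of the original weight. `prune_linear_sketch` is a hypothetical re-implementation for illustration, not the library function. With dim=0 it keeps output features (rows), so the bias is sliced too. With dim=1 it keeps input features (columns), and the bias is copied unchanged. For head removal, the query/key/value projections would typically be pruned along dim=0 and the attention output projection along dim=1.

```python
import torch
from torch import nn


def prune_linear_sketch(layer: nn.Linear, index: torch.LongTensor, dim: int = 0) -> nn.Linear:
    # Hypothetical re-implementation: keep only the rows (dim=0) or columns (dim=1) in index
    index = index.to(layer.weight.device)
    weight = layer.weight.index_select(dim, index).detach().clone()
    bias = None
    if layer.bias is not None:
        # Output features are removed only when dim == 0, so only then is the bias sliced
        bias = layer.bias[index].detach().clone() if dim == 0 else layer.bias.detach().clone()
    out_features, in_features = weight.shape
    new_layer = nn.Linear(in_features, out_features, bias=bias is not None).to(layer.weight.device)
    with torch.no_grad():
        new_layer.weight.copy_(weight)
        if bias is not None:
            new_layer.bias.copy_(bias)
    # Parameters of a freshly built nn.Linear already have requires_grad=True
    return new_layer


torch.manual_seed(0)
layer = nn.Linear(6, 4)
keep = torch.tensor([0, 2, 3])
pruned_out = prune_linear_sketch(layer, keep, dim=0)  # keeps output features 0, 2, 3
pruned_in = prune_linear_sketch(layer, keep, dim=1)   # keeps input features 0, 2, 3
x = torch.randn(5, 6)
```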
