| # Custom Layers and Utilities | |
| This page lists all the custom layers used by the library, as well as the utility functions and classes it provides for modeling. | |
| Most of these are only useful if you are studying the code of the models in the library. | |
| ## Layers[[transformers.GradientCheckpointingLayer]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.GradientCheckpointingLayer</name><anchor>transformers.GradientCheckpointingLayer</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_layers.py#L35</source><parameters>[{"name": "*args", "val": ""}, {"name": "**kwargs", "val": ""}]</parameters></docstring> | |
| Base class for layers with gradient checkpointing. | |
| This class enables gradient checkpointing functionality for a layer. By default, gradient checkpointing is disabled | |
| (`gradient_checkpointing = False`). When `model.set_gradient_checkpointing()` is called, gradient checkpointing is | |
| enabled by setting `gradient_checkpointing = True` and assigning a checkpointing function to `_gradient_checkpointing_func`. | |
| Important: | |
| When using gradient checkpointing with `use_reentrant=True`, inputs that require gradients (e.g. hidden states) | |
| must be passed as positional arguments (`*args`) rather than keyword arguments to properly propagate gradients. | |
| <ExampleCodeBlock anchor="transformers.GradientCheckpointingLayer.example"> | |
| Example: | |
| ```python | |
| >>> # Correct - hidden_states passed as positional arg | |
| >>> out = self.layer(hidden_states, attention_mask=attention_mask) | |
| >>> # Incorrect - hidden_states passed as keyword arg | |
| >>> out = self.layer(hidden_states=hidden_states, attention_mask=attention_mask) | |
| ``` | |
| </ExampleCodeBlock> | |
| </div> | |
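The positional-argument rule above can be reproduced with plain PyTorch. This is a minimal sketch, not the library's code path: it assumes `torch.utils.checkpoint.checkpoint` as the checkpointing function and assumes keyword arguments are bound into the callable (here with `functools.partial`) before it reaches the checkpoint. The `block` function, the mask, and the tensor shapes are made up for illustration.

```python
from functools import partial

import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)
layer = nn.Linear(4, 4)


def block(hidden_states, attention_mask=None):
    # Toy stand-in for a model layer: a linear map, optionally masked.
    out = layer(hidden_states)
    return out if attention_mask is None else out * attention_mask


mask = torch.ones(2, 4)

# Correct: hidden_states is positional, so the reentrant checkpoint tracks it
# and gradients flow back to the input.
x_pos = torch.randn(2, 4, requires_grad=True)
out_pos = checkpoint(partial(block, attention_mask=mask), x_pos, use_reentrant=True)
out_pos.sum().backward()

# Incorrect: hidden_states is bound as a keyword, so the checkpoint receives no
# tensor input that requires grad and returns an output detached from the graph.
x_kw = torch.randn(2, 4, requires_grad=True)
out_kw = checkpoint(
    partial(block, hidden_states=x_kw, attention_mask=mask), use_reentrant=True
)

print("positional input received a gradient:", x_pos.grad is not None)
print("keyword-bound output requires grad:", out_kw.requires_grad)
```

In the keyword case PyTorch also emits a warning that none of the inputs require gradients, which is a useful signal when debugging a layer call.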
| ## Attention Functions[[transformers.AttentionInterface]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.AttentionInterface</name><anchor>transformers.AttentionInterface</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L5377</source><parameters>[]</parameters></docstring> | |
| Dict-like object keeping track of allowed attention functions. You can easily add a new attention function | |
| with a call to `register()`. If a model needs to locally overwrite an existing attention function, say `sdpa`, | |
| it needs to declare a new instance of this class inside its `modeling_<model>.py` and set the override on that instance. | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>register</name><anchor>transformers.AttentionInterface.register</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/utils/generic.py#L1009</source><parameters>[{"name": "key", "val": ": str"}, {"name": "value", "val": ": Callable"}]</parameters></docstring> | |
| </div></div> | |
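The global-versus-local behavior described above can be illustrated with a small stand-alone registry. This is a sketch of the pattern only, under the assumption that lookups prefer an instance-level override and fall back to a shared mapping; `ToyAttentionRegistry`, its attribute names, and the two attention stubs are hypothetical and are not the library's implementation.

```python
from typing import Callable


class ToyAttentionRegistry:
    """Toy dict-like registry: one shared mapping plus per-instance overrides."""

    # Shared by every instance (hypothetical attribute name).
    _global_mapping: dict = {}

    def __init__(self) -> None:
        # Overrides that only this instance can see.
        self.local_mapping: dict = {}

    @classmethod
    def register(cls, key: str, value: Callable) -> None:
        # Registering makes the function visible to all instances.
        cls._global_mapping[key] = value

    def __setitem__(self, key: str, value: Callable) -> None:
        # Item assignment overrides the function for this instance only.
        self.local_mapping[key] = value

    def __getitem__(self, key: str) -> Callable:
        if key in self.local_mapping:
            return self.local_mapping[key]
        return self._global_mapping[key]


def global_sdpa(query, key, value):
    return "global sdpa"


def model_specific_sdpa(query, key, value):
    return "model-specific sdpa"


ToyAttentionRegistry.register("sdpa", global_sdpa)

shared = ToyAttentionRegistry()
model_local = ToyAttentionRegistry()
model_local["sdpa"] = model_specific_sdpa  # local overwrite

print(shared["sdpa"](None, None, None))
print(model_local["sdpa"](None, None, None))
```

Keeping the override on a separate instance means other models that look up `sdpa` through the shared mapping are unaffected.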
| ## Attention Mask Functions[[transformers.AttentionMaskInterface]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.AttentionMaskInterface</name><anchor>transformers.AttentionMaskInterface</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/masking_utils.py#L677</source><parameters>[]</parameters></docstring> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>register</name><anchor>transformers.AttentionMaskInterface.register</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/utils/generic.py#L1009</source><parameters>[{"name": "key", "val": ": str"}, {"name": "value", "val": ": Callable"}]</parameters></docstring> | |
| </div></div> | |
| ## Rotary Position Embedding Functions[[transformers.dynamic_rope_update]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>transformers.dynamic_rope_update</name><anchor>transformers.dynamic_rope_update</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_rope_utils.py#L81</source><parameters>[{"name": "rope_forward", "val": ""}]</parameters><paramsdesc>- **rope_forward** (Callable) -- | |
| The forward pass of the RoPE implementation.</paramsdesc><paramgroups>0</paramgroups><retdesc>The decorated forward pass.</retdesc></docstring> | |
| Decorator function to update the RoPE parameters in the forward pass, if the model is using a dynamic RoPE | |
| (i.e. a RoPE implementation that may recompute its frequencies in the forward pass). | |
| </div> | |
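The decorator idea above can be sketched in pure Python. This is an illustration under stated assumptions, not the library's update logic: `toy_dynamic_update`, `ToyRotary`, and the rescaling rule (growing the base in proportion to the sequence length) are hypothetical, chosen only to show a forward pass whose frequencies may be recomputed before it runs.

```python
import functools


def compute_inv_freq(dim, base):
    # Standard RoPE inverse frequencies: base^(-2i/dim) for i in [0, dim/2).
    return [base ** (-2 * i / dim) for i in range(dim // 2)]


def toy_dynamic_update(forward):
    """Hypothetical decorator: refresh cached frequencies before calling forward."""

    @functools.wraps(forward)
    def wrapper(self, seq_len):
        if self.dynamic and seq_len > self.max_seq_len_cached:
            # Illustrative rule only: scale the base with the sequence length.
            scaled_base = self.base * seq_len / self.original_max_len
            self.inv_freq = compute_inv_freq(self.dim, scaled_base)
            self.max_seq_len_cached = seq_len
        return forward(self, seq_len)

    return wrapper


class ToyRotary:
    def __init__(self, dim=8, base=10000.0, max_len=16, dynamic=True):
        self.dim = dim
        self.base = base
        self.dynamic = dynamic
        self.original_max_len = max_len
        self.max_seq_len_cached = max_len
        self.inv_freq = compute_inv_freq(dim, base)

    @toy_dynamic_update
    def forward(self, seq_len):
        # Rotation angle for each (position, frequency) pair.
        return [[pos * f for f in self.inv_freq] for pos in range(seq_len)]


rope = ToyRotary()
initial = list(rope.inv_freq)
rope.forward(8)    # within the cached length: frequencies untouched
unchanged = rope.inv_freq == initial
rope.forward(32)   # longer than the cached length: frequencies recomputed
changed = rope.inv_freq != initial
print(unchanged, changed)
```

With `dynamic=False` the wrapper simply forwards the call, which mirrors the "if the model is using a dynamic RoPE" condition in the description.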
| ## PyTorch custom modules[[transformers.Conv1D]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.Conv1D</name><anchor>transformers.Conv1D</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/pytorch_utils.py#L97</source><parameters>[{"name": "nf", "val": ""}, {"name": "nx", "val": ""}]</parameters><paramsdesc>- **nf** (`int`) -- The number of output features. | |
| - **nx** (`int`) -- The number of input features.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| 1D-convolutional layer as defined by Radford et al. for OpenAI GPT (and also used in GPT-2). | |
| Basically works like a linear layer but the weights are transposed. | |
| </div> | |
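The "linear layer with transposed weights" remark can be checked numerically. The sketch below uses a hypothetical stand-in class, `TransposedLinear`, that stores its weight as `(nx, nf)` and computes `x @ W + b`; it is an assumption-based illustration of the layout rather than a copy of `Conv1D` itself.

```python
import torch
from torch import nn


class TransposedLinear(nn.Module):
    """Hypothetical stand-in: a linear map whose weight is stored as (nx, nf)."""

    def __init__(self, nf: int, nx: int) -> None:
        super().__init__()
        self.nf = nf
        self.weight = nn.Parameter(torch.randn(nx, nf) * 0.02)
        self.bias = nn.Parameter(torch.zeros(nf))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Flatten leading dims, compute x @ W + b, then restore the leading dims.
        out = torch.addmm(self.bias, x.reshape(-1, x.size(-1)), self.weight)
        return out.view(*x.shape[:-1], self.nf)


torch.manual_seed(0)
conv = TransposedLinear(nf=6, nx=4)

# nn.Linear stores its weight as (out_features, in_features), i.e. the transpose.
linear = nn.Linear(4, 6)
with torch.no_grad():
    linear.weight.copy_(conv.weight.T)
    linear.bias.copy_(conv.bias)

x = torch.randn(2, 3, 4)
same = torch.allclose(conv(x), linear(x), atol=1e-6)
print(conv.weight.shape, linear.weight.shape, same)
```

The practical consequence is that weights copied between the two layouts need a transpose, which matters when converting checkpoints.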
| ## PyTorch Helper Functions[[transformers.apply_chunking_to_forward]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>transformers.apply_chunking_to_forward</name><anchor>transformers.apply_chunking_to_forward</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/pytorch_utils.py#L126</source><parameters>[{"name": "forward_fn", "val": ": Callable[..., torch.Tensor]"}, {"name": "chunk_size", "val": ": int"}, {"name": "chunk_dim", "val": ": int"}, {"name": "*input_tensors", "val": ""}]</parameters><paramsdesc>- **forward_fn** (`Callable[..., torch.Tensor]`) -- | |
| The forward function of the model. | |
| - **chunk_size** (`int`) -- | |
| The chunk size of a chunked tensor: `num_chunks = input_tensors[0].shape[chunk_dim] // chunk_size`. | |
| - **chunk_dim** (`int`) -- | |
| The dimension over which the `input_tensors` should be chunked. | |
| - **input_tensors** (`tuple[torch.Tensor]`) -- | |
| The input tensors of `forward_fn`, which will be chunked.</paramsdesc><paramgroups>0</paramgroups><rettype>`torch.Tensor`</rettype><retdesc>A tensor with the same shape that `forward_fn` would have produced if applied directly to `input_tensors`.</retdesc></docstring> | |
| This function chunks the `input_tensors` into smaller input tensor parts of size `chunk_size` over the dimension | |
| `chunk_dim`. It then applies a layer `forward_fn` to each chunk independently to save memory. | |
| If `forward_fn` is independent across `chunk_dim`, this function yields the same result as directly | |
| applying `forward_fn` to `input_tensors`. | |
| <ExampleCodeBlock anchor="transformers.apply_chunking_to_forward.example"> | |
| Examples: | |
| ```python | |
| # rename the usual forward() fn to forward_chunk() | |
| def forward_chunk(self, hidden_states): | |
|     hidden_states = self.decoder(hidden_states) | |
|     return hidden_states | |
| # implement a chunked forward function | |
| def forward(self, hidden_states): | |
|     return apply_chunking_to_forward(self.forward_chunk, self.chunk_size_lm_head, self.seq_len_dim, hidden_states) | |
| ``` | |
| </ExampleCodeBlock> | |
| </div> | |
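The independence condition above is easy to verify with a torch-only sketch. `chunked_forward` below is a hypothetical helper that follows the chunk, apply, concatenate idea for a single input tensor; it is an assumption-based illustration and not the library function, and the feed-forward block and shapes are made up.

```python
import torch
from torch import nn


def chunked_forward(forward_fn, chunk_size, chunk_dim, tensor):
    """Hypothetical helper: apply forward_fn chunk by chunk along chunk_dim."""
    if chunk_size <= 0:
        # No chunking requested: apply the function directly.
        return forward_fn(tensor)
    if tensor.shape[chunk_dim] % chunk_size != 0:
        raise ValueError("chunk_dim length must be a multiple of chunk_size")
    num_chunks = tensor.shape[chunk_dim] // chunk_size
    chunks = tensor.chunk(num_chunks, dim=chunk_dim)
    # Only one chunk's intermediate activations are alive at a time.
    return torch.cat([forward_fn(c) for c in chunks], dim=chunk_dim)


torch.manual_seed(0)
feed_forward = nn.Sequential(nn.Linear(8, 32), nn.GELU(), nn.Linear(32, 8))
x = torch.randn(2, 12, 8)  # (batch, seq_len, hidden)

# A position-wise feed-forward block is independent across the sequence dim,
# so chunking along dim 1 reproduces the direct result.
direct = feed_forward(x)
chunked = chunked_forward(feed_forward, chunk_size=4, chunk_dim=1, tensor=x)
matches = torch.allclose(direct, chunked, atol=1e-5)


# A softmax over the sequence dim mixes positions, so chunking changes the result.
def seq_softmax(t):
    return t.softmax(dim=1)


mixed = torch.allclose(
    seq_softmax(x), chunked_forward(seq_softmax, 4, 1, x), atol=1e-5
)
print(matches, mixed)
```

The second case shows why chunking is only safe for functions that treat each slice along `chunk_dim` independently.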
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>transformers.pytorch_utils.prune_linear_layer</name><anchor>transformers.pytorch_utils.prune_linear_layer</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/pytorch_utils.py#L63</source><parameters>[{"name": "layer", "val": ": nn.Linear"}, {"name": "index", "val": ": torch.LongTensor"}, {"name": "dim", "val": ": int = 0"}]</parameters><paramsdesc>- **layer** (`torch.nn.Linear`) -- The layer to prune. | |
| - **index** (`torch.LongTensor`) -- The indices to keep in the layer. | |
| - **dim** (`int`, *optional*, defaults to 0) -- The dimension on which to keep the indices.</paramsdesc><paramgroups>0</paramgroups><rettype>`torch.nn.Linear`</rettype><retdesc>The pruned layer as a new layer with `requires_grad=True`.</retdesc></docstring> | |
| Prune a linear layer to keep only the entries in `index`. | |
| Used to remove attention heads. | |
| </div> | |
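What "keeping only the entries in `index`" means for the weight and bias can be sketched with plain PyTorch. `keep_indices` is a hypothetical helper written for this illustration under the assumption that `dim=0` selects output features and `dim=1` selects input features; it is not the library function.

```python
import torch
from torch import nn


def keep_indices(layer: nn.Linear, index: torch.Tensor, dim: int = 0) -> nn.Linear:
    """Hypothetical helper: build a smaller nn.Linear keeping `index` along `dim`."""
    weight = layer.weight.index_select(dim, index).detach().clone()
    out_features, in_features = weight.shape
    pruned = nn.Linear(in_features, out_features, bias=layer.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(weight)
        if layer.bias is not None:
            # The bias has one entry per output feature, so it shrinks only for dim == 0.
            pruned.bias.copy_(layer.bias[index] if dim == 0 else layer.bias)
    return pruned


torch.manual_seed(0)
layer = nn.Linear(8, 6)
x = torch.randn(4, 8)

# dim=0 keeps a subset of output features (rows of the weight matrix).
rows = torch.tensor([0, 2, 5])
pruned_out = keep_indices(layer, rows, dim=0)
same_out = torch.allclose(pruned_out(x), layer(x)[:, rows], atol=1e-6)

# dim=1 keeps a subset of input features (columns of the weight matrix).
cols = torch.tensor([1, 3])
pruned_in = keep_indices(layer, cols, dim=1)
expected = x[:, cols] @ layer.weight[:, cols].T + layer.bias
same_in = torch.allclose(pruned_in(x[:, cols]), expected, atol=1e-6)

print(pruned_out.weight.shape, pruned_in.weight.shape, same_out, same_in)
```

In an attention block, removing a head would typically mean dropping that head's output rows from the query, key, and value projections and the matching input columns from the output projection.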