| --- |
| license: bsd-3-clause |
| tags: |
| - kernels |
| --- |
| # Triton layer normalization kernels. |
|
|
| This kernel implements layers normalization using Triton. This kernel is from |
| the [flash-attention](https://github.com/Dao-AILab/flash-attention) project. |
|
|
| ## Functions |
|
|
| ### Function `layer_norm` |
| |
| `(x: torch.Tensor, weight: torch.Tensor, bias: torch.Tensor, residual: Optional[torch.Tensor] = None, x1: Optional[torch.Tensor] = None, weight1: Optional[torch.Tensor] = None, bias1: Optional[torch.Tensor] = None, eps: float = 1e-06, dropout_p: float = 0.0, rowscale=None, prenorm: bool = False, residual_in_fp32: bool = False, is_rms_norm: bool = False, return_dropout_mask: bool = False, out: Optional[torch.Tensor] = None, residual_out: Optional[torch.Tensor] = None)` |
| |
| Apply layer normalization to the input tensor with Triton acceleration. |
| |
| ### Parameters |
| |
| - **x** (*torch.Tensor*) -- |
| Input tensor to normalize. |
| - **weight** (*torch.Tensor*) -- |
| Scale parameter for normalization. |
| - **bias** (*torch.Tensor*) -- |
| Shift parameter for normalization. |
| - **residual** (*torch.Tensor*, *optional*) -- |
| Optional residual tensor to add to the input before normalization. |
| - **x1** (*torch.Tensor*, *optional*) -- |
| Optional second input tensor to combine with *x*. When provided, the function |
| first adds *x1* to *x* and then applies normalization. |
| - **weight1** (*torch.Tensor*, *optional*) -- |
| Scale parameter for the second normalization. |
| - **bias1** (*torch.Tensor*, *optional*) -- |
| Shift parameter for the second normalization. |
| - **eps** (*float*, *optional*, defaults to 1e-6) -- |
| Small constant added for numerical stability in normalization. |
| - **dropout_p** (*float*, *optional*, defaults to 0.0) -- |
| Dropout probability. If greater than 0, applies dropout to the input before |
| normalization and residual addition. |
| - **rowscale** (*torch.Tensor*, *optional*) -- |
| Optional scaling factor applied to each row of the input tensor. |
| Not compatible with the use of *x1*. |
| - **prenorm** (*bool*, *optional*, defaults to False) -- |
| If True, returns both the normalized output and the unnormalized input+residual. |
| - **residual_in_fp32** (*bool*, *optional*, defaults to False) -- |
| If True, performs the residual connection in FP32 precision. |
| - **is_rms_norm** (*bool*, *optional*, defaults to False) -- |
| If True, uses RMS normalization instead of layer normalization. |
| - **return_dropout_mask** (*bool*, *optional*, defaults to False) -- |
| If True, returns the dropout mask used for the computation. |
| - **out** (*torch.Tensor*, *optional*) -- |
| Output tensor for the normalized result. If *None*, a new tensor is allocated. |
| - **residual_out** (*torch.Tensor*, *optional*) -- |
| Output tensor for the residual result when using prenorm. If *None*, a new tensor |
| is allocated when needed. |
| |
| ### Returns |
| |
| **Type**: *torch.Tensor* or tuple of *torch.Tensor* |
| |
| - The normalized input. |
| - The second normalization of the input if *weight1* is provided. |
| - The residual tensor if *prenorm* is set. |
| - The dropout mask if *return_dropout_mask* is set. |
| - The dropout mask for *x1* if *x1* is provided and *return_dropout_mask* is set. |
| |
| ## Layers |
| |
| ### Class `LlamaRMSNorm` |
| |
| No documentation available. |
| |
| #### Methods |
| |
| ##### Method `forward` |
| |
| `(self, hidden_states: torch.Tensor) -> torch.Tensor` |
|
|
| No documentation available. |
|
|