# Embedding

The embedding class is used to store and retrieve word embeddings from their indices. There are two types of embeddings in bitsandbytes: the standard PyTorch `Embedding` class and the `StableEmbedding` class.

The `StableEmbedding` class was introduced in the [8-bit Optimizers via Block-wise Quantization](https://hf.co/papers/2110.02861) paper to reduce the gradient variance that results from the non-uniform distribution of input tokens. This class is designed to support quantization.
## Embedding[[bitsandbytes.nn.Embedding]]

#### bitsandbytes.nn.Embedding

[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/nn/modules.py#L134)

Embedding class to store and retrieve word embeddings from their indices.
**__init__** ([Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/nn/modules.py#L139))

`__init__(num_embeddings: int, embedding_dim: int, padding_idx: Optional[int] = None, max_norm: Optional[float] = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, _weight: Optional[Tensor] = None, device: Optional[torch.device] = None)`

**Parameters:**

- **num_embeddings** (`int`) -- The number of unique embeddings (vocabulary size).
- **embedding_dim** (`int`) -- The dimensionality of the embedding.
- **padding_idx** (`Optional[int]`) -- Pads the output with zeros at the given index.
- **max_norm** (`Optional[float]`) -- Renormalizes embeddings to have a maximum L2 norm.
- **norm_type** (`float`, defaults to `2.0`) -- The p-norm to compute for the `max_norm` option.
- **scale_grad_by_freq** (`bool`, defaults to `False`) -- Scale gradient by frequency during backpropagation.
- **sparse** (`bool`, defaults to `False`) -- Computes dense gradients. Set to `True` to compute sparse gradients instead.
- **_weight** (`Optional[Tensor]`) -- Pretrained embeddings.
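A minimal usage sketch (the vocabulary size, embedding dimension, and token indices below are illustrative assumptions, not values from the library):

```
import torch
import bitsandbytes as bnb

# Embedding table: 1000-token vocabulary, 64-dimensional vectors
embedding = bnb.nn.Embedding(num_embeddings=1000, embedding_dim=64)

# Look up the embeddings for a batch of token indices
token_ids = torch.tensor([[4, 7, 42]])
vectors = embedding(token_ids)  # shape: (1, 3, 64)
```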
## StableEmbedding[[bitsandbytes.nn.StableEmbedding]]

#### bitsandbytes.nn.StableEmbedding

[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/nn/modules.py#L28)

Custom embedding layer designed to improve stability during training for NLP tasks by using 32-bit optimizer states. It is designed to reduce gradient variations that can result from quantization. This embedding layer is initialized with Xavier uniform initialization followed by layer normalization.
Example:

```
import torch
from bitsandbytes.nn import StableEmbedding

# Initialize a StableEmbedding layer with vocabulary size 1000 and embedding dimension 300
embedding_layer = StableEmbedding(num_embeddings=1000, embedding_dim=300)

# Reset the embedding parameters (Xavier uniform initialization)
embedding_layer.reset_parameters()

# Perform a forward pass with an input tensor of token indices
input_tensor = torch.tensor([1, 2, 3])
output_embedding = embedding_layer(input_tensor)
```
Methods:

- `reset_parameters()`: Reset embedding parameters using Xavier uniform initialization.
- `forward(input: Tensor) -> Tensor`: Forward pass through the stable embedding layer.
**__init__** ([Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/nn/modules.py#L54))

`__init__(num_embeddings: int, embedding_dim: int, padding_idx: Optional[int] = None, max_norm: Optional[float] = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, _weight: Optional[Tensor] = None, device=None, dtype=None)`

**Parameters:**

- **num_embeddings** (`int`) -- The number of unique embeddings (vocabulary size).
- **embedding_dim** (`int`) -- The dimensionality of the embedding.
- **padding_idx** (`Optional[int]`) -- Pads the output with zeros at the given index.
- **max_norm** (`Optional[float]`) -- Renormalizes embeddings to have a maximum L2 norm.
- **norm_type** (`float`, defaults to `2.0`) -- The p-norm to compute for the `max_norm` option.
- **scale_grad_by_freq** (`bool`, defaults to `False`) -- Scale gradient by frequency during backpropagation.
- **sparse** (`bool`, defaults to `False`) -- Computes dense gradients. Set to `True` to compute sparse gradients instead.
- **_weight** (`Optional[Tensor]`) -- Pretrained embeddings.

**Attributes:**

- **norm** (`torch.nn.LayerNorm`) -- Layer normalization applied after the embedding.
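Since `StableEmbedding` exists to keep the embedding weights on 32-bit optimizer states, a natural pairing is with one of the library's 8-bit optimizers. Below is a minimal sketch of that setup; the model shape, learning rate, and placeholder loss are illustrative assumptions, not part of the API:

```
import torch
import bitsandbytes as bnb

model = torch.nn.Sequential(
    bnb.nn.StableEmbedding(num_embeddings=1000, embedding_dim=300),
    torch.nn.Linear(300, 10),
)

# 8-bit Adam for the model as a whole; the StableEmbedding weight is
# meant to be optimized with 32-bit states for training stability.
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-3)

logits = model(torch.tensor([[1, 2, 3]]))
loss = logits.sum()  # placeholder loss for illustration
loss.backward()
optimizer.step()
optimizer.zero_grad()
```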