# Embedding

The embedding class stores word embeddings and retrieves them by index. bitsandbytes provides two types of embeddings: the standard PyTorch `Embedding` class and the `StableEmbedding` class.

The `StableEmbedding` class was introduced in the [8-bit Optimizers via Block-wise Quantization](https://hf.co/papers/2110.02861) paper to reduce the gradient variance caused by the non-uniform distribution of input tokens. This class is designed to support quantization.

## Embedding[[bitsandbytes.nn.Embedding]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class bitsandbytes.nn.Embedding</name><anchor>bitsandbytes.nn.Embedding</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/nn/modules.py#L128</source><parameters>[{"name": "num_embeddings", "val": ": int"}, {"name": "embedding_dim", "val": ": int"}, {"name": "padding_idx", "val": ": typing.Optional[int] = None"}, {"name": "max_norm", "val": ": typing.Optional[float] = None"}, {"name": "norm_type", "val": ": float = 2.0"}, {"name": "scale_grad_by_freq", "val": ": bool = False"}, {"name": "sparse", "val": ": bool = False"}, {"name": "_weight", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "device", "val": ": typing.Optional[torch.device] = None"}]</parameters></docstring>

Embedding class to store and retrieve word embeddings from their indices.



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>__init__</name><anchor>bitsandbytes.nn.Embedding.__init__</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/nn/modules.py#L133</source><parameters>[{"name": "num_embeddings", "val": ": int"}, {"name": "embedding_dim", "val": ": int"}, {"name": "padding_idx", "val": ": typing.Optional[int] = None"}, {"name": "max_norm", "val": ": typing.Optional[float] = None"}, {"name": "norm_type", "val": ": float = 2.0"}, {"name": "scale_grad_by_freq", "val": ": bool = False"}, {"name": "sparse", "val": ": bool = False"}, {"name": "_weight", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "device", "val": ": typing.Optional[torch.device] = None"}]</parameters><paramsdesc>- **num_embeddings** (`int`) --
  The number of unique embeddings (vocabulary size).
- **embedding_dim** (`int`) --
  The dimensionality of the embedding.
- **padding_idx** (`Optional[int]`) --
  Index of the padding entry. The embedding vector at this index is initialized to zeros and does not receive gradient updates.
- **max_norm** (`Optional[float]`) --
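  Renormalizes any embedding vector whose norm exceeds this value so that its norm equals `max_norm`. The norm is the p-norm selected by `norm_type` (L2 by default).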
- **norm_type** (`float`, defaults to `2.0`) --
  The p-norm to compute for the `max_norm` option.
- **scale_grad_by_freq** (`bool`, defaults to `False`) --
  Scales gradients by the inverse frequency of each index in the mini-batch during backpropagation.
- **sparse** (`bool`, defaults to `False`) --
  Computes dense gradients. Set to `True` to compute sparse gradients instead.
- **_weight** (`Optional[Tensor]`) --
  Pretrained embeddings.</paramsdesc><paramgroups>0</paramgroups></docstring>




</div></div>
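
The `max_norm` and `norm_type` parameters described above can be illustrated with a small dependency-free sketch. It is not the library's implementation: `renormalize` and `lookup` are hypothetical helper names, the table is made-up data, and the sketch returns copies, whereas PyTorch applies the renormalization to the stored weights in place.

```python
def renormalize(vector, max_norm, norm_type=2.0):
    """Scale `vector` down so that its p-norm does not exceed `max_norm`."""
    norm = sum(abs(x) ** norm_type for x in vector) ** (1.0 / norm_type)
    if norm <= max_norm:
        # Vectors already within the limit are returned unchanged.
        return list(vector)
    scale = max_norm / norm
    return [x * scale for x in vector]


def lookup(table, indices, max_norm=None, norm_type=2.0):
    """Retrieve rows of `table` by index, optionally renormalizing each row."""
    rows = [table[i] for i in indices]
    if max_norm is not None:
        rows = [renormalize(row, max_norm, norm_type) for row in rows]
    return rows


# Hypothetical 3-entry table with embedding_dim=2.
table = [
    [0.0, 0.0],  # index 0: an all-zero row, as a padding entry would be
    [3.0, 4.0],  # L2 norm is 5.0, above the limit used below
    [0.3, 0.4],  # L2 norm is 0.5, below the limit used below
]

out = lookup(table, [1, 2, 0], max_norm=1.0)
```

Only the row whose norm exceeds `max_norm` is rescaled; the other rows pass through unchanged.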

## StableEmbedding[[bitsandbytes.nn.StableEmbedding]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>class bitsandbytes.nn.StableEmbedding</name><anchor>bitsandbytes.nn.StableEmbedding</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/nn/modules.py#L22</source><parameters>[{"name": "num_embeddings", "val": ": int"}, {"name": "embedding_dim", "val": ": int"}, {"name": "padding_idx", "val": ": typing.Optional[int] = None"}, {"name": "max_norm", "val": ": typing.Optional[float] = None"}, {"name": "norm_type", "val": ": float = 2.0"}, {"name": "scale_grad_by_freq", "val": ": bool = False"}, {"name": "sparse", "val": ": bool = False"}, {"name": "_weight", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "device", "val": " = None"}, {"name": "dtype", "val": " = None"}]</parameters><paramsdesc>- **norm** (`torch.nn.LayerNorm`) -- Layer normalization applied after the embedding.</paramsdesc><paramgroups>0</paramgroups></docstring>

Custom embedding layer that improves training stability for NLP tasks by using 32-bit optimizer states. It is designed to reduce the gradient variation that can result from quantization. The embedding weights are initialized with Xavier uniform initialization, and layer normalization is applied to the output of the embedding lookup.

<ExampleCodeBlock anchor="bitsandbytes.nn.StableEmbedding.example">

Example:

```
import torch
from bitsandbytes.nn import StableEmbedding

# Initialize StableEmbedding layer with vocabulary size 1000, embedding dimension 300
embedding_layer = StableEmbedding(num_embeddings=1000, embedding_dim=300)

# Reset embedding parameters
embedding_layer.reset_parameters()

# Perform a forward pass with input tensor
input_tensor = torch.tensor([1, 2, 3])
output_embedding = embedding_layer(input_tensor)
```

</ExampleCodeBlock>



Methods:

- `reset_parameters()`: Reset embedding parameters using Xavier uniform initialization.
- `forward(input: Tensor) -> Tensor`: Forward pass through the stable embedding layer.



<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


<docstring><name>__init__</name><anchor>bitsandbytes.nn.StableEmbedding.__init__</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/nn/modules.py#L48</source><parameters>[{"name": "num_embeddings", "val": ": int"}, {"name": "embedding_dim", "val": ": int"}, {"name": "padding_idx", "val": ": typing.Optional[int] = None"}, {"name": "max_norm", "val": ": typing.Optional[float] = None"}, {"name": "norm_type", "val": ": float = 2.0"}, {"name": "scale_grad_by_freq", "val": ": bool = False"}, {"name": "sparse", "val": ": bool = False"}, {"name": "_weight", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "device", "val": " = None"}, {"name": "dtype", "val": " = None"}]</parameters><paramsdesc>- **num_embeddings** (`int`) --
  The number of unique embeddings (vocabulary size).
- **embedding_dim** (`int`) --
  The dimensionality of the embedding.
- **padding_idx** (`Optional[int]`) --
  Index of the padding entry. The embedding vector at this index is initialized to zeros and does not receive gradient updates.
- **max_norm** (`Optional[float]`) --
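  Renormalizes any embedding vector whose norm exceeds this value so that its norm equals `max_norm`. The norm is the p-norm selected by `norm_type` (L2 by default).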
- **norm_type** (`float`, defaults to `2.0`) --
  The p-norm to compute for the `max_norm` option.
- **scale_grad_by_freq** (`bool`, defaults to `False`) --
  Scales gradients by the inverse frequency of each index in the mini-batch during backpropagation.
- **sparse** (`bool`, defaults to `False`) --
  Computes dense gradients. Set to `True` to compute sparse gradients instead.
- **_weight** (`Optional[Tensor]`) --
  Pretrained embeddings.</paramsdesc><paramgroups>0</paramgroups></docstring>




</div></div>
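
The `StableEmbedding` description above mentions layer normalization applied after the embedding lookup. The following dependency-free sketch shows roughly what that step computes. It is an illustration under stated assumptions, not the library's code: `layer_norm` and `stable_lookup` are hypothetical helper names, the `eps` value is an assumed default, and the learnable scale and shift of `torch.nn.LayerNorm` are omitted.

```python
import math


def layer_norm(row, eps=1e-5):
    """Normalize one embedding vector to zero mean and (near) unit variance."""
    mean = sum(row) / len(row)
    var = sum((x - mean) ** 2 for x in row) / len(row)
    return [(x - mean) / math.sqrt(var + eps) for x in row]


def stable_lookup(table, indices):
    """Look up rows by index, then layer-normalize each retrieved row."""
    return [layer_norm(table[i]) for i in indices]


# Hypothetical table: the second row is the first one scaled by 10.
table = [
    [1.0, 2.0, 3.0, 4.0],
    [10.0, 20.0, 30.0, 40.0],
]

out = stable_lookup(table, [0, 1])
```

After normalization, both rows have nearly identical values even though their raw magnitudes differ by a factor of 10, which is the sense in which the normalization keeps the scale of the embedding output consistent.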
