# Embedding
The embedding classes store and retrieve word embeddings from their indices. bitsandbytes provides two embedding classes: the standard PyTorch-compatible `Embedding` class and the `StableEmbedding` class.
The `StableEmbedding` class was introduced in the [8-bit Optimizers via Block-wise Quantization](https://hf.co/papers/2110.02861) paper to reduce gradient variance as a result of the non-uniform distribution of input tokens. This class is designed to support quantization.
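As a quick orientation, here is a minimal sketch (module sizes are illustrative, not from the original docs) of swapping `torch.nn.Embedding` for `bnb.nn.StableEmbedding` when training with an 8-bit optimizer such as `bnb.optim.Adam8bit`:

```
import torch
import bitsandbytes as bnb

# Toy model with illustrative sizes. StableEmbedding replaces torch.nn.Embedding
# and internally keeps 32-bit optimizer states for its weight.
model = torch.nn.Sequential(
    bnb.nn.StableEmbedding(num_embeddings=1000, embedding_dim=64),
    torch.nn.Linear(64, 8),
)

# 8-bit Adam for the remaining parameters
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-3)
```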
## Embedding[[bitsandbytes.nn.Embedding]]
#### bitsandbytes.nn.Embedding[[bitsandbytes.nn.Embedding]]
[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/nn/modules.py#L134)
Embedding class to store and retrieve word embeddings from their indices.
`__init__(num_embeddings: int, embedding_dim: int, padding_idx: Optional[int] = None, max_norm: Optional[float] = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, _weight: Optional[Tensor] = None, device: Optional[torch.device] = None)`

[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/nn/modules.py#L139)

**Parameters:**
- **num_embeddings** (`int`) -- The number of unique embeddings (vocabulary size).
- **embedding_dim** (`int`) -- The dimensionality of the embedding.
- **padding_idx** (`Optional[int]`) -- Pads the output with zeros at the given index.
- **max_norm** (`Optional[float]`) -- Renormalizes embeddings to have a maximum L2 norm.
- **norm_type** (`float`, defaults to `2.0`) -- The p-norm to compute for the `max_norm` option.
- **scale_grad_by_freq** (`bool`, defaults to `False`) -- Scale gradients by the inverse of token frequency during backpropagation.
- **sparse** (`bool`, defaults to `False`) -- Computes dense gradients. Set to `True` to compute sparse gradients instead.
- **_weight** (`Optional[Tensor]`) -- Pretrained embeddings.
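For illustration, a minimal sketch (shapes and indices are illustrative) of using `bitsandbytes.nn.Embedding` as a drop-in replacement for `torch.nn.Embedding`, with `padding_idx` marking the padding token:

```
import torch
from bitsandbytes.nn import Embedding

# Vocabulary of 1000 tokens, 300-dimensional embeddings; index 0 is padding
emb = Embedding(num_embeddings=1000, embedding_dim=300, padding_idx=0)

tokens = torch.tensor([[0, 5, 42], [7, 0, 9]])
out = emb(tokens)
print(out.shape)  # torch.Size([2, 3, 300])

# As with torch.nn.Embedding, the padding row is zero at initialization
print(out[0, 0].abs().sum())  # tensor(0.)
```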
## StableEmbedding[[bitsandbytes.nn.StableEmbedding]]
#### bitsandbytes.nn.StableEmbedding[[bitsandbytes.nn.StableEmbedding]]
[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/nn/modules.py#L28)
Custom embedding layer designed to improve training stability for NLP tasks by using 32-bit optimizer states. It reduces the gradient variations that can result from quantization. The layer is initialized with Xavier uniform initialization followed by layer normalization.
Example:
```
import torch
from bitsandbytes.nn import StableEmbedding

# Initialize a StableEmbedding layer with vocabulary size 1000 and embedding dimension 300
embedding_layer = StableEmbedding(num_embeddings=1000, embedding_dim=300)

# Reset embedding parameters (Xavier uniform initialization)
embedding_layer.reset_parameters()

# Perform a forward pass with an input tensor of token indices
input_tensor = torch.tensor([1, 2, 3])
output_embedding = embedding_layer(input_tensor)  # shape: (3, 300)
```
Methods:
- `reset_parameters()`: Reset embedding parameters using Xavier uniform initialization.
- `forward(input: Tensor) -> Tensor`: Forward pass through the stable embedding layer.
`__init__(num_embeddings: int, embedding_dim: int, padding_idx: Optional[int] = None, max_norm: Optional[float] = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, _weight: Optional[Tensor] = None, device=None, dtype=None)`

[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/nn/modules.py#L54)

**Parameters:**
- **num_embeddings** (`int`) -- The number of unique embeddings (vocabulary size).
- **embedding_dim** (`int`) -- The dimensionality of the embedding.
- **padding_idx** (`Optional[int]`) -- Pads the output with zeros at the given index.
- **max_norm** (`Optional[float]`) -- Renormalizes embeddings to have a maximum L2 norm.
- **norm_type** (`float`, defaults to `2.0`) -- The p-norm to compute for the `max_norm` option.
- **scale_grad_by_freq** (`bool`, defaults to `False`) -- Scale gradients by the inverse of token frequency during backpropagation.
- **sparse** (`bool`, defaults to `False`) -- Computes dense gradients. Set to `True` to compute sparse gradients instead.
- **_weight** (`Optional[Tensor]`) -- Pretrained embeddings.
**Attributes:**
- **norm** (`torch.nn.LayerNorm`) -- Layer normalization applied after the embedding lookup.
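Since the layer normalization lives in the `norm` attribute, its effect can be checked directly; a small illustrative sketch (sizes are arbitrary):

```
import torch
from bitsandbytes.nn import StableEmbedding

layer = StableEmbedding(num_embeddings=100, embedding_dim=32)
print(isinstance(layer.norm, torch.nn.LayerNorm))  # True

# Layer norm gives each token's embedding roughly zero mean and unit variance
out = layer(torch.tensor([1, 2, 3]))
print(out.mean(dim=-1))  # values close to zero
```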
