	# Overview
	The `bitsandbytes.functional` API provides the low-level building blocks for the library's features.

	## When to Use `bitsandbytes.functional`

	* When you need direct control over quantized operations and their parameters.
	* To build custom layers or operations leveraging low-bit arithmetic.
	* To integrate with other ecosystem tooling.
	* For experimental or research purposes requiring non-standard quantization or performance optimizations.

	## LLM.int8()[[bitsandbytes.functional.int8_linear_matmul]]
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>bitsandbytes.functional.int8_linear_matmul</name><anchor>bitsandbytes.functional.int8_linear_matmul</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/functional.py#L1744</source><parameters>[{"name": "A", "val": ": Tensor"}, {"name": "B", "val": ": Tensor"}, {"name": "out", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "dtype", "val": " = torch.int32"}]</parameters><paramsdesc>- A (`torch.Tensor`) -- The first matrix operand with the data type `torch.int8`.
	- B (`torch.Tensor`) -- The second matrix operand with the data type `torch.int8`.
	- out (`torch.Tensor`, optional) -- A pre-allocated tensor used to store the result.
	- dtype (`torch.dtype`, optional) -- The expected data type of the output. Defaults to `torch.int32`.</paramsdesc><paramgroups>0</paramgroups><rettype>`torch.Tensor`</rettype><retdesc>The result of the operation.</retdesc><raises>- ``NotImplementedError`` -- The operation is not supported in the current environment.
	- ``RuntimeError`` -- Raised when the operation cannot be completed for any other reason.</raises><raisederrors>``NotImplementedError`` or ``RuntimeError``</raisederrors></docstring>
	Performs an 8-bit integer matrix multiplication.

	A linear transformation is applied such that `out = A @ B.T`. When possible, integer tensor core hardware is
	utilized to accelerate the operation.












	</div>
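The `out = A @ B.T` contract can be illustrated without any GPU. The following is a minimal pure-Python sketch, not the library's kernel; `int8_matmul_reference` is a hypothetical helper name, and nested lists stand in for tensors. It shows why the default output dtype is wider than the inputs: sums of int8 products quickly exceed the int8 range.

```python
def int8_matmul_reference(A, B):
    """Compute A @ B.T for int8-valued nested lists (hypothetical reference helper)."""
    for row in A + B:
        for v in row:
            if not -128 <= v <= 127:
                raise ValueError("operands must hold int8 values")
    inner = len(A[0])
    if any(len(row) != inner for row in A + B):
        raise ValueError("A and B must share the same inner dimension")
    # Row i of A is dotted with row j of B, which is column j of B.T.
    return [[sum(x * y for x, y in zip(a_row, b_row)) for b_row in B] for a_row in A]


A = [[1, -2, 3], [127, 127, 127]]
B = [[4, 5, 6], [127, 127, 127]]
out = int8_matmul_reference(A, B)
```

Python integers do not overflow, so the accumulation here is exact; a real kernel accumulates into a fixed-width integer type such as `torch.int32`.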

	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>bitsandbytes.functional.int8_mm_dequant</name><anchor>bitsandbytes.functional.int8_mm_dequant</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/functional.py#L1770</source><parameters>[{"name": "A", "val": ": Tensor"}, {"name": "row_stats", "val": ": Tensor"}, {"name": "col_stats", "val": ": Tensor"}, {"name": "out", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "bias", "val": ": typing.Optional[torch.Tensor] = None"}]</parameters><paramsdesc>- A (`torch.Tensor` with dtype `torch.int32`) -- The result of a quantized int8 matrix multiplication.
	- row_stats (`torch.Tensor`) -- The row-wise quantization statistics for the lhs operand of the matrix multiplication.
	- col_stats (`torch.Tensor`) -- The column-wise quantization statistics for the rhs operand of the matrix multiplication.
	- out (`torch.Tensor`, optional) -- A pre-allocated tensor to store the output of the operation.
	- bias (`torch.Tensor`, optional) -- An optional bias vector to add to the result.</paramsdesc><paramgroups>0</paramgroups><rettype>`torch.Tensor`</rettype><retdesc>The dequantized result with an optional bias, with dtype `torch.float16`.</retdesc></docstring>
	Performs dequantization on the result of a quantized int8 matrix multiplication.








	</div>
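As a rough sketch of what dequantizing an int32 matmul result involves, the example below rescales each accumulator by its row and column statistics. It assumes the statistics are absolute-maximum values and that both operands were quantized by scaling with `127 / absmax`; that convention is an assumption made for illustration, and `mm_dequant_reference` is a hypothetical name, not part of the library.

```python
def mm_dequant_reference(acc, row_stats, col_stats, bias=None):
    """Rescale int32 accumulators back to floating point (hypothetical helper).

    Assumes each int8 operand was produced as round(x * 127 / absmax), so the
    product of two quantized values carries a factor of 127 * 127.
    """
    out = []
    for i, row in enumerate(acc):
        out_row = []
        for j, v in enumerate(row):
            x = v * row_stats[i] * col_stats[j] / (127.0 * 127.0)
            if bias is not None:
                x += bias[j]  # the bias is added per output column
            out_row.append(x)
        out.append(out_row)
    return out


acc = [[16129, -16129]]  # 16129 == 127 * 127
plain = mm_dequant_reference(acc, row_stats=[2.0], col_stats=[3.0, 0.5])
with_bias = mm_dequant_reference(acc, [2.0], [3.0, 0.5], bias=[0.5, 0.5])
```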

	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>bitsandbytes.functional.int8_vectorwise_dequant</name><anchor>bitsandbytes.functional.int8_vectorwise_dequant</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/functional.py#L2026</source><parameters>[{"name": "A", "val": ": Tensor"}, {"name": "stats", "val": ": Tensor"}]</parameters><paramsdesc>- A (`torch.Tensor` with dtype `torch.int8`) -- The quantized int8 tensor.
	- stats (`torch.Tensor` with dtype `torch.float32`) -- The row-wise quantization statistics.</paramsdesc><paramgroups>0</paramgroups><rettype>`torch.Tensor` with dtype `torch.float32`</rettype><retdesc>The dequantized tensor.</retdesc></docstring>
	Dequantizes a tensor with dtype `torch.int8` to `torch.float32`.








	</div>
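Row-wise dequantization can be sketched in a few lines. The helper below is hypothetical and assumes each row statistic is the absolute maximum of the original row, with int8 value 127 mapping back to that maximum; the library's exact scaling is not stated on this page.

```python
def vectorwise_dequant_reference(q, stats):
    """Map int8 values back to floats, one scale per row (hypothetical helper)."""
    # Assumed convention: 127 corresponds to the row's absolute maximum.
    return [[v * s / 127.0 for v in row] for row, s in zip(q, stats)]


restored = vectorwise_dequant_reference([[127, 0, -127], [64, -32, 127]], [2.0, 0.5])
```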

	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>bitsandbytes.functional.int8_vectorwise_quant</name><anchor>bitsandbytes.functional.int8_vectorwise_quant</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/functional.py#L2040</source><parameters>[{"name": "A", "val": ": Tensor"}, {"name": "threshold", "val": " = 0.0"}]</parameters><paramsdesc>- A (`torch.Tensor` with dtype `torch.float16`) -- The input tensor.
	- threshold (`float`, optional) --
	An optional threshold for sparse decomposition of outlier features.

	No outliers are held back when 0.0. Defaults to 0.0.</paramsdesc><paramgroups>0</paramgroups><rettype>`Tuple[torch.Tensor, torch.Tensor, Optional[torch.Tensor]]`</rettype><retdesc>A tuple containing the quantized tensor and relevant statistics.
	- `torch.Tensor` with dtype `torch.int8`: The quantized data.
	- `torch.Tensor` with dtype `torch.float32`: The quantization scales.
	- `torch.Tensor` with dtype `torch.int32`, optional: A list of column indices which contain outlier features.</retdesc></docstring>
	Quantizes a tensor with dtype `torch.float16` to `torch.int8` in accordance with the `LLM.int8()` algorithm.

	For more information, see the [LLM.int8() paper](https://arxiv.org/abs/2208.07339).








	</div>
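The three return values described above can be illustrated with a simplified pure-Python sketch. It is not the library's implementation: `vectorwise_quant_reference` is a hypothetical name, and holding back whole outlier columns before computing the row scales is a simplifying assumption about how outliers are treated.

```python
def vectorwise_quant_reference(A, threshold=0.0):
    """Row-wise absmax quantization to int8 with optional outlier columns.

    Hypothetical helper: returns (quantized rows, row scales, outlier columns).
    """
    outlier_cols = None
    if threshold > 0.0:
        # Any column holding at least one value at or above the threshold.
        outlier_cols = sorted(
            {j for row in A for j, v in enumerate(row) if abs(v) >= threshold}
        )
    held_back = set(outlier_cols or [])
    quantized, row_stats = [], []
    for row in A:
        # Assumption: outlier columns are zeroed so they do not inflate the scale.
        kept = [0.0 if j in held_back else v for j, v in enumerate(row)]
        absmax = max(abs(v) for v in kept)
        row_stats.append(absmax)
        scale = 127.0 / absmax if absmax > 0 else 0.0
        quantized.append([round(v * scale) for v in kept])
    return quantized, row_stats, outlier_cols


A = [[1.0, -2.0, 0.25], [0.25, 8.0, -1.0]]
q, stats, cols = vectorwise_quant_reference(A, threshold=6.0)
_, stats_all, cols_all = vectorwise_quant_reference(A)
```

With the threshold set, column 1 is held back and both rows are scaled by their remaining maximum of 1.0; without it, the value 8.0 sets the scale of the second row and the small entries lose precision.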

	## 4-bit[[bitsandbytes.functional.dequantize_4bit]]
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>bitsandbytes.functional.dequantize_4bit</name><anchor>bitsandbytes.functional.dequantize_4bit</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/functional.py#L931</source><parameters>[{"name": "A", "val": ": Tensor"}, {"name": "quant_state", "val": ": typing.Optional[bitsandbytes.functional.QuantState] = None"}, {"name": "absmax", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "out", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "blocksize", "val": ": typing.Optional[int] = None"}, {"name": "quant_type", "val": " = 'fp4'"}]</parameters><paramsdesc>- A (`torch.Tensor`) -- The quantized input tensor.
	- quant_state (`QuantState`, optional) --
	The quantization state as returned by `quantize_4bit`.
	Required if `absmax` is not provided.
	- absmax (`torch.Tensor`, optional) --
	A tensor containing the scaling values.
	Required if `quant_state` is not provided and ignored otherwise.
	- out (`torch.Tensor`, optional) -- A tensor to use to store the result.
	- blocksize (`int`, optional) --
	The size of the blocks. Defaults to 128 on ROCm and 64 otherwise.
	Valid values are 64, 128, 256, 512, 1024, 2048, and 4096.
	- quant_type (`str`, optional) -- The data type to use: `nf4` or `fp4`. Defaults to `fp4`.</paramsdesc><paramgroups>0</paramgroups><rettype>`torch.Tensor`</rettype><retdesc>The dequantized tensor.</retdesc><raises>- ``ValueError`` -- Raised when the input data type or blocksize is not supported.</raises><raisederrors>``ValueError``</raisederrors></docstring>
	Dequantizes a packed 4-bit quantized tensor.

	The input tensor is dequantized by dividing it into blocks of `blocksize` values.
	The absolute maximum value within these blocks is used for scaling
	the non-linear dequantization.












	</div>
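The block-and-scale idea behind 4-bit dequantization can be sketched as follows. Everything here is illustrative: `ILLUSTRATIVE_CODE` is an evenly spaced 16-entry table rather than the library's FP4 or NF4 values, the high-nibble-first packing order is an assumption, and the tiny block size is chosen for readability rather than from the list of valid values.

```python
# Evenly spaced stand-in codebook from -1.0 to 1.0; NOT the FP4/NF4 tables.
ILLUSTRATIVE_CODE = [i / 7.5 - 1.0 for i in range(16)]


def dequantize_4bit_reference(packed, absmax, blocksize, code=ILLUSTRATIVE_CODE):
    """Unpack two 4-bit indices per byte, look them up, and rescale per block."""
    values = []
    for byte in packed:
        values.append(code[byte >> 4])    # assumed order: high nibble first
        values.append(code[byte & 0x0F])  # then low nibble
    # Each run of `blocksize` values shares one absmax scale.
    return [v * absmax[i // blocksize] for i, v in enumerate(values)]


one_block = dequantize_4bit_reference([0xF0, 0x0F], absmax=[2.0], blocksize=4)
two_blocks = dequantize_4bit_reference([0xFF, 0x00], absmax=[1.0, 3.0], blocksize=2)
```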

	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>bitsandbytes.functional.dequantize_fp4</name><anchor>bitsandbytes.functional.dequantize_fp4</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/functional.py#L907</source><parameters>[{"name": "A", "val": ": Tensor"}, {"name": "quant_state", "val": ": typing.Optional[bitsandbytes.functional.QuantState] = None"}, {"name": "absmax", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "out", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "blocksize", "val": ": typing.Optional[int] = None"}]</parameters></docstring>


	</div>

	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>bitsandbytes.functional.dequantize_nf4</name><anchor>bitsandbytes.functional.dequantize_nf4</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/functional.py#L919</source><parameters>[{"name": "A", "val": ": Tensor"}, {"name": "quant_state", "val": ": typing.Optional[bitsandbytes.functional.QuantState] = None"}, {"name": "absmax", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "out", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "blocksize", "val": ": typing.Optional[int] = None"}]</parameters></docstring>


	</div>

	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>bitsandbytes.functional.gemv_4bit</name><anchor>bitsandbytes.functional.gemv_4bit</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/functional.py#L1510</source><parameters>[{"name": "A", "val": ": Tensor"}, {"name": "B", "val": ": Tensor"}, {"name": "out", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "transposed_A", "val": " = False"}, {"name": "transposed_B", "val": " = False"}, {"name": "state", "val": " = None"}]</parameters></docstring>


	</div>

	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>bitsandbytes.functional.quantize_4bit</name><anchor>bitsandbytes.functional.quantize_4bit</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/functional.py#L826</source><parameters>[{"name": "A", "val": ": Tensor"}, {"name": "absmax", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "out", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "blocksize", "val": " = None"}, {"name": "compress_statistics", "val": " = False"}, {"name": "quant_type", "val": " = 'fp4'"}, {"name": "quant_storage", "val": " = torch.uint8"}]</parameters><paramsdesc>- A (`torch.Tensor`) -- The input tensor. Supports `float16`, `bfloat16`, or `float32` datatypes.
	- absmax (`torch.Tensor`, optional) -- A tensor to use to store the absmax values.
	- out (`torch.Tensor`, optional) -- A tensor to use to store the result.
	- blocksize (`int`, optional) --
	The size of the blocks. Defaults to 128 on ROCm and 64 otherwise.
	Valid values are 64, 128, 256, 512, 1024, 2048, and 4096.
	- compress_statistics (`bool`, optional) -- Whether to additionally quantize the absmax values. Defaults to False.
	- quant_type (`str`, optional) -- The data type to use: `nf4` or `fp4`. Defaults to `fp4`.
	- quant_storage (`torch.dtype`, optional) -- The dtype of the tensor used to store the result. Defaults to `torch.uint8`.</paramsdesc><paramgroups>0</paramgroups><rettype>Tuple[`torch.Tensor`, `QuantState`]</rettype><retdesc>A tuple containing the quantization results.
	- `torch.Tensor`: The quantized tensor with packed 4-bit values.
	- `QuantState`: The state object used to undo the quantization.</retdesc><raises>- ``ValueError`` -- Raised when the input data type is not supported.</raises><raisederrors>``ValueError``</raisederrors></docstring>
	Quantize tensor A in blocks of 4-bit values.

	Quantizes tensor A by dividing it into blocks which are independently quantized.












	</div>
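To make the "blocks which are independently quantized" description concrete, here is a small pure-Python sketch. It uses a hypothetical evenly spaced codebook in place of the real `fp4` or `nf4` tables, assumes two indices are packed per byte with the first in the high nibble, and uses a block size of 2 only to keep the example short.

```python
# Evenly spaced stand-in codebook from -1.0 to 1.0; NOT the FP4/NF4 tables.
ILLUSTRATIVE_CODE = [i / 7.5 - 1.0 for i in range(16)]


def quantize_4bit_reference(values, blocksize, code=ILLUSTRATIVE_CODE):
    """Blockwise absmax quantization to 4-bit indices, packed two per byte."""
    if len(values) % 2:
        raise ValueError("need an even number of values to pack two per byte")
    absmax, indices = [], []
    for start in range(0, len(values), blocksize):
        block = values[start:start + blocksize]
        scale = max(abs(v) for v in block)
        absmax.append(scale)  # one scale is stored per block
        for v in block:
            normed = v / scale if scale else 0.0
            # Pick the codebook entry closest to the normalized value.
            indices.append(min(range(16), key=lambda k: abs(code[k] - normed)))
    packed = [(indices[i] << 4) | indices[i + 1] for i in range(0, len(indices), 2)]
    return packed, absmax


packed, absmax = quantize_4bit_reference([2.0, -2.0, 0.5, -0.25], blocksize=2)
```

Because the second block has its own scale of 0.5, its values use the full codebook range even though they are much smaller than those in the first block.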

	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>bitsandbytes.functional.quantize_fp4</name><anchor>bitsandbytes.functional.quantize_fp4</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/functional.py#L800</source><parameters>[{"name": "A", "val": ": Tensor"}, {"name": "absmax", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "out", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "blocksize", "val": " = None"}, {"name": "compress_statistics", "val": " = False"}, {"name": "quant_storage", "val": " = torch.uint8"}]</parameters></docstring>


	</div>

	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>bitsandbytes.functional.quantize_nf4</name><anchor>bitsandbytes.functional.quantize_nf4</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/functional.py#L813</source><parameters>[{"name": "A", "val": ": Tensor"}, {"name": "absmax", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "out", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "blocksize", "val": " = None"}, {"name": "compress_statistics", "val": " = False"}, {"name": "quant_storage", "val": " = torch.uint8"}]</parameters></docstring>


	</div>

	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>class bitsandbytes.functional.QuantState</name><anchor>bitsandbytes.functional.QuantState</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/functional.py#L393</source><parameters>[{"name": "absmax", "val": ""}, {"name": "shape", "val": " = None"}, {"name": "code", "val": " = None"}, {"name": "blocksize", "val": " = None"}, {"name": "quant_type", "val": " = None"}, {"name": "dtype", "val": " = None"}, {"name": "offset", "val": " = None"}, {"name": "state2", "val": " = None"}]</parameters></docstring>
	Container for quantization state components, for use with `Params4bit` and similar classes.


	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>as_dict</name><anchor>bitsandbytes.functional.QuantState.as_dict</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/functional.py#L505</source><parameters>[{"name": "packed", "val": " = False"}]</parameters></docstring>

	Returns a dict of tensors and strings for use in serialization via `_save_to_state_dict()`.

	When `packed` is set, returns a `dict[str, torch.Tensor]` state dict suitable for saving with safetensors.


	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>from_dict</name><anchor>bitsandbytes.functional.QuantState.from_dict</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/functional.py#L454</source><parameters>[{"name": "qs_dict", "val": ": dict"}, {"name": "device", "val": ": device"}]</parameters></docstring>

	Unpacks the components of a state dict into a `QuantState`, converting them
	into strings, `torch.dtype`, ints, etc. where necessary.

	`qs_dict` is based on the state dict, with only the relevant keys, stripped of prefixes.

	The item with key `quant_state.bitsandbytes__[nf4/fp4]` may contain minor and non-tensor quant state items.


	</div></div>

	## Dynamic 8-bit Quantization[[bitsandbytes.functional.dequantize_blockwise]]

	Primitives used in the 8-bit optimizer quantization.

	For more details, see [8-Bit Approximations for Parallelism in Deep Learning](https://arxiv.org/abs/1511.04561).

	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>bitsandbytes.functional.dequantize_blockwise</name><anchor>bitsandbytes.functional.dequantize_blockwise</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/functional.py#L641</source><parameters>[{"name": "A", "val": ": Tensor"}, {"name": "quant_state", "val": ": typing.Optional[bitsandbytes.functional.QuantState] = None"}, {"name": "absmax", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "code", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "out", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "blocksize", "val": ": int = 4096"}, {"name": "nested", "val": " = False"}]</parameters><paramsdesc>- A (`torch.Tensor`) -- The quantized input tensor.
	- quant_state (`QuantState`, optional) --
	The quantization state as returned by `quantize_blockwise`.
	Required if `absmax` is not provided.
	- absmax (`torch.Tensor`, optional) --
	A tensor containing the scaling values.
	Required if `quant_state` is not provided and ignored otherwise.
	- code (`torch.Tensor`, optional) --
	A mapping describing the low-bit data type. Defaults to a signed 8-bit dynamic type.
	For more details, see [8-Bit Approximations for Parallelism in Deep Learning](https://arxiv.org/abs/1511.04561).
	Ignored when `quant_state` is provided.
	- out (`torch.Tensor`, optional) -- A tensor to use to store the result.
	- blocksize (`int`, optional) --
	The size of the blocks. Defaults to 4096.
	Valid values are 64, 128, 256, 512, 1024, 2048, and 4096.
	Ignored when `quant_state` is provided.</paramsdesc><paramgroups>0</paramgroups><rettype>`torch.Tensor`</rettype><retdesc>The dequantized tensor. The datatype is indicated by `quant_state.dtype` and defaults to `torch.float32`.</retdesc><raises>- ``ValueError`` -- Raised when the input data type is not supported.</raises><raisederrors>``ValueError``</raisederrors></docstring>
	Dequantize a tensor in blocks of values.

	The input tensor is dequantized by dividing it into blocks of `blocksize` values.
	The absolute maximum value within these blocks is used for scaling
	the non-linear dequantization.












	</div>

	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>bitsandbytes.functional.quantize_blockwise</name><anchor>bitsandbytes.functional.quantize_blockwise</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/functional.py#L570</source><parameters>[{"name": "A", "val": ": Tensor"}, {"name": "code", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "absmax", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "out", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "blocksize", "val": " = 4096"}, {"name": "nested", "val": " = False"}]</parameters><paramsdesc>- A (`torch.Tensor`) -- The input tensor. Supports `float16`, `bfloat16`, or `float32` datatypes.
	- code (`torch.Tensor`, optional) --
	A mapping describing the low-bit data type. Defaults to a signed 8-bit dynamic type.
	For more details, see [8-Bit Approximations for Parallelism in Deep Learning](https://arxiv.org/abs/1511.04561).
	- absmax (`torch.Tensor`, optional) -- A tensor to use to store the absmax values.
	- out (`torch.Tensor`, optional) -- A tensor to use to store the result.
	- blocksize (`int`, optional) --
	The size of the blocks. Defaults to 4096.
	Valid values are 64, 128, 256, 512, 1024, 2048, and 4096.
	- nested (`bool`, optional) -- Whether to additionally quantize the absmax values. Defaults to False.</paramsdesc><paramgroups>0</paramgroups><rettype>`Tuple[torch.Tensor, QuantState]`</rettype><retdesc>A tuple containing the quantization results.
	- `torch.Tensor`: The quantized tensor.
	- `QuantState`: The state object used to undo the quantization.</retdesc><raises>- ``ValueError`` -- Raised when the input data type is not supported.</raises><raisederrors>``ValueError``</raisederrors></docstring>
	Quantize a tensor in blocks of values.

	The input tensor is quantized by dividing it into blocks of `blocksize` values.
	The absolute maximum value within these blocks is calculated for scaling
	the non-linear quantization.












	</div>
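A blockwise quantize and dequantize round trip can be sketched in pure Python. This is an illustration under stated assumptions, not the library's code: `LINEAR_CODE` is a hypothetical evenly spaced 256-entry map rather than the signed 8-bit dynamic type, both function names are made up, and the block size of 3 is far below the valid values listed above.

```python
# Evenly spaced stand-in for the 256-entry code map; NOT the dynamic 8-bit type.
LINEAR_CODE = [i / 127.5 - 1.0 for i in range(256)]


def quantize_blockwise_reference(values, blocksize, code=LINEAR_CODE):
    """Return (code indices, per-block absmax) for a flat list of floats."""
    absmax, indices = [], []
    for start in range(0, len(values), blocksize):
        block = values[start:start + blocksize]
        scale = max(abs(v) for v in block)
        absmax.append(scale)
        for v in block:
            normed = v / scale if scale else 0.0
            indices.append(min(range(len(code)), key=lambda k: abs(code[k] - normed)))
    return indices, absmax


def dequantize_blockwise_reference(indices, absmax, blocksize, code=LINEAR_CODE):
    """Look up each index and rescale it by the absmax of its block."""
    return [code[q] * absmax[i // blocksize] for i, q in enumerate(indices)]


values = [0.1, -0.2, 0.4, 100.0, -50.0, 25.0]
q, absmax = quantize_blockwise_reference(values, blocksize=3)
restored = dequantize_blockwise_reference(q, absmax, blocksize=3)
```

The first block keeps its small values accurate because its scale is 0.4, independent of the 100.0 in the second block. With a single scale for the whole list, those values would all collapse to the codebook entries nearest zero.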

	## Utility[[bitsandbytes.functional.get_ptr]]
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>bitsandbytes.functional.get_ptr</name><anchor>bitsandbytes.functional.get_ptr</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/functional.py#L378</source><parameters>[{"name": "A", "val": ": typing.Optional[torch.Tensor]"}]</parameters><paramsdesc>- A (`Optional[Tensor]`) -- A PyTorch tensor.</paramsdesc><paramgroups>0</paramgroups><rettype>`Optional[ct.c_void_p]`</rettype><retdesc>A pointer to the underlying tensor data.</retdesc></docstring>
	Gets the memory address of the first element of a tensor.








	</div>
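The `Optional` in both the parameter and the return type suggests that a missing tensor yields a missing pointer. The sketch below mirrors that shape using only the standard library; `get_ptr_reference` is a hypothetical stand-in that takes a `ctypes` array instead of a PyTorch tensor, so it illustrates the idea rather than the library's behavior.

```python
import ctypes as ct


def get_ptr_reference(buffer):
    """Return a void pointer to the first element of a ctypes array, or None."""
    if buffer is None:
        return None
    return ct.c_void_p(ct.addressof(buffer))


buf = (ct.c_float * 4)(1.0, 2.0, 3.0, 4.0)
ptr = get_ptr_reference(buf)
# Reading through the pointer returns the first element of the buffer.
first = ct.cast(ptr, ct.POINTER(ct.c_float))[0]
```

A pointer obtained this way is only valid while the underlying buffer is alive, so the buffer must be kept referenced for as long as native code uses the address.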

