
	# Overview
	The `bitsandbytes.functional` API provides the low-level building blocks for the library's features.

	## When to Use `bitsandbytes.functional`

	* When you need direct control over quantized operations and their parameters.
	* To build custom layers or operations leveraging low-bit arithmetic.
	* To integrate with other ecosystem tooling.
	* For experimental or research purposes requiring non-standard quantization or performance optimizations.

	## LLM.int8()[[bitsandbytes.functional.int8_linear_matmul]]
	#### bitsandbytes.functional.int8_linear_matmul[[bitsandbytes.functional.int8_linear_matmul]]

	[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L1503)

	Performs an 8-bit integer matrix multiplication.

	A linear transformation is applied such that `out = A @ B.T`. When possible, integer tensor core hardware is
	utilized to accelerate the operation.

	Parameters:

	A (`torch.Tensor`) : The first matrix operand with the data type `torch.int8`.

	B (`torch.Tensor`) : The second matrix operand with the data type `torch.int8`.

	out (`torch.Tensor`, optional) : A pre-allocated tensor used to store the result.

	dtype (`torch.dtype`, optional) : The expected data type of the output. Defaults to `torch.int32`.

	Returns:

	``torch.Tensor``

	The result of the operation.
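The semantics of `out = A @ B.T` with int32 accumulation can be illustrated in plain Python. This is a minimal sketch that assumes nested lists of ints stand in for `torch.int8` tensors. `int8_linear_matmul_ref` is a hypothetical helper, not part of bitsandbytes, and it uses no tensor cores.

```python
def int8_linear_matmul_ref(A, B):
    """Hypothetical reference for out = A @ B.T on int8-valued nested lists.

    A has shape (m, k) and B has shape (n, k); the result has shape (m, n).
    Python ints stand in for the int32 accumulators of the real kernel.
    """
    k = len(A[0])
    for row in A + B:
        if len(row) != k:
            raise ValueError("A and B must have the same number of columns")
        if any(not -128 <= v <= 127 for v in row):
            raise ValueError("operands must contain int8 values")
    # Row i of A dotted with row j of B, i.e. B is used transposed.
    return [[sum(a * b for a, b in zip(row_a, row_b)) for row_b in B] for row_a in A]


print(int8_linear_matmul_ref([[1, 2], [3, 4]], [[5, 6], [7, 8], [9, 10]]))
# [[17, 23, 29], [39, 53, 67]]
```

A single product of two int8 values can already reach (-128) * (-128) = 16384, far outside the int8 range, which is why the output defaults to a wider type such as `torch.int32`.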

	#### bitsandbytes.functional.int8_mm_dequant[[bitsandbytes.functional.int8_mm_dequant]]

	[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L1529)

	Performs dequantization on the result of a quantized int8 matrix multiplication.

	Parameters:

	A (`torch.Tensor` with dtype `torch.int32`) : The result of a quantized int8 matrix multiplication.

	row_stats (`torch.Tensor`) : The row-wise quantization statistics for the lhs operand of the matrix multiplication.

	col_stats (`torch.Tensor`) : The column-wise quantization statistics for the rhs operand of the matrix multiplication.

	out (`torch.Tensor`, optional) : A pre-allocated tensor to store the output of the operation.

	bias (`torch.Tensor`, optional) : An optional bias vector to add to the result.

	Returns:

	``torch.Tensor``

	The dequantized result with dtype `torch.float16`, with `bias` added if provided.
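The dequantization step can be sketched in plain Python. The sketch assumes, following the LLM.int8() scheme, that `row_stats` and `col_stats` hold per-row and per-column absolute maxima and that each operand was scaled so its absmax maps to 127. The helper name is hypothetical, and Python floats replace the `torch.float16` output.

```python
def int8_mm_dequant_ref(acc, row_stats, col_stats, bias=None):
    """Hypothetical reference: rescale an int32 matmul result to floating point.

    acc[i][j] accumulates int8 values that were scaled by 127 / row_stats[i]
    (left operand) and 127 / col_stats[j] (right operand), so both scales
    are undone here.
    """
    out = []
    for i, row in enumerate(acc):
        out_row = []
        for j, v in enumerate(row):
            x = v * row_stats[i] * col_stats[j] / (127 * 127)
            if bias is not None:
                x += bias[j]  # bias is broadcast along the output columns
            out_row.append(x)
        out.append(out_row)
    return out
```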

	#### bitsandbytes.functional.int8_vectorwise_dequant[[bitsandbytes.functional.int8_vectorwise_dequant]]

	[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L1608)

	Dequantizes a tensor with dtype `torch.int8` to `torch.float32`.

	Parameters:

	A (`torch.Tensor` with dtype `torch.int8`) : The quantized int8 tensor.

	stats (`torch.Tensor` with dtype `torch.float32`) : The row-wise quantization statistics.

	Returns:

	``torch.Tensor`` with dtype `torch.float32`

	The dequantized tensor.
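Assuming each row was quantized so that its absmax (`stats[i]`) maps to 127, dequantization is a per-row rescale. A plain-Python sketch with a hypothetical helper name:

```python
def int8_vectorwise_dequant_ref(A, stats):
    """Hypothetical reference: undo row-wise absmax int8 quantization.

    Row i is assumed to have been scaled so that its absmax, stats[i], maps to 127.
    """
    return [[v * stats[i] / 127 for v in row] for i, row in enumerate(A)]


print(int8_vectorwise_dequant_ref([[127, 0, -127]], [2.0]))
# [[2.0, 0.0, -2.0]]
```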

	#### bitsandbytes.functional.int8_vectorwise_quant[[bitsandbytes.functional.int8_vectorwise_quant]]

	[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L1622)

	Quantizes a tensor with dtype `torch.float16` to `torch.int8` in accordance with the `LLM.int8()` algorithm.

	For more information, see the [LLM.int8() paper](https://arxiv.org/abs/2208.07339).

	Parameters:

	A (`torch.Tensor` with dtype `torch.float16`) : The input tensor.

	threshold (`float`, optional) : An optional threshold for sparse decomposition of outlier features. No outliers are held back when 0.0. Defaults to 0.0.

	Returns:

	``Tuple[torch.Tensor, torch.Tensor, Optional[torch.Tensor]]``

	A tuple containing the quantized tensor and relevant statistics.
	- `torch.Tensor` with dtype `torch.int8`: The quantized data.
	- `torch.Tensor` with dtype `torch.float32`: The quantization scales.
	- `torch.Tensor` with dtype `torch.int32`, optional: A list of column indices which contain outlier features.
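Row-wise absmax quantization with outlier columns can be sketched in plain Python. This is a simplified model, not the library's kernel. It assumes that a column counts as an outlier column if any entry has magnitude `>= threshold`, that those columns are excluded from the row statistics and zeroed in the int8 output, and that an all-zero row quantizes to zeros. The helper name is hypothetical.

```python
def int8_vectorwise_quant_ref(A, threshold=0.0):
    """Hypothetical reference for row-wise absmax int8 quantization.

    Returns (quantized rows, row absmax stats, outlier column indices or None).
    """
    n_cols = len(A[0])
    outlier_cols = None
    if threshold > 0.0:
        cols = [j for j in range(n_cols) if any(abs(row[j]) >= threshold for row in A)]
        outlier_cols = cols or None  # None when no column crosses the threshold
    skip = set(outlier_cols or [])

    quantized, stats = [], []
    for row in A:
        # Absmax over the entries that are not held back as outliers.
        absmax = max((abs(v) for j, v in enumerate(row) if j not in skip), default=0.0)
        stats.append(absmax)
        scale = 127 / absmax if absmax > 0 else 0.0
        quantized.append([0 if j in skip else round(v * scale) for j, v in enumerate(row)])
    return quantized, stats, outlier_cols


q, stats, cols = int8_vectorwise_quant_ref([[0.5, 10.0], [-1.0, 0.2]], threshold=6.0)
print(q, stats, cols)
# [[127, 0], [-127, 0]] [0.5, 1.0] [1]
```

In the LLM.int8() scheme, the zeroed outlier columns would be handled separately in higher precision, which this sketch does not model.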

	## 4-bit[[bitsandbytes.functional.dequantize_4bit]]
	#### bitsandbytes.functional.dequantize_4bit[[bitsandbytes.functional.dequantize_4bit]]

	[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L973)

	Dequantizes a packed 4-bit quantized tensor.

	The input tensor is dequantized by dividing it into blocks of `blocksize` values.
	The absolute maximum value within these blocks is used for scaling
	the non-linear dequantization.

	Parameters:

	A (`torch.Tensor`) : The quantized input tensor.

	quant_state (`QuantState`, optional) : The quantization state as returned by `quantize_4bit`. Required if `absmax` is not provided.

	absmax (`torch.Tensor`, optional) : A tensor containing the scaling values. Required if `quant_state` is not provided and ignored otherwise.

	out (`torch.Tensor`, optional) : A tensor to use to store the result.

	blocksize (`int`, optional) : The size of the blocks. Defaults to 64. Valid values are 32, 64, 128, 256, 512, 1024, 2048, and 4096.

	quant_type (`str`, optional) : The data type to use: `nf4` or `fp4`. Defaults to `fp4`.

	Returns:

	``torch.Tensor``

	The dequantized tensor.
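The unpacking and per-block rescaling can be sketched in plain Python. Several details here are assumptions rather than documented behavior: the 16-entry `CODE_4BIT` table is a hypothetical, evenly spaced stand-in for the real NF4 and FP4 tables, the first value of each pair is assumed to sit in the high nibble, and the helper name is hypothetical.

```python
# Hypothetical 16-entry code in [-1, 1]; the real NF4 and FP4 tables differ.
CODE_4BIT = [-1.0 + 2.0 * i / 15 for i in range(16)]


def dequantize_4bit_ref(packed, absmax, blocksize, n, code=CODE_4BIT):
    """Hypothetical reference: unpack 4-bit indices, look them up, rescale per block.

    packed holds two 4-bit indices per byte; n is the number of original values.
    """
    indices = []
    for byte in packed:
        indices.append(byte >> 4)    # first value: high nibble (assumed layout)
        indices.append(byte & 0x0F)  # second value: low nibble
    indices = indices[:n]            # drop the padding nibble when n is odd
    # Each normalized code value is scaled back by the absmax of its block.
    return [code[q] * absmax[i // blocksize] for i, q in enumerate(indices)]


print(dequantize_4bit_ref(bytes([0xF0, 0x0F]), [1.0, 3.0], blocksize=2, n=4))
# [1.0, -1.0, -3.0, 3.0]
```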

	#### bitsandbytes.functional.dequantize_fp4[[bitsandbytes.functional.dequantize_fp4]]

	[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L953)

	#### bitsandbytes.functional.dequantize_nf4[[bitsandbytes.functional.dequantize_nf4]]

	[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L963)

	#### bitsandbytes.functional.gemv_4bit[[bitsandbytes.functional.gemv_4bit]]

	[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L1269)

	#### bitsandbytes.functional.quantize_4bit[[bitsandbytes.functional.quantize_4bit]]

	[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L872)

	Quantizes tensor `A` in blocks of 4-bit values.

	The input tensor is divided into blocks of `blocksize` values, and each block is quantized independently.

	Parameters:

	A (`torch.Tensor`) : The input tensor. Supports `float16`, `bfloat16`, or `float32` datatypes.

	absmax (`torch.Tensor`, optional) : A tensor to use to store the absmax values.

	out (`torch.Tensor`, optional) : A tensor to use to store the result.

	blocksize (`int`, optional) : The size of the blocks. Defaults to 64. Valid values are 32, 64, 128, 256, 512, 1024, 2048, and 4096.

	compress_statistics (`bool`, optional) : Whether to additionally quantize the absmax values. Defaults to False.

	quant_type (`str`, optional) : The data type to use: `nf4` or `fp4`. Defaults to `fp4`.

	quant_storage (`torch.dtype`, optional) : The dtype of the tensor used to store the result. Defaults to `torch.uint8`.

	Returns:

	``Tuple[torch.Tensor, QuantState]``

	A tuple containing the quantization results.
	- `torch.Tensor`: The quantized tensor with packed 4-bit values.
	- `QuantState`: The state object used to undo the quantization.
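Blockwise absmax quantization to 4-bit indices, with two indices packed per byte, can be sketched in plain Python. The evenly spaced `CODE_4BIT` table, the high-nibble-first packing order, and the helper names are all assumptions for illustration, and a plain list of absmax values stands in for the returned `QuantState`.

```python
# Hypothetical 16-entry code in [-1, 1]; the real NF4 and FP4 tables differ.
CODE_4BIT = [-1.0 + 2.0 * i / 15 for i in range(16)]


def nearest_code_index(code, x):
    """Index of the code entry closest to x (ties go to the lower index)."""
    return min(range(len(code)), key=lambda k: abs(code[k] - x))


def quantize_4bit_ref(values, blocksize=64, code=CODE_4BIT):
    """Hypothetical reference: blockwise absmax 4-bit quantization with packing.

    Returns (packed bytes, per-block absmax).
    """
    absmax, indices = [], []
    for start in range(0, len(values), blocksize):
        block = values[start:start + blocksize]
        m = max(abs(v) for v in block)
        absmax.append(m)
        # Normalize each value into [-1, 1] by the block absmax, then snap to the code.
        indices.extend(nearest_code_index(code, v / m if m > 0 else 0.0) for v in block)
    if len(indices) % 2:
        indices.append(0)  # pad so the final byte is complete
    # Two 4-bit indices per byte: first in the high nibble (assumed layout).
    packed = bytes((indices[i] << 4) | indices[i + 1] for i in range(0, len(indices), 2))
    return packed, absmax


packed, absmax = quantize_4bit_ref([2.0, -2.0, 0.5, -1.0], blocksize=2)
print(packed.hex(), absmax)
# f0b0 [2.0, 1.0]
```

Storing one absmax per block is what makes smaller block sizes more accurate but also more memory-hungry; `compress_statistics` addresses that overhead by quantizing the absmax values themselves.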

	#### bitsandbytes.functional.quantize_fp4[[bitsandbytes.functional.quantize_fp4]]

	[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L850)

	#### bitsandbytes.functional.quantize_nf4[[bitsandbytes.functional.quantize_nf4]]

	[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L861)

	#### bitsandbytes.functional.QuantState[[bitsandbytes.functional.QuantState]]

	[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L419)

	Container for quantization state components, used with `Params4bit` and similar classes.

	#### as_dict[[bitsandbytes.functional.QuantState.as_dict]]

	[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L544)

	Returns a dict of tensors and strings to use in serialization via `_save_to_state_dict()`.

	Parameters:

	packed (`bool`, optional) : If `True`, returns a `dict[str, torch.Tensor]` for the `state_dict`, suitable for saving with safetensors. Defaults to False.

	#### from_dict[[bitsandbytes.functional.QuantState.from_dict]]

	[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L492)

	Unpacks the components of a `state_dict` into a `QuantState`, converting them into strings, `torch.dtype`, ints, etc. where necessary.

	qs_dict : A dict based on the `state_dict`, containing only the relevant keys with their prefixes stripped.

	The item with key `quant_state.bitsandbytes__[nf4/fp4]` may contain minor, non-tensor quant state items.
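The round trip between a packed dict and `from_dict` can be illustrated with a toy container. `ToyQuantState`, its fields, and the JSON-bytes encoding of the non-tensor items are hypothetical choices for this sketch. Only the `quant_state.bitsandbytes__[nf4/fp4]` key pattern comes from the description above, and plain lists stand in for tensors.

```python
import json
from dataclasses import dataclass


@dataclass
class ToyQuantState:
    """Hypothetical stand-in for QuantState; field names are illustrative only."""

    absmax: list
    shape: tuple
    blocksize: int
    quant_type: str
    dtype: str

    def as_dict(self, packed=False):
        tensors = {"absmax": self.absmax}
        meta = {
            "shape": list(self.shape),
            "blocksize": self.blocksize,
            "quant_type": self.quant_type,
            "dtype": self.dtype,
        }
        if not packed:
            return {**tensors, **meta}
        # Packed form: the non-tensor items are serialized together as JSON
        # bytes under a single quant_state key, so every value is tensor-like.
        key = f"quant_state.bitsandbytes__{self.quant_type}"
        return {**tensors, key: json.dumps(meta).encode("utf-8")}

    @classmethod
    def from_dict(cls, qs_dict):
        qs_dict = dict(qs_dict)  # do not mutate the caller's dict
        packed_keys = [k for k in qs_dict if k.startswith("quant_state.bitsandbytes__")]
        if packed_keys:
            # Unpack the non-tensor items back into plain Python values.
            qs_dict.update(json.loads(qs_dict.pop(packed_keys[0]).decode("utf-8")))
        return cls(
            absmax=qs_dict["absmax"],
            shape=tuple(qs_dict["shape"]),
            blocksize=int(qs_dict["blocksize"]),
            quant_type=str(qs_dict["quant_type"]),
            dtype=str(qs_dict["dtype"]),
        )
```

Grouping the small non-tensor items under one key keeps the packed dict compatible with formats such as safetensors that store only tensors.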

	## Dynamic 8-bit Quantization[[bitsandbytes.functional.dequantize_blockwise]]

	Primitives used for quantization in the 8-bit optimizers.

	For more details, see [8-Bit Approximations for Parallelism in Deep Learning](https://arxiv.org/abs/1511.04561).

	#### bitsandbytes.functional.dequantize_blockwise[[bitsandbytes.functional.dequantize_blockwise]]

	[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L683)

	Dequantize a tensor in blocks of values.

	The input tensor is dequantized by dividing it into blocks of `blocksize` values.
	The absolute maximum value within these blocks is used for scaling
	the non-linear dequantization.

	Parameters:

	A (`torch.Tensor`) : The quantized input tensor.

	quant_state (`QuantState`, optional) : The quantization state as returned by `quantize_blockwise`. Required if `absmax` is not provided.

	absmax (`torch.Tensor`, optional) : A tensor containing the scaling values. Required if `quant_state` is not provided and ignored otherwise.

	code (`torch.Tensor`, optional) : A mapping describing the low-bit data type. Defaults to a signed 8-bit dynamic type. For more details, see [8-Bit Approximations for Parallelism in Deep Learning](https://arxiv.org/abs/1511.04561). Ignored when `quant_state` is provided.

	out (`torch.Tensor`, optional) : A tensor to use to store the result.

	blocksize (`int`, optional) : The size of the blocks. Defaults to 4096. Valid values are 64, 128, 256, 512, 1024, 2048, and 4096. Ignored when `quant_state` is provided.

	Returns:

	``torch.Tensor``

	The dequantized tensor. The datatype is indicated by `quant_state.dtype` and defaults to `torch.float32`.
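The lookup-and-rescale step can be sketched in plain Python. The linear 256-entry `CODE_8BIT` table is a hypothetical stand-in for the default signed dynamic data type, which the description above calls non-linear, and the helper name is hypothetical.

```python
# Hypothetical linear code in [-1, 1]; the default dynamic 8-bit code is non-linear.
CODE_8BIT = [-1.0 + 2.0 * i / 255 for i in range(256)]


def dequantize_blockwise_ref(A, absmax, blocksize=4096, code=CODE_8BIT):
    """Hypothetical reference: map uint8 indices through the code, rescale per block."""
    return [code[q] * absmax[i // blocksize] for i, q in enumerate(A)]


print(dequantize_blockwise_ref([255, 0, 255], [3.0, 0.5], blocksize=2))
# [3.0, -3.0, 0.5]
```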

	#### bitsandbytes.functional.quantize_blockwise[[bitsandbytes.functional.quantize_blockwise]]

	[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L612)

	Quantize a tensor in blocks of values.

	The input tensor is quantized by dividing it into blocks of `blocksize` values.
	The absolute maximum value within these blocks is calculated for scaling
	the non-linear quantization.

	Parameters:

	A (`torch.Tensor`) : The input tensor. Supports `float16`, `bfloat16`, or `float32` datatypes.

	code (`torch.Tensor`, optional) : A mapping describing the low-bit data type. Defaults to a signed 8-bit dynamic type. For more details, see [8-Bit Approximations for Parallelism in Deep Learning](https://arxiv.org/abs/1511.04561).

	absmax (`torch.Tensor`, optional) : A tensor to use to store the absmax values.

	out (`torch.Tensor`, optional) : A tensor to use to store the result.

	blocksize (`int`, optional) : The size of the blocks. Defaults to 4096. Valid values are 64, 128, 256, 512, 1024, 2048, and 4096.

	nested (`bool`, optional) : Whether to additionally quantize the absmax values. Defaults to False.

	Returns:

	``Tuple[torch.Tensor, QuantState]``

	A tuple containing the quantization results.
	- `torch.Tensor`: The quantized tensor.
	- `QuantState`: The state object used to undo the quantization.
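Blockwise quantization can be sketched in plain Python as "divide by the block absmax, then pick the nearest code entry". The linear `CODE_8BIT` table and the helper names are hypothetical, the default non-linear dynamic code is not reproduced, and a plain list of absmax values stands in for `QuantState`.

```python
import bisect

# Hypothetical linear code in [-1, 1]; the default dynamic 8-bit code is non-linear.
CODE_8BIT = [-1.0 + 2.0 * i / 255 for i in range(256)]


def nearest_index(code, x):
    """Index of the entry of the sorted list `code` closest to x."""
    i = bisect.bisect_left(code, x)
    if i == 0:
        return 0
    if i == len(code):
        return len(code) - 1
    # Pick whichever neighbour is closer; ties go to the lower index.
    return i if code[i] - x < x - code[i - 1] else i - 1


def quantize_blockwise_ref(A, blocksize=4096, code=CODE_8BIT):
    """Hypothetical reference: blockwise absmax quantization to uint8 code indices.

    Returns (indices, per-block absmax).
    """
    out, absmax = [], []
    for start in range(0, len(A), blocksize):
        block = A[start:start + blocksize]
        m = max(abs(v) for v in block)
        absmax.append(m)
        # Normalize by the block absmax so every value lands in [-1, 1].
        out.extend(nearest_index(code, v / m if m > 0 else 0.0) for v in block)
    return out, absmax


print(quantize_blockwise_ref([3.0, -3.0, 0.5, -0.25], blocksize=2))
# ([255, 0, 255, 64], [3.0, 0.5])
```

Because each block has its own absmax, a single large value only affects the precision of its own block rather than the whole tensor.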

	## Utility[[bitsandbytes.functional.get_ptr]]
	#### bitsandbytes.functional.get_ptr[[bitsandbytes.functional.get_ptr]]

	[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L404)

	Gets the memory address of the first element of a tensor.

	Parameters:

	A (`Optional[Tensor]`) : A PyTorch tensor, or `None`.

	Returns:

	``Optional[ct.c_void_p]``

	A pointer to the underlying tensor data, or `None` if `A` is `None`.
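The idea of handing a raw address to native code can be shown without PyTorch. The sketch below is a hypothetical analogue that uses an `array.array` buffer instead of a tensor. It mirrors the documented `Optional` types by assuming that `None` maps to `None`, and it reads a value back through the pointer with `ctypes` to show that the address refers to the first element.

```python
import array
import ctypes
from typing import Optional


def get_buffer_ptr(buf: Optional[array.array]) -> Optional[ctypes.c_void_p]:
    """Hypothetical analogue of get_ptr: address of the first element of a buffer."""
    if buf is None:
        return None
    address, _length = buf.buffer_info()  # (address, number of elements)
    return ctypes.c_void_p(address)


values = array.array("f", [1.0, 2.0, 3.0])
ptr = get_buffer_ptr(values)
# Reinterpret the void pointer as float* and read the second element.
print(ctypes.cast(ptr, ctypes.POINTER(ctypes.c_float))[1])
# 2.0
```

The caller must keep the underlying buffer alive while native code uses the pointer, since a `c_void_p` holds only the address and not a reference to the object.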
