# Overview
The `bitsandbytes.functional` API provides the low-level building blocks for the library's features.
## When to Use `bitsandbytes.functional`
* When you need direct control over quantized operations and their parameters.
* To build custom layers or operations leveraging low-bit arithmetic.
* To integrate with other ecosystem tooling.
* For experimental or research purposes requiring non-standard quantization or performance optimizations.
## LLM.int8()[[bitsandbytes.functional.int8_linear_matmul]]
#### bitsandbytes.functional.int8_linear_matmul[[bitsandbytes.functional.int8_linear_matmul]]
[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L1503)
Performs an 8-bit integer matrix multiplication.
A linear transformation is applied such that `out = A @ B.T`. When possible, integer tensor core hardware is utilized to accelerate the operation.
**Parameters:**
A (`torch.Tensor`) : The first matrix operand with the data type `torch.int8`.
B (`torch.Tensor`) : The second matrix operand with the data type `torch.int8`.
out (`torch.Tensor`, *optional*) : A pre-allocated tensor used to store the result.
dtype (`torch.dtype`, *optional*) : The expected data type of the output. Defaults to `torch.int32`.
**Returns:**
`torch.Tensor`
The result of the operation.
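As a minimal sketch, `int8_linear_matmul` can be called directly on random `int8` operands; the shapes here are illustrative only and a CUDA device is assumed:

```py
import torch
import bitsandbytes.functional as F

A = torch.randint(-128, 127, (8, 32), dtype=torch.int8, device="cuda")
B = torch.randint(-128, 127, (16, 32), dtype=torch.int8, device="cuda")

# Computes A @ B.T with int32 accumulation.
C = F.int8_linear_matmul(A, B)
print(C.shape, C.dtype)  # torch.Size([8, 16]) torch.int32
```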
#### bitsandbytes.functional.int8_mm_dequant[[bitsandbytes.functional.int8_mm_dequant]]
[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L1529)
Performs dequantization on the result of a quantized int8 matrix multiplication.
**Parameters:**
A (`torch.Tensor` with dtype `torch.int32`) : The result of a quantized int8 matrix multiplication.
row_stats (`torch.Tensor`) : The row-wise quantization statistics for the lhs operand of the matrix multiplication.
col_stats (`torch.Tensor`) : The column-wise quantization statistics for the rhs operand of the matrix multiplication.
out (`torch.Tensor`, *optional*) : A pre-allocated tensor to store the output of the operation.
bias (`torch.Tensor`, *optional*) : An optional bias vector to add to the result.
**Returns:**
`torch.Tensor`
The dequantized result with an optional bias, with dtype `torch.float16`.
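These functions compose into the full int8 matmul path. The following is a hedged sketch, with illustrative shapes and an assumed CUDA device; `int8_vectorwise_quant` is documented just below:

```py
import torch
import bitsandbytes.functional as F

x = torch.randn(4, 64, dtype=torch.float16, device="cuda")   # activations
w = torch.randn(32, 64, dtype=torch.float16, device="cuda")  # weight rows

# Row-wise int8 quantization of both operands (threshold=0.0: no outliers).
x_q, x_stats, _ = F.int8_vectorwise_quant(x)
w_q, w_stats, _ = F.int8_vectorwise_quant(w)

# int32 accumulation of x_q @ w_q.T.
acc = F.int8_linear_matmul(x_q, w_q)

# Rescale back to float16. The row stats of `w_q` serve as the output's
# column stats, since `w` enters the product transposed.
y = F.int8_mm_dequant(acc, row_stats=x_stats, col_stats=w_stats)
print(y.shape, y.dtype)  # torch.Size([4, 32]) torch.float16
```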
#### bitsandbytes.functional.int8_vectorwise_dequant[[bitsandbytes.functional.int8_vectorwise_dequant]]
[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L1608)
Dequantizes a tensor with dtype `torch.int8` to `torch.float32`.
**Parameters:**
A (`torch.Tensor` with dtype `torch.int8`) : The quantized int8 tensor.
stats (`torch.Tensor` with dtype `torch.float32`) : The row-wise quantization statistics.
**Returns:**
`torch.Tensor` with dtype `torch.float32`
The dequantized tensor.
#### bitsandbytes.functional.int8_vectorwise_quant[[bitsandbytes.functional.int8_vectorwise_quant]]
[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L1622)
Quantizes a tensor with dtype `torch.float16` to `torch.int8` in accordance with the `LLM.int8()` algorithm.
For more information, see the [LLM.int8() paper](https://arxiv.org/abs/2208.07339).
**Parameters:**
A (`torch.Tensor` with dtype `torch.float16`) : The input tensor.
threshold (`float`, *optional*) : An optional threshold for sparse decomposition of outlier features. When 0.0, no outliers are held back. Defaults to 0.0.
**Returns:**
`Tuple[torch.Tensor, torch.Tensor, Optional[torch.Tensor]]`
A tuple containing the quantized tensor and relevant statistics.
- `torch.Tensor` with dtype `torch.int8`: The quantized data.
- `torch.Tensor` with dtype `torch.float32`: The quantization scales.
- `torch.Tensor` with dtype `torch.int32`, *optional*: A list of column indices which contain outlier features.
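A short sketch of quantization with a nonzero `threshold`; the inflated column is illustrative, and which columns are reported as outliers depends on the random data:

```py
import torch
import bitsandbytes.functional as F

A = torch.randn(4, 16, dtype=torch.float16, device="cuda")
A[:, 3] *= 20.0  # make one feature column an outlier

quantized, scales, outlier_cols = F.int8_vectorwise_quant(A, threshold=6.0)
print(quantized.dtype, scales.dtype)  # torch.int8 torch.float32
print(outlier_cols)                   # e.g. tensor([3], ...) if column 3 exceeds the threshold
```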
## 4-bit[[bitsandbytes.functional.dequantize_4bit]]
#### bitsandbytes.functional.dequantize_4bit[[bitsandbytes.functional.dequantize_4bit]]
[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L973)
Dequantizes a packed 4-bit quantized tensor.
The input tensor is dequantized by dividing it into blocks of `blocksize` values. The absolute maximum value within these blocks is used for scaling the non-linear dequantization.
**Parameters:**
A (`torch.Tensor`) : The quantized input tensor.
quant_state (`QuantState`, *optional*) : The quantization state as returned by `quantize_4bit`. Required if `absmax` is not provided.
absmax (`torch.Tensor`, *optional*) : A tensor containing the scaling values. Required if `quant_state` is not provided and ignored otherwise.
out (`torch.Tensor`, *optional*) : A tensor to use to store the result.
blocksize (`int`, *optional*) : The size of the blocks. Defaults to 64. Valid values are 32, 64, 128, 256, 512, 1024, 2048, and 4096.
quant_type (`str`, *optional*) : The data type to use: `nf4` or `fp4`. Defaults to `fp4`.
**Returns:**
`torch.Tensor`
The dequantized tensor.
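As a sketch, the typical round trip pairs `dequantize_4bit` with the `QuantState` returned by `quantize_4bit` (documented below); a CUDA device is assumed:

```py
import torch
import bitsandbytes.functional as F

w = torch.randn(64, 64, dtype=torch.float16, device="cuda")

packed, quant_state = F.quantize_4bit(w, quant_type="nf4")
w_deq = F.dequantize_4bit(packed, quant_state)

print(w_deq.shape, w_deq.dtype)  # torch.Size([64, 64]) torch.float16
print((w - w_deq).abs().mean())  # small block-wise quantization error
```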
#### bitsandbytes.functional.dequantize_fp4[[bitsandbytes.functional.dequantize_fp4]]
[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L953)
Convenience wrapper around `dequantize_4bit` with `quant_type="fp4"`.
#### bitsandbytes.functional.dequantize_nf4[[bitsandbytes.functional.dequantize_nf4]]
[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L963)
Convenience wrapper around `dequantize_4bit` with `quant_type="nf4"`.
#### bitsandbytes.functional.gemv_4bit[[bitsandbytes.functional.gemv_4bit]]
[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L1269)
Performs a matrix-vector multiplication (GEMV) where one operand is stored in the packed 4-bit format produced by `quantize_4bit`.
#### bitsandbytes.functional.quantize_4bit[[bitsandbytes.functional.quantize_4bit]]
[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L872)
Quantize tensor A in blocks of 4-bit values.
Quantizes tensor A by dividing it into blocks which are independently quantized.
**Parameters:**
A (`torch.Tensor`) : The input tensor. Supports `float16`, `bfloat16`, or `float32` datatypes.
absmax (`torch.Tensor`, *optional*) : A tensor to use to store the absmax values.
out (`torch.Tensor`, *optional*) : A tensor to use to store the result.
blocksize (`int`, *optional*) : The size of the blocks. Defaults to 64. Valid values are 32, 64, 128, 256, 512, 1024, 2048, and 4096.
compress_statistics (`bool`, *optional*) : Whether to additionally quantize the absmax values. Defaults to False.
quant_type (`str`, *optional*) : The data type to use: `nf4` or `fp4`. Defaults to `fp4`.
quant_storage (`torch.dtype`, *optional*) : The dtype of the tensor used to store the result. Defaults to `torch.uint8`.
**Returns:**
`Tuple[torch.Tensor, QuantState]`
A tuple containing the quantization results.
- `torch.Tensor`: The quantized tensor with packed 4-bit values.
- `QuantState`: The state object used to undo the quantization.
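A small sketch exercising the optional arguments; with the default `torch.uint8` storage, two 4-bit values are packed per byte, so the payload holds `numel() / 2` elements:

```py
import torch
import bitsandbytes.functional as F

w = torch.randn(128, 128, dtype=torch.bfloat16, device="cuda")

packed, qs = F.quantize_4bit(
    w,
    blocksize=128,
    compress_statistics=True,  # the absmax values are themselves quantized
    quant_type="nf4",
)

print(packed.dtype, packed.numel(), w.numel() // 2)  # torch.uint8 8192 8192
```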
#### bitsandbytes.functional.quantize_fp4[[bitsandbytes.functional.quantize_fp4]]
[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L850)
Convenience wrapper around `quantize_4bit` with `quant_type="fp4"`.
#### bitsandbytes.functional.quantize_nf4[[bitsandbytes.functional.quantize_nf4]]
[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L861)
Convenience wrapper around `quantize_4bit` with `quant_type="nf4"`.
#### bitsandbytes.functional.QuantState[[bitsandbytes.functional.QuantState]]
[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L419)
Container for quantization state components, for use with `Params4bit` and similar classes.
#### as_dict[[bitsandbytes.functional.QuantState.as_dict]]
[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L544)
Returns a dict of tensors and strings for serialization via `_save_to_state_dict()`.
**Parameters:**
packed (`bool`, *optional*) : When True, returns a `dict[str, torch.Tensor]` for a `state_dict` fit for safetensors saving. Defaults to False.
#### from_dict[[bitsandbytes.functional.QuantState.from_dict]]
[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L492)
Unpacks the components of a `state_dict` into a `QuantState`, converting values into strings, `torch.dtype`, ints, etc. where necessary.
**Parameters:**
qs_dict (`dict`) : Based on a `state_dict`, with only the relevant keys and stripped of prefixes. The item with key `quant_state.bitsandbytes__[nf4/fp4]` may contain minor and non-tensor quant state items.
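A hedged serialization round trip, assuming `from_dict` accepts the dict produced by `as_dict(packed=True)` together with a target device, as its docstring suggests:

```py
import torch
import bitsandbytes.functional as F

w = torch.randn(64, 64, dtype=torch.float16, device="cuda")
packed, qs = F.quantize_4bit(w, quant_type="nf4")

# Serialize the state as tensors suitable for a safetensors state_dict...
state_items = qs.as_dict(packed=True)

# ...and reconstruct it on load to dequantize the stored weights.
qs_restored = F.QuantState.from_dict(state_items, device=torch.device("cuda"))
w_deq = F.dequantize_4bit(packed, qs_restored)
```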
## Dynamic 8-bit Quantization[[bitsandbytes.functional.dequantize_blockwise]]
Primitives used in the 8-bit optimizer quantization.
For more details, see [8-Bit Approximations for Parallelism in Deep Learning](https://arxiv.org/abs/1511.04561).
#### bitsandbytes.functional.dequantize_blockwise[[bitsandbytes.functional.dequantize_blockwise]]
[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L683)
Dequantize a tensor in blocks of values.
The input tensor is dequantized by dividing it into blocks of `blocksize` values. The absolute maximum value within these blocks is used for scaling the non-linear dequantization.
**Parameters:**
A (`torch.Tensor`) : The quantized input tensor.
quant_state (`QuantState`, *optional*) : The quantization state as returned by `quantize_blockwise`. Required if `absmax` is not provided.
absmax (`torch.Tensor`, *optional*) : A tensor containing the scaling values. Required if `quant_state` is not provided and ignored otherwise.
code (`torch.Tensor`, *optional*) : A mapping describing the low-bit data type. Defaults to a signed 8-bit dynamic type. For more details, see [8-Bit Approximations for Parallelism in Deep Learning](https://arxiv.org/abs/1511.04561). Ignored when `quant_state` is provided.
out (`torch.Tensor`, *optional*) : A tensor to use to store the result.
blocksize (`int`, *optional*) : The size of the blocks. Defaults to 4096. Valid values are 64, 128, 256, 512, 1024, 2048, and 4096. Ignored when `quant_state` is provided.
**Returns:**
`torch.Tensor`
The dequantized tensor. The datatype is indicated by `quant_state.dtype` and defaults to `torch.float32`.
#### bitsandbytes.functional.quantize_blockwise[[bitsandbytes.functional.quantize_blockwise]]
[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L612)
Quantize a tensor in blocks of values.
The input tensor is quantized by dividing it into blocks of `blocksize` values. The absolute maximum value within these blocks is calculated for scaling the non-linear quantization.
**Parameters:**
A (`torch.Tensor`) : The input tensor. Supports `float16`, `bfloat16`, or `float32` datatypes.
code (`torch.Tensor`, *optional*) : A mapping describing the low-bit data type. Defaults to a signed 8-bit dynamic type. For more details, see [8-Bit Approximations for Parallelism in Deep Learning](https://arxiv.org/abs/1511.04561).
absmax (`torch.Tensor`, *optional*) : A tensor to use to store the absmax values.
out (`torch.Tensor`, *optional*) : A tensor to use to store the result.
blocksize (`int`, *optional*) : The size of the blocks. Defaults to 4096. Valid values are 64, 128, 256, 512, 1024, 2048, and 4096.
nested (`bool`, *optional*) : Whether to additionally quantize the absmax values. Defaults to False.
**Returns:**
`Tuple[torch.Tensor, QuantState]`
A tuple containing the quantization results.
- `torch.Tensor`: The quantized tensor.
- `QuantState`: The state object used to undo the quantization.
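A minimal round-trip sketch for the blockwise primitives, assuming a CUDA device; values and the block size are illustrative:

```py
import torch
import bitsandbytes.functional as F

x = torch.randn(4096, dtype=torch.float32, device="cuda")

quantized, quant_state = F.quantize_blockwise(x, blocksize=256)
x_deq = F.dequantize_blockwise(quantized, quant_state)

print(quantized.dtype)          # torch.uint8 payload of 8-bit codes
print((x - x_deq).abs().max())  # bounded block-wise quantization error
```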
## Utility[[bitsandbytes.functional.get_ptr]]
#### bitsandbytes.functional.get_ptr[[bitsandbytes.functional.get_ptr]]
[Source](https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L404)
Gets the memory address of the first element of a tensor.
**Parameters:**
A (`Optional[Tensor]`) : A PyTorch tensor.
**Returns:**
`Optional[ct.c_void_p]`
A pointer to the underlying tensor data.
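A trivial sketch; note the documented `Optional` behavior when the input is `None`:

```py
import torch
import bitsandbytes.functional as F

t = torch.arange(4, dtype=torch.float32)
print(F.get_ptr(t))     # a ctypes c_void_p wrapping the data address
print(F.get_ptr(None))  # None when no tensor is given
```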