| # 4-bit quantization | |
| [QLoRA](https://hf.co/papers/2305.14314) is a finetuning method that quantizes a model to 4 bits and adds a set of low-rank adaptation (LoRA) weights, which are trained by backpropagating gradients through the frozen, quantized base weights. This method also introduces a new data type, 4-bit NormalFloat (NF4, used by `LinearNF4`), in addition to the standard 4-bit Float data type (FP4, used by `LinearFP4`). NF4 is designed for normally distributed data, such as pretrained model weights, and can improve model quality compared to FP4. | |
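The sketch below illustrates the QLoRA forward pass in plain Python under stated assumptions: the frozen base weight is shown already dequantized (in QLoRA it is stored in 4 bits and dequantized for the matrix multiply), and the adapter output is scaled by `alpha / rank`, the usual LoRA convention. The function names are hypothetical and not part of bitsandbytes.

```python
def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(x, frozen_weight, lora_a, lora_b, alpha, rank):
    """Compute y = W x + (alpha / rank) * B (A x); W is frozen, only A and B would be trained."""
    base = matvec(frozen_weight, x)              # frozen base weight (assumed already dequantized)
    update = matvec(lora_b, matvec(lora_a, x))   # low-rank path: A is (rank x in), B is (out x rank)
    scale = alpha / rank                         # assumption: common LoRA scaling convention
    return [b + scale * u for b, u in zip(base, update)]
```

Because only `lora_a` and `lora_b` would be trained, initializing `lora_b` to zeros (the standard LoRA initialization) makes the layer start out identical to the frozen base layer.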
| ## Linear4bit[[bitsandbytes.nn.Linear4bit]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class bitsandbytes.nn.Linear4bit</name><anchor>bitsandbytes.nn.Linear4bit</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/nn/modules.py#L413</source><parameters>[{"name": "input_features", "val": ""}, {"name": "output_features", "val": ""}, {"name": "bias", "val": " = True"}, {"name": "compute_dtype", "val": " = None"}, {"name": "compress_statistics", "val": " = True"}, {"name": "quant_type", "val": " = 'fp4'"}, {"name": "quant_storage", "val": " = torch.uint8"}, {"name": "device", "val": " = None"}]</parameters></docstring> | |
| This class is the base module for the 4-bit quantization algorithm presented in [QLoRA](https://arxiv.org/abs/2305.14314). | |
| QLoRA 4-bit linear layers use blockwise k-bit quantization under the hood, with a choice of 4-bit quantization data types such as FP4 and NF4 (selected with `quant_type`). | |
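A plain-Python sketch of blockwise 4-bit quantization: the input is split into fixed-size blocks, each block is normalized by its absolute maximum (absmax), each value is replaced by the index of the nearest of 16 code levels, and two 4-bit indices are packed per byte. The uniform code table, the block size of 64, and the nibble order are illustrative assumptions, not bitsandbytes internals; FP4 or NF4 levels would take the place of `UNIFORM_CODE`.

```python
def quantize_blockwise_4bit(values, code, blocksize=64):
    """Map each value to the index of its nearest level in `code`, scaling per block by absmax."""
    assert len(code) == 16, "a 4-bit code table has exactly 16 levels"
    indices, absmax = [], []
    for start in range(0, len(values), blocksize):
        block = values[start:start + blocksize]
        scale = max(abs(v) for v in block) or 1.0  # guard against all-zero blocks
        absmax.append(scale)
        for v in block:
            x = v / scale  # normalized into [-1, 1]
            indices.append(min(range(16), key=lambda i: abs(code[i] - x)))
    return indices, absmax


def pack_nibbles(indices):
    """Store two 4-bit indices per byte (first index in the high nibble), padding odd lengths."""
    if len(indices) % 2:
        indices = indices + [0]
    return bytes((hi << 4) | lo for hi, lo in zip(indices[0::2], indices[1::2]))


def unpack_nibbles(packed, count):
    """Inverse of pack_nibbles; `count` drops any padding nibble."""
    indices = []
    for byte in packed:
        indices.extend((byte >> 4, byte & 0x0F))
    return indices[:count]


def dequantize_blockwise_4bit(indices, absmax, code, blocksize=64):
    """Look up each level and rescale it by its block's absmax."""
    return [code[idx] * absmax[i // blocksize] for i, idx in enumerate(indices)]


# Placeholder table: 16 evenly spaced levels in [-1, 1] (NF4 or FP4 levels would go here instead).
UNIFORM_CODE = [-1 + 2 * i / 15 for i in range(16)]
```

Storing one absmax per block limits the damage an outlier can do: it only coarsens the quantization of values in its own block rather than across the whole tensor.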
| To quantize a linear layer, first load the original fp16/bf16 weights into the `Linear4bit` module, then call `quantized_module.to("cuda")`; moving the module to the GPU is what quantizes the fp16/bf16 weights. | |
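To see why the weights are loaded before the device move, here is a hypothetical, framework-free stand-in for a weight that quantizes itself the first time it is moved off the CPU. `LazyQuantizedWeight` and its attributes are illustrative only; the `quantized` flag loosely plays the role that the `bnb_quantized` argument of `Params4bit` suggests, which is an assumption about that argument's purpose.

```python
class LazyQuantizedWeight:
    """Hypothetical stand-in for a parameter that quantizes itself on its first move to an accelerator."""

    def __init__(self, data, quantize_fn):
        self.data = list(data)          # full-precision values, e.g. copied in from a state dict
        self.quantize_fn = quantize_fn  # returns (quantized_data, quant_state)
        self.quant_state = None
        self.quantized = False          # assumption: mirrors what a flag like `bnb_quantized` tracks
        self.device = "cpu"

    def to(self, device):
        # Assumption: only a move off the CPU triggers quantization, and only once.
        if device != "cpu" and not self.quantized:
            self.data, self.quant_state = self.quantize_fn(self.data)
            self.quantized = True
        self.device = device
        return self
```

In this sketch, repeated `.to()` calls do not re-quantize, and values assigned after the move would bypass quantization entirely, which is why the original weights are loaded first.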
| <ExampleCodeBlock anchor="bitsandbytes.nn.Linear4bit.example"> | |
| Example: | |
| ```python | |
| import torch | |
| import torch.nn as nn | |
| import bitsandbytes as bnb | |
| from bitsandbytes.nn import Linear4bit | |
| fp16_model = nn.Sequential( | |
| nn.Linear(64, 64), | |
| nn.Linear(64, 64) | |
| ) | |
| quantized_model = nn.Sequential( | |
| Linear4bit(64, 64), | |
| Linear4bit(64, 64) | |
| ) | |
| quantized_model.load_state_dict(fp16_model.state_dict()) | |
| quantized_model = quantized_model.to(0) # Quantization happens here | |
| ``` | |
| </ExampleCodeBlock> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>__init__</name><anchor>bitsandbytes.nn.Linear4bit.__init__</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/nn/modules.py#L446</source><parameters>[{"name": "input_features", "val": ""}, {"name": "output_features", "val": ""}, {"name": "bias", "val": " = True"}, {"name": "compute_dtype", "val": " = None"}, {"name": "compress_statistics", "val": " = True"}, {"name": "quant_type", "val": " = 'fp4'"}, {"name": "quant_storage", "val": " = torch.uint8"}, {"name": "device", "val": " = None"}]</parameters><paramsdesc>- **input_features** (`int`) -- | |
| Number of input features of the linear layer. | |
| - **output_features** (`int`) -- | |
| Number of output features of the linear layer. | |
| - **bias** (`bool`, defaults to `True`) -- | |
| Whether the layer includes a bias term.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Initialize Linear4bit class. | |
| </div></div> | |
| ## LinearFP4[[bitsandbytes.nn.LinearFP4]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class bitsandbytes.nn.LinearFP4</name><anchor>bitsandbytes.nn.LinearFP4</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/nn/modules.py#L535</source><parameters>[{"name": "input_features", "val": ""}, {"name": "output_features", "val": ""}, {"name": "bias", "val": " = True"}, {"name": "compute_dtype", "val": " = None"}, {"name": "compress_statistics", "val": " = True"}, {"name": "quant_storage", "val": " = torch.uint8"}, {"name": "device", "val": " = None"}]</parameters></docstring> | |
| A `Linear4bit` layer that quantizes its weights with the FP4 data type (`quant_type="fp4"`). | |
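For intuition about what a 4-bit float can represent, the sketch below decodes all 16 codes of a generic E2M1 minifloat (1 sign bit, 2 exponent bits with bias 1, 1 mantissa bit) and normalizes them into [-1, 1]. This layout is an assumption for illustration: bitsandbytes' own FP4 table may use a different exponent bias and special-value encoding, so its exact levels can differ.

```python
def e2m1_value(code: int) -> float:
    """Decode a 4-bit code laid out as sign(1) | exponent(2) | mantissa(1), exponent bias 1."""
    sign = -1.0 if code & 0b1000 else 1.0
    exponent = (code >> 1) & 0b11
    mantissa = code & 0b1
    if exponent == 0:
        magnitude = mantissa * 0.5  # subnormal: no implicit leading 1
    else:
        magnitude = 2.0 ** (exponent - 1) * (1.0 + mantissa / 2)
    return sign * magnitude


table = [e2m1_value(c) for c in range(16)]
top = max(table)
normalized = [v / top for v in table]  # scale the levels into [-1, 1]
```

The gap between adjacent levels doubles with each exponent step (0.5, then 1, then 2), so precision is highest near zero.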
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>__init__</name><anchor>bitsandbytes.nn.LinearFP4.__init__</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/nn/modules.py#L540</source><parameters>[{"name": "input_features", "val": ""}, {"name": "output_features", "val": ""}, {"name": "bias", "val": " = True"}, {"name": "compute_dtype", "val": " = None"}, {"name": "compress_statistics", "val": " = True"}, {"name": "quant_storage", "val": " = torch.uint8"}, {"name": "device", "val": " = None"}]</parameters><paramsdesc>- **input_features** (`int`) -- | |
| Number of input features of the linear layer. | |
| - **output_features** (`int`) -- | |
| Number of output features of the linear layer. | |
| - **bias** (`bool`, defaults to `True`) -- | |
| Whether the layer includes a bias term.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| </div></div> | |
| ## LinearNF4[[bitsandbytes.nn.LinearNF4]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class bitsandbytes.nn.LinearNF4</name><anchor>bitsandbytes.nn.LinearNF4</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/nn/modules.py#L571</source><parameters>[{"name": "input_features", "val": ""}, {"name": "output_features", "val": ""}, {"name": "bias", "val": " = True"}, {"name": "compute_dtype", "val": " = None"}, {"name": "compress_statistics", "val": " = True"}, {"name": "quant_storage", "val": " = torch.uint8"}, {"name": "device", "val": " = None"}]</parameters></docstring> | |
| A `Linear4bit` layer that quantizes its weights with the NF4 data type (`quant_type="nf4"`). | |
| NF4 is a quantization data type in which each bin has equal area under a standard normal distribution N(0, 1), with the values normalized into the range [-1, 1]. | |
| For more information, see the paper [QLoRA: Efficient Finetuning of Quantized LLMs](https://arxiv.org/abs/2305.14314). | |
| The NF4 data type is implemented by the `create_normal_map` function in | |
| `functional.py`: https://github.com/TimDettmers/bitsandbytes/blob/main/bitsandbytes/functional.py#L236. | |
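The construction described above can be sketched with the standard library: take evenly spaced probabilities between an `offset` and the median 0.5, map them through the inverse CDF of N(0, 1), mirror them for the negative side, add an exact zero, and normalize by the largest level. The default `offset=0.9677083` and the 8-positive/7-negative split follow our reading of `create_normal_map` and should be treated as assumptions; that function uses SciPy's `norm.ppf`, swapped here for `statistics.NormalDist().inv_cdf`, and this sketch keeps only the 16 levels a 4-bit code can index.

```python
from statistics import NormalDist


def linspace(start: float, stop: float, num: int) -> list[float]:
    """Return `num` evenly spaced points from `start` to `stop`, inclusive."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def nf4_levels(offset: float = 0.9677083) -> list[float]:
    """Build 16 NF4-style levels from evenly spaced quantiles of N(0, 1)."""
    ppf = NormalDist().inv_cdf  # quantile function of the standard normal
    # 8 positive levels: quantiles from `offset` down toward (but excluding) the median 0.5.
    positive = [ppf(p) for p in linspace(offset, 0.5, 9)[:-1]]
    # 7 negative levels, mirrored, so the type is asymmetric around zero.
    negative = [-ppf(p) for p in linspace(offset, 0.5, 8)[:-1]]
    levels = sorted(negative + [0.0] + positive)
    top = max(levels)
    return [v / top for v in levels]  # normalize into [-1, 1]
```

The resulting levels include an exact zero and are densest near zero, where most normally distributed weights fall.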
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>__init__</name><anchor>bitsandbytes.nn.LinearNF4.__init__</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/nn/modules.py#L583</source><parameters>[{"name": "input_features", "val": ""}, {"name": "output_features", "val": ""}, {"name": "bias", "val": " = True"}, {"name": "compute_dtype", "val": " = None"}, {"name": "compress_statistics", "val": " = True"}, {"name": "quant_storage", "val": " = torch.uint8"}, {"name": "device", "val": " = None"}]</parameters><paramsdesc>- **input_features** (`int`) -- | |
| Number of input features of the linear layer. | |
| - **output_features** (`int`) -- | |
| Number of output features of the linear layer. | |
| - **bias** (`bool`, defaults to `True`) -- | |
| Whether the layer includes a bias term.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| </div></div> | |
| ## Params4bit[[bitsandbytes.nn.Params4bit]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class bitsandbytes.nn.Params4bit</name><anchor>bitsandbytes.nn.Params4bit</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/nn/modules.py#L207</source><parameters>[{"name": "data", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "requires_grad", "val": " = False"}, {"name": "quant_state", "val": ": typing.Optional[bitsandbytes.functional.QuantState] = None"}, {"name": "blocksize", "val": ": typing.Optional[int] = None"}, {"name": "compress_statistics", "val": ": bool = True"}, {"name": "quant_type", "val": ": str = 'fp4'"}, {"name": "quant_storage", "val": ": dtype = torch.uint8"}, {"name": "module", "val": ": typing.Optional[ForwardRef('Linear4bit')] = None"}, {"name": "bnb_quantized", "val": ": bool = False"}]</parameters></docstring> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>__init__</name><anchor>bitsandbytes.nn.Params4bit.__init__</anchor><parameters>[{"name": "*args", "val": ""}, {"name": "**kwargs", "val": ""}]</parameters></docstring> | |
| `Params4bit` is a `torch.nn.Parameter` subclass whose arguments are handled in `__new__`; see the class signature above for the accepted arguments. | |
| </div></div> | |