# LLM.int8()
[LLM.int8()](https://hf.co/papers/2208.07339) is a quantization method that aims to make large language model inference more accessible without significant performance degradation. Unlike naive 8-bit quantization, which can lose critical information and accuracy, LLM.int8() dynamically adapts so that sensitive components of the computation retain higher precision when needed. The key is to extract the outliers from the inputs and weights and multiply them in 16-bit. All other values are multiplied in 8-bit and then dequantized back to 16-bit. The outputs of the 16-bit and 8-bit multiplications are combined to produce the final output.
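The decomposition described above can be sketched in plain NumPy. This is an illustrative toy, not the bitsandbytes implementation: the function name `int8_matmul_with_outliers`, the threshold of `6.0`, and the per-row / per-column absmax scaling are assumptions chosen for the example.

```python
import numpy as np


def int8_matmul_with_outliers(x, w, threshold=6.0):
    """Toy mixed-precision matmul (hypothetical helper, for illustration only).

    x: activations of shape (tokens, features)
    w: weights of shape (features, out_features)
    """
    # Feature dimensions in which any activation magnitude exceeds the threshold
    outlier_cols = np.any(np.abs(x) > threshold, axis=0)
    regular_cols = ~outlier_cols

    # Outlier part: multiply the extracted columns of x and rows of w in 16-bit
    out_hi = x[:, outlier_cols].astype(np.float16) @ w[outlier_cols, :].astype(np.float16)

    # Regular part: absmax-scale each row of x and each column of w into int8 range
    x_reg = x[:, regular_cols]
    w_reg = w[regular_cols, :]
    x_absmax = np.abs(x_reg).max(axis=1, keepdims=True, initial=0.0)
    w_absmax = np.abs(w_reg).max(axis=0, keepdims=True, initial=0.0)
    x_scale = np.maximum(x_absmax, 1e-8) / 127.0
    w_scale = np.maximum(w_absmax, 1e-8) / 127.0
    x_q = np.round(x_reg / x_scale).astype(np.int8)
    w_q = np.round(w_reg / w_scale).astype(np.int8)

    # int8 x int8 product accumulated in int32, then dequantized with both scales
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32)
    out_lo = acc.astype(np.float32) * x_scale * w_scale

    # Combine the 8-bit and 16-bit partial results
    return out_lo + out_hi.astype(np.float32)


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16)).astype(np.float32)
x[:, 3] = 20.0  # inject an outlier feature dimension
w = (0.1 * rng.standard_normal((16, 8))).astype(np.float32)

approx = int8_matmul_with_outliers(x, w)
exact = x @ w
print("max abs error:", np.abs(approx - exact).max())
```

Because the outlier column is handled separately, it does not inflate the absmax scale of the remaining values, which is what keeps the 8-bit part accurate in this sketch.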
[Further Resources](../../explanations/resources#llm-int8)
## Linear8bitLt[[bitsandbytes.nn.Linear8bitLt]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class bitsandbytes.nn.Linear8bitLt</name><anchor>bitsandbytes.nn.Linear8bitLt</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/nn/modules.py#L912</source><parameters>[{"name": "input_features", "val": ": int"}, {"name": "output_features", "val": ": int"}, {"name": "bias", "val": " = True"}, {"name": "has_fp16_weights", "val": " = True"}, {"name": "threshold", "val": " = 0.0"}, {"name": "index", "val": " = None"}, {"name": "device", "val": " = None"}]</parameters></docstring>
This class is the base module for the [LLM.int8()](https://arxiv.org/abs/2208.07339) algorithm.
To read more about it, have a look at the paper.
To quantize a linear layer, first load the original fp16 / bf16 weights into
the Linear8bitLt module, then call `int8_module.to("cuda")` to quantize them.
<ExampleCodeBlock anchor="bitsandbytes.nn.Linear8bitLt.example">
Example:
```python
import torch
import torch.nn as nn

import bitsandbytes as bnb
from bitsandbytes.nn import Linear8bitLt

# Reference model holding the original, unquantized weights
fp16_model = nn.Sequential(
    nn.Linear(64, 64),
    nn.Linear(64, 64),
)

# Same architecture built from 8-bit layers
int8_model = nn.Sequential(
    Linear8bitLt(64, 64, has_fp16_weights=False),
    Linear8bitLt(64, 64, has_fp16_weights=False),
)

int8_model.load_state_dict(fp16_model.state_dict())
int8_model = int8_model.to(0)  # Quantization happens here
```
</ExampleCodeBlock>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>__init__</name><anchor>bitsandbytes.nn.Linear8bitLt.__init__</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/nn/modules.py#L944</source><parameters>[{"name": "input_features", "val": ": int"}, {"name": "output_features", "val": ": int"}, {"name": "bias", "val": " = True"}, {"name": "has_fp16_weights", "val": " = True"}, {"name": "threshold", "val": " = 0.0"}, {"name": "index", "val": " = None"}, {"name": "device", "val": " = None"}]</parameters><paramsdesc>- **input_features** (`int`) --
Number of input features of the linear layer.
- **output_features** (`int`) --
Number of output features of the linear layer.
- **bias** (`bool`, defaults to `True`) --
Whether the linear class uses the bias term as well.</paramsdesc><paramgroups>0</paramgroups></docstring>
Initialize Linear8bitLt class.
</div></div>
## Int8Params[[bitsandbytes.nn.Int8Params]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class bitsandbytes.nn.Int8Params</name><anchor>bitsandbytes.nn.Int8Params</anchor><source>https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/nn/modules.py#L614</source><parameters>[{"name": "data", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "requires_grad", "val": " = True"}, {"name": "has_fp16_weights", "val": " = False"}, {"name": "CB", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "SCB", "val": ": typing.Optional[torch.Tensor] = None"}]</parameters></docstring>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>__init__</name><anchor>bitsandbytes.nn.Int8Params.__init__</anchor><parameters>[{"name": "*args", "val": ""}, {"name": "**kwargs", "val": ""}]</parameters></docstring>
Initialize self. See help(type(self)) for accurate signature.
</div></div>