4-bit quantization

QLoRA is a finetuning method that quantizes a model to 4 bits, adds a set of low-rank adaptation (LoRA) weights to the model, and tunes those weights by backpropagating through the frozen quantized weights. This method also introduces a new data type, 4-bit NormalFloat (LinearNF4), in addition to the standard Float4 data type (LinearFP4). LinearNF4 is a quantization data type for normally distributed data and can improve performance.

Linear4bit[[bitsandbytes.nn.Linear4bit]]

class bitsandbytes.nn.Linear4bit(input_features, output_features, bias=True, compute_dtype=None, compress_statistics=True, quant_type='fp4', quant_storage=torch.uint8, device=None)

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/nn/modules.py#L413

This class is the base module for the 4-bit quantization algorithm presented in QLoRA. QLoRA 4-bit linear layers use blockwise k-bit quantization under the hood, with a choice of 4-bit quantization data types such as FP4 and NF4 (the quant_type argument).
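As a rough illustration of what blockwise quantization means, the following pure-Python sketch splits a weight list into blocks, scales each block by its absolute maximum, and stores one codebook index per value plus one scale per block. The function names, the tiny block size, and the uniform 16-level codebook are all assumptions made for the example; they are not the bitsandbytes kernels or the actual FP4 / NF4 tables.

```python
def quantize_blockwise(values, codebook, blocksize=4):
    """Return (indices, absmax): one codebook index per value, one scale per block."""
    indices, absmax = [], []
    for start in range(0, len(values), blocksize):
        block = values[start:start + blocksize]
        # Per-block scaling constant; fall back to 1.0 for an all-zero block.
        scale = max(abs(v) for v in block) or 1.0
        absmax.append(scale)
        for v in block:
            normalized = v / scale  # now within [-1, 1]
            # Pick the index of the nearest codebook entry.
            idx = min(range(len(codebook)), key=lambda i: abs(codebook[i] - normalized))
            indices.append(idx)
    return indices, absmax


def dequantize_blockwise(indices, absmax, codebook, blocksize=4):
    """Invert quantize_blockwise up to rounding error."""
    return [codebook[idx] * absmax[pos // blocksize] for pos, idx in enumerate(indices)]


# Hypothetical uniform 16-level (4-bit) codebook on [-1, 1].
CODEBOOK = [-1.0 + 2.0 * i / 15 for i in range(16)]

weights = [0.02, -0.5, 0.31, 0.07, 8.0, -2.0, 1.0, 0.5]
indices, absmax = quantize_blockwise(weights, CODEBOOK, blocksize=4)
restored = dequantize_blockwise(indices, absmax, CODEBOOK, blocksize=4)
```

Because each block carries its own scale, the large values in the second block do not degrade the resolution available to the small values in the first block.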

To quantize a linear layer, first load the original fp16 / bf16 weights into the Linear4bit module, then call quantized_module.to("cuda"). Moving the module to the device is the step that quantizes the fp16 / bf16 weights.

Example:

import torch
import torch.nn as nn

import bitsandbytes as bnb
from bitsandbytes.nn import Linear4bit  # import from the package name; the bnb alias cannot be used in a from-import

fp16_model = nn.Sequential(
    nn.Linear(64, 64),
    nn.Linear(64, 64)
)

quantized_model = nn.Sequential(
    Linear4bit(64, 64),
    Linear4bit(64, 64)
)

quantized_model.load_state_dict(fp16_model.state_dict())
quantized_model = quantized_model.to(0) # Quantization happens here

__init__(input_features, output_features, bias=True, compute_dtype=None, compress_statistics=True, quant_type='fp4', quant_storage=torch.uint8, device=None)

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/nn/modules.py#L446

  • input_features (int) -- Number of input features of the linear layer.
  • output_features (int) -- Number of output features of the linear layer.
  • bias (bool, defaults to True) -- Whether the linear layer also uses a bias term.

Initialize Linear4bit class.

LinearFP4[[bitsandbytes.nn.LinearFP4]]

class bitsandbytes.nn.LinearFP4(input_features, output_features, bias=True, compute_dtype=None, compress_statistics=True, quant_storage=torch.uint8, device=None)

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/nn/modules.py#L535

Implements the FP4 data type.

__init__(input_features, output_features, bias=True, compute_dtype=None, compress_statistics=True, quant_storage=torch.uint8, device=None)

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/nn/modules.py#L540

  • input_features (int) -- Number of input features of the linear layer.
  • output_features (int) -- Number of output features of the linear layer.
  • bias (bool, defaults to True) -- Whether the linear layer also uses a bias term.

LinearNF4[[bitsandbytes.nn.LinearNF4]]

class bitsandbytes.nn.LinearNF4(input_features, output_features, bias=True, compute_dtype=None, compress_statistics=True, quant_storage=torch.uint8, device=None)

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/nn/modules.py#L571

Implements the NF4 data type.

Constructs a quantization data type where each bin has equal area under a standard normal distribution N(0, 1) that is normalized into the range [-1, 1].

For more information read the paper: QLoRA: Efficient Finetuning of Quantized LLMs (https://arxiv.org/abs/2305.14314)

Implementation of the NF4 data type in bitsandbytes can be found in the create_normal_map function in the functional.py file: https://github.com/TimDettmers/bitsandbytes/blob/main/bitsandbytes/functional.py#L236.
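The equal-area idea described above can be sketched with the standard library alone. This is a simplified illustration, not the create_normal_map function: it takes one quantile per equal-probability bin of N(0, 1) and rescales the result into [-1, 1]. The function name and the midpoint rule are assumptions for the example, and the sketch omits the refinement described in the QLoRA paper that guarantees an exact zero level.

```python
from statistics import NormalDist


def equal_area_levels(bits=4):
    """Return 2**bits levels in [-1, 1], one per equal-probability bin of N(0, 1)."""
    n = 2 ** bits
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    # Each bin holds probability mass 1/n; use the quantile at the bin midpoint.
    quantiles = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    # Rescale so the outermost levels land on -1 and 1.
    largest = max(abs(q) for q in quantiles)
    return [q / largest for q in quantiles]


levels = equal_area_levels(bits=4)
for level in levels:
    print(f"{level:+.4f}")
```

The printed levels are packed more densely around zero than near the ends of the range, which is where normally distributed weights concentrate.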

__init__(input_features, output_features, bias=True, compute_dtype=None, compress_statistics=True, quant_storage=torch.uint8, device=None)

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/nn/modules.py#L583

  • input_features (int) -- Number of input features of the linear layer.
  • output_features (int) -- Number of output features of the linear layer.
  • bias (bool, defaults to True) -- Whether the linear layer also uses a bias term.

Params4bit[[bitsandbytes.nn.Params4bit]]

class bitsandbytes.nn.Params4bit(data: typing.Optional[torch.Tensor] = None, requires_grad = False, quant_state: typing.Optional[bitsandbytes.functional.QuantState] = None, blocksize: typing.Optional[int] = None, compress_statistics: bool = True, quant_type: str = 'fp4', quant_storage: dtype = torch.uint8, module: typing.Optional[ForwardRef('Linear4bit')] = None, bnb_quantized: bool = False)

Source: https://github.com/bitsandbytes-foundation/bitsandbytes/blob/v0.48.2/bitsandbytes/nn/modules.py#L207

__init__(*args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.
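The quant_storage default of torch.uint8 reflects that 4-bit codes are kept in a wider storage type. As a minimal sketch of the idea, assuming two codes per byte, the pure-Python helpers below pack and unpack 4-bit indices. The helper names are hypothetical, and the nibble order chosen here is for illustration only; it may not match the layout used by the library's kernels.

```python
def pack_4bit(codes):
    """Pack 4-bit codes (ints in 0..15) into bytes, two codes per byte."""
    codes = list(codes)
    if any(not 0 <= c <= 15 for c in codes):
        raise ValueError("each code must fit in 4 bits")
    if len(codes) % 2:
        codes.append(0)  # pad to an even count
    # First code of each pair goes in the high nibble (an arbitrary choice).
    return bytes((hi << 4) | lo for hi, lo in zip(codes[0::2], codes[1::2]))


def unpack_4bit(packed, count):
    """Recover the first `count` 4-bit codes from packed bytes."""
    codes = []
    for byte in packed:
        codes.append(byte >> 4)
        codes.append(byte & 0x0F)
    return codes[:count]


codes = [0, 15, 7, 8, 3]
packed = pack_4bit(codes)
unpacked = unpack_4bit(packed, len(codes))
```

Five codes fit in three bytes here, so the packed tensor holds roughly half as many elements as the original weight has parameters.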
