SpQR

The SpQR quantization algorithm uses a 16x16 tiled, bi-level group 3-bit quantization structure with sparse outliers.
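
The structure described above can be illustrated with a toy, pure-Python sketch. It is not the SpQR implementation and makes several simplifying assumptions. Weights are a flat list split into contiguous groups of 16, while real SpQR groups along matrix dimensions. Each group is min-max quantized to 3-bit codes, and the per-group scales and offsets are then 3-bit quantized again in groups of 16 (the "bi-level" part). Outliers are picked by a hypothetical magnitude rule (`outlier_factor`), whereas SpQR itself selects them by their effect on quantization error. All function names here are hypothetical.

```python
import math

BITS = 3
LEVELS = 2 ** BITS - 1  # 3-bit codes take integer values 0..7
GROUP = 16              # first-level and second-level group size (the "16x16")


def minmax_params(values):
    """Asymmetric min-max parameters (step, offset) mapping values onto 0..LEVELS."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / LEVELS if hi > lo else 1.0
    return step, lo


def encode(value, step, lo):
    """Round a float to its 3-bit integer code."""
    return round((value - lo) / step)


def quantize_stats(stats):
    """Second level: quantize the per-group statistics themselves, GROUP at a time.

    Returns the reconstructed (lossy) statistics used at dequantization time.
    """
    restored = []
    for start in range(0, len(stats), GROUP):
        chunk = stats[start:start + GROUP]
        step, lo = minmax_params(chunk)
        restored.extend(encode(s, step, lo) * step + lo for s in chunk)
    return restored


def spqr_like_quantize(weights, outlier_factor=4.0):
    """Toy 3-bit group quantization with quantized statistics and sparse outliers.

    outlier_factor >= 1 guarantees every group keeps at least one inlier.
    """
    codes, scales, offsets, outliers = [], [], [], {}
    for start in range(0, len(weights), GROUP):
        group = weights[start:start + GROUP]
        mean_abs = sum(abs(w) for w in group) / len(group)
        # Hypothetical outlier rule: |w| far above the group's mean magnitude.
        for i, w in enumerate(group):
            if abs(w) > outlier_factor * mean_abs:
                outliers[start + i] = w  # kept in full precision, stored sparsely
        inliers = [w for i, w in enumerate(group) if start + i not in outliers]
        step, lo = minmax_params(inliers)
        # Outlier slots get placeholder code 0; their value lives in `outliers`.
        codes.extend(
            0 if start + i in outliers else encode(w, step, lo)
            for i, w in enumerate(group)
        )
        scales.append(step)
        offsets.append(lo)
    # Second level: the first-level scales and offsets are 3-bit quantized too.
    return codes, quantize_stats(scales), quantize_stats(offsets), outliers


def spqr_like_dequantize(codes, scales, offsets, outliers):
    """Rebuild weights from the dense 3-bit part plus the sparse outliers."""
    return [
        outliers[i] if i in outliers
        else code * scales[i // GROUP] + offsets[i // GROUP]
        for i, code in enumerate(codes)
    ]


# Usage: a smooth signal with one injected outlier.
weights = [math.sin(0.7 * i) for i in range(64)]
weights[5] = 8.0
codes, scales, offsets, outliers = spqr_like_quantize(weights)
restored = spqr_like_dequantize(codes, scales, offsets, outliers)
print(sorted(outliers))  # [5]
```

Keeping the single large weight out of the dense 3-bit grid lets the remaining weights in its group share a much smaller step size, which is the motivation for storing outliers separately.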

To quantize a model with SpQR, refer to the Vahe1994/SpQR repository.

Load a SpQR-quantized model with from_pretrained().

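from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Pre-quantized SpQR checkpoint; its quantization settings are read from the model config
model_id = "elvircrn/Llama-2-7b-SPQR-3Bit-16x16-red_pajama-hf"

quantized_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.half,   # load in half precision (float16)
    device_map="auto"   # place the model on the available device(s) automatically
)
tokenizer = AutoTokenizer.from_pretrained(model_id)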
