# SpQR
The [SpQR](https://hf.co/papers/2306.03078) quantization algorithm uses a 16x16 tiled, bi-level group 3-bit quantization structure with sparse outliers.
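
To make the terms above more concrete, here is a toy, pure-Python sketch of bi-level group quantization with sparse outliers on a single 16x16 tile. It is not the SpQR implementation: the min-max quantizer, the magnitude-based outlier rule, the reading of "16x16" as 16 weights per group with 16 group statistics quantized together, and all function names (`quantize`, `quantize_tile`, `dequantize_tile`) are assumptions made for illustration only.

```python
import random

BITS = 3  # bit width used at both levels in this sketch


def quantize(values, bits=BITS):
    """Min-max uniform quantization: returns (codes, scale, zero)."""
    levels = 2 ** bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels or 1.0  # fall back to 1.0 for constant input
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize(codes, scale, zero):
    return [c * scale + zero for c in codes]


def quantize_tile(tile, outlier_threshold):
    """Toy bi-level quantization of one tile; each row is one weight group."""
    outliers = {}  # (row, col) -> original value, stored sparsely at full precision
    group_codes, scales, zeros = [], [], []
    for r, row in enumerate(tile):
        inliers = [w for w in row if abs(w) <= outlier_threshold]
        filler = inliers[0] if inliers else 0.0  # keeps outliers out of the min-max range
        dense = []
        for c, w in enumerate(row):
            if abs(w) > outlier_threshold:  # hypothetical outlier rule
                outliers[(r, c)] = w
                dense.append(filler)
            else:
                dense.append(w)
        # First level: 3-bit codes with one scale and zero point per group.
        codes, scale, zero = quantize(dense)
        group_codes.append(codes)
        scales.append(scale)
        zeros.append(zero)
    # Second level: the per-group scales and zero points are quantized too.
    return group_codes, quantize(scales), quantize(zeros), outliers


def dequantize_tile(group_codes, scale_q, zero_q, outliers):
    scales = dequantize(*scale_q)
    zeros = dequantize(*zero_q)
    tile = [dequantize(codes, s, z) for codes, s, z in zip(group_codes, scales, zeros)]
    for (r, c), w in outliers.items():  # restore outliers exactly
        tile[r][c] = w
    return tile


random.seed(0)
tile = [[random.gauss(0.0, 0.02) for _ in range(16)] for _ in range(16)]
tile[3][5] = 0.9  # inject one large weight to act as an outlier

packed = quantize_tile(tile, outlier_threshold=0.1)
restored = dequantize_tile(*packed)
max_err = max(abs(a - b) for ra, rb in zip(tile, restored) for a, b in zip(ra, rb))
print(f"outliers kept: {len(packed[3])}, max abs error: {max_err:.4f}")
```

Keeping the large weight in a separate sparse map stops it from stretching the 3-bit range of its group, which is the intuition behind storing outliers separately.
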
> [!TIP]
> To quantize a model with SpQR, refer to the [Vahe1994/SpQR](https://github.com/Vahe1994/SpQR) repository.
Load a SpQR-quantized model with [from_pretrained()](/docs/transformers/pr_26617/en/main_classes/model#transformers.PreTrainedModel.from_pretrained).
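```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the SpQR-quantized checkpoint in half precision and let
# device_map="auto" place the weights on the available device(s).
quantized_model = AutoModelForCausalLM.from_pretrained(
    "elvircrn/Llama-2-7b-SPQR-3Bit-16x16-red_pajama-hf",
    dtype=torch.half,
    device_map="auto"
)
# Load the tokenizer from the same checkpoint.
tokenizer = AutoTokenizer.from_pretrained("elvircrn/Llama-2-7b-SPQR-3Bit-16x16-red_pajama-hf")
```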
