# SpQR

The [SpQR](https://hf.co/papers/2306.03078) quantization algorithm involves a 16x16 tiled bi-level group 3-bit quantization structure with sparse outliers.

<div class="flex justify-center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/spqr-diagram.png">
</div>

> [!TIP]
> To quantize a model with SpQR, refer to the [Vahe1994/SpQR](https://github.com/Vahe1994/SpQR) repository.

Load a SpQR-quantized model with [from_pretrained()](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel.from_pretrained).

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

quantized_model = AutoModelForCausalLM.from_pretrained(
    "elvircrn/Llama-2-7b-SPQR-3Bit-16x16-red_pajama-hf",
    dtype=torch.half,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("elvircrn/Llama-2-7b-SPQR-3Bit-16x16-red_pajama-hf")
```
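
The bi-level group structure with sparse outliers described at the top of this page can be illustrated with a toy round trip in plain Python. This is a simplified sketch and not the SpQR implementation: the helper names (`quantize_group`, `dequantize_group`, `spqr_like_roundtrip`), the min-max quantizer, the magnitude-based outlier threshold, and the choice of 16 weights per group (picked only to echo the 16x16 tiling) are all assumptions made for illustration.

```python
import random

BITS = 3  # 3-bit codes, so each weight maps to an integer in 0..7


def quantize_group(values, bits=BITS):
    """Min-max quantize one group; return integer codes, scale, and zero point."""
    levels = 2 ** bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels
    if scale == 0.0:
        scale = 1.0  # flat group: any scale works, all codes become 0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, zero):
    """Map integer codes back to approximate floating-point values."""
    return [c * scale + zero for c in codes]


def spqr_like_roundtrip(row, group_size=16, outlier_threshold=3.0):
    """Quantize and reconstruct a flat list of weights (hypothetical helper)."""
    # Sparse outliers: large weights are stored separately in full precision.
    # Zeroing them in the dense part is a simplification for this sketch.
    outliers = {i: w for i, w in enumerate(row) if abs(w) > outlier_threshold}
    dense = [0.0 if i in outliers else w for i, w in enumerate(row)]

    # First level: each group of weights gets its own 3-bit codes, scale, zero.
    groups = [dense[i:i + group_size] for i in range(0, len(dense), group_size)]
    quantized = [quantize_group(g) for g in groups]

    # Second level: the per-group scales and zeros are themselves quantized.
    scales = dequantize_group(*quantize_group([s for _, s, _ in quantized]))
    zeros = dequantize_group(*quantize_group([z for _, _, z in quantized]))

    # Reconstruct the dense part, then write the outliers back unchanged.
    restored = []
    for (codes, _, _), scale, zero in zip(quantized, scales, zeros):
        restored.extend(dequantize_group(codes, scale, zero))
    for i, w in outliers.items():
        restored[i] = w
    return restored, outliers


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(256)]
weights[5], weights[200] = 8.0, -9.5  # two artificial outliers

restored, outliers = spqr_like_roundtrip(weights)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"outlier positions kept in full precision: {sorted(outliers)}")
print(f"max reconstruction error: {max_error:.3f}")
```

Keeping the few large weights outside the 3-bit grid narrows the range each group has to cover, which is why the dense part can tolerate such a coarse code. The actual algorithm and its kernels are in the [Vahe1994/SpQR](https://github.com/Vahe1994/SpQR) repository.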