	# SpQR

	The [SpQR](https://hf.co/papers/2306.03078) quantization algorithm uses a 16x16 tiled, bi-level, group-wise 3-bit quantization structure and stores outlier weights separately in a sparse format.

	<div class="flex justify-center">
	<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/spqr-diagram.png">
	</div>
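To make the structure more concrete, here is a toy sketch in plain Python. It is not the SpQR implementation: it only illustrates group-wise 3-bit min-max quantization of one 16x16 tile, with large-magnitude weights kept aside in a sparse map. The magnitude-based outlier rule, the choice of one group per tile row, and all function names are assumptions made for illustration, and the second quantization level (compressing the per-group scales and zero points themselves) is omitted.

```python
import random

BITS = 3                 # bits per quantized weight
LEVELS = 2 ** BITS - 1   # highest integer code (7 for 3 bits)
TILE = 16                # tile side length


def quantize_group(values, outlier_threshold):
    """Quantize one group to 3-bit codes; outliers are stored separately."""
    # Toy outlier rule (an assumption): any weight with a large magnitude.
    outliers = {i: v for i, v in enumerate(values) if abs(v) > outlier_threshold}
    inliers = [v for i, v in enumerate(values) if i not in outliers]
    lo, hi = (min(inliers), max(inliers)) if inliers else (0.0, 0.0)
    scale = (hi - lo) / LEVELS
    if scale == 0.0:
        scale = 1.0  # constant group: any scale works, avoid dividing by zero
    codes = []
    for i, v in enumerate(values):
        if i in outliers:
            codes.append(0)  # placeholder, the exact value lives in `outliers`
        else:
            codes.append(min(LEVELS, max(0, round((v - lo) / scale))))
    return codes, scale, lo, outliers


def dequantize_group(codes, scale, zero, outliers):
    """Rebuild approximate weights; outliers are restored exactly."""
    return [outliers.get(i, zero + c * scale) for i, c in enumerate(codes)]


def quantize_tile(tile, outlier_threshold):
    # Each row of the tile is treated as one quantization group (an assumption).
    return [quantize_group(row, outlier_threshold) for row in tile]


def dequantize_tile(qtile):
    return [dequantize_group(*group) for group in qtile]


random.seed(0)
tile = [[random.gauss(0.0, 0.02) for _ in range(TILE)] for _ in range(TILE)]
tile[3][5] = 0.5      # inject two artificial outliers
tile[10][12] = -0.4

qtile = quantize_tile(tile, outlier_threshold=0.1)
restored = dequantize_tile(qtile)
max_error = max(
    abs(a - b) for row, new_row in zip(tile, restored) for a, b in zip(row, new_row)
)
num_outliers = sum(len(group[3]) for group in qtile)
print(f"outliers stored sparsely: {num_outliers}, max abs error: {max_error:.5f}")
```

Keeping the outliers out of the min-max range is what lets the remaining weights share a small 3-bit grid: a single large weight would otherwise stretch the scale for its whole group.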

	> [!TIP]
	> To quantize a model with SpQR, refer to the [Vahe1994/SpQR](https://github.com/Vahe1994/SpQR) repository.

	Load a SpQR-quantized model with [from_pretrained()](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel.from_pretrained).

	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM
	import torch

	quantized_model = AutoModelForCausalLM.from_pretrained(
	    "elvircrn/Llama-2-7b-SPQR-3Bit-16x16-red_pajama-hf",
	    dtype=torch.half,
	    device_map="auto",
	)
	tokenizer = AutoTokenizer.from_pretrained("elvircrn/Llama-2-7b-SPQR-3Bit-16x16-red_pajama-hf")
	```

