---
tags:
- quantized
- quanto
- int8
- automatic-quantization
base_model: Sambhavnoobcoder/gpt2-test-quantization
license: apache-2.0
---

# gpt2-test-quantization - Quanto int8

This is an automatically quantized version of [Sambhavnoobcoder/gpt2-test-quantization](https://huggingface.co/Sambhavnoobcoder/gpt2-test-quantization) using [Quanto](https://github.com/huggingface/optimum-quanto) int8 quantization.

## ⚡ Quick Start

```python
# Requires: transformers, optimum-quanto, and accelerate (for device_map="auto")
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Sambhavnoobcoder/gpt2-test-quantization-Quanto-int8"

# Load the quantized model and its tokenizer
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Generate text
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; avoids a warning
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

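## 🔧 Quantization Details

- Method: [Quanto](https://github.com/huggingface/optimum-quanto) (Hugging Face native)
- Precision: int8 (8-bit integer weights)
- Quality: 99%+ retention vs FP16 (estimate; actual retention depends on the model and task)
- Memory: ~2x smaller than FP16 weights (~4x smaller than FP32 weights)
- Speed: 2-4x faster inference (estimate; depends on hardware and on the available int8 kernels)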
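
To make "int8 (8-bit integer weights)" concrete, here is a minimal sketch of symmetric absmax int8 quantization for a single row of weights: each float is divided by a shared scale, rounded and clamped to [-127, 127], and recovered approximately as `q * scale`. This is a generic illustration in plain Python, not Quanto's actual implementation (Quanto operates on PyTorch tensors and may use per-axis scales and other details); `quantize_int8` and `dequantize_int8` are hypothetical names.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric absmax int8 quantization of one row of weights (illustrative only).

    scale = max(|w|) / 127, and each weight is stored as round(w / scale),
    clamped to [-127, 127]. Note: Python's round() rounds halves to even.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # An all-zero row gets a dummy scale of 1.0 to avoid division by zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights: w is approximately q * scale."""
    return [v * scale for v in q]


row = [0.5, -1.27, 0.01, 1.0]
q, scale = quantize_int8(row)
print(q)  # [50, -127, 1, 100]
print(dequantize_int8(q, scale))
```

Only the 8-bit integers plus one float scale per row need to be stored, which is where the memory saving comes from; the rounding error per weight is at most half a scale step.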



## 📈 Performance

| Metric | Value |
|--------|-------|
| Memory reduction (vs FP16 weights) | ~50% |
| Quality retention | 99%+ (estimate) |
| Inference speed | 2-4x faster (estimate) |
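
The ~50% memory figure follows from bit widths alone: int8 stores one byte per weight, versus two for FP16 and four for FP32. A back-of-the-envelope sketch, assuming a hypothetical 124M-parameter model (roughly GPT-2 small) and ignoring quantization scales, non-quantized layers and activation memory, so real savings will be somewhat lower:

```python
def weight_memory_bytes(n_params: int, bits_per_weight: int) -> int:
    """Bytes needed to store n_params weights at a given bit width (scales/metadata ignored)."""
    return n_params * bits_per_weight // 8


def reduction_percent(n_params: int, from_bits: int, to_bits: int) -> float:
    """Percentage of weight memory saved when going from from_bits to to_bits."""
    before = weight_memory_bytes(n_params, from_bits)
    after = weight_memory_bytes(n_params, to_bits)
    return 100.0 * (before - after) / before


n = 124_000_000  # hypothetical parameter count
print(weight_memory_bytes(n, 16) / 1e6, "MB in FP16")  # 248.0 MB in FP16
print(weight_memory_bytes(n, 8) / 1e6, "MB in int8")   # 124.0 MB in int8
print(reduction_percent(n, 16, 8))  # 50.0
print(reduction_percent(n, 32, 8))  # 75.0
```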

## 🤖 Automatic Quantization

This model was automatically quantized by the [Auto-Quantization Service](https://huggingface.co/spaces/Sambhavnoobcoder/quantization-mvp).

Want your models quantized automatically?

1. Create a webhook in your [Hugging Face settings](https://huggingface.co/settings/webhooks).
2. Set its target URL to `https://Sambhavnoobcoder-quantization-mvp.hf.space/webhook`.
3. Upload a model; it will be quantized automatically.
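
As a rough illustration of what a webhook receiver has to decide, here is a hypothetical sketch of filtering an incoming Hub webhook payload down to "a model repo whose files were just updated". It assumes the payload shape `{"event": {"action", "scope"}, "repo": {"type", "name"}}` described in the Hub webhooks documentation, and a naming rule (skip repos ending in `-Quanto-int8`, mirroring this repository's name) so the service does not re-trigger on its own output. It is not the service's actual code; `model_to_quantize` and `QUANTIZED_SUFFIX` are illustrative names.

```python
from typing import Any, Mapping, Optional

# Hypothetical suffix for quantized outputs, modeled on this repo's name.
QUANTIZED_SUFFIX = "-Quanto-int8"


def model_to_quantize(payload: Mapping[str, Any]) -> Optional[str]:
    """Return the repo id to quantize, or None if the event should be ignored.

    Assumed payload shape:
    {"event": {"action": ..., "scope": ...}, "repo": {"type": ..., "name": ...}}
    """
    event = payload.get("event") or {}
    repo = payload.get("repo") or {}

    # Only react to models, not datasets or Spaces.
    if repo.get("type") != "model":
        return None
    # Only react to file updates (e.g. a push of new weights).
    if event.get("scope") != "repo.content" or event.get("action") != "update":
        return None

    name = repo.get("name")
    # Skip already-quantized outputs so the service does not loop on itself.
    if not name or name.endswith(QUANTIZED_SUFFIX):
        return None
    return name


payload = {
    "event": {"action": "update", "scope": "repo.content"},
    "repo": {"type": "model", "name": "Sambhavnoobcoder/gpt2-test-quantization"},
}
print(model_to_quantize(payload))  # Sambhavnoobcoder/gpt2-test-quantization
```

A real receiver would also verify the webhook secret before trusting the payload.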

## 📚 Learn More

- Original Model: [Sambhavnoobcoder/gpt2-test-quantization](https://huggingface.co/Sambhavnoobcoder/gpt2-test-quantization)
- Quantization Method: [Quanto Documentation](https://huggingface.co/docs/optimum/quanto/index)
- Service Code: [GitHub Repository](https://github.com/Sambhavnoobcoder/auto-quantization-mvp)

## 📝 Citation

```bibtex
@software{quanto_quantization,
  title = {Quanto: PyTorch Quantization Toolkit},
  author = {HuggingFace Team},
  year = {2024},
  url = {https://github.com/huggingface/optimum-quanto}
}
```

---

Generated on 2026-01-10 21:37:02 by [Auto-Quantization MVP](https://huggingface.co/spaces/Sambhavnoobcoder/quantization-mvp)