---
tags:
- quantized
- quanto
- int8
- automatic-quantization
base_model: Sambhavnoobcoder/gpt2-test-quantization
license: apache-2.0
---
# gpt2-test-quantization - Quanto int8
This is an **automatically quantized** version of [Sambhavnoobcoder/gpt2-test-quantization](https://huggingface.co/Sambhavnoobcoder/gpt2-test-quantization) using [Quanto](https://github.com/huggingface/optimum-quanto) int8 quantization.
## ⚑ Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the quantized model (loading Quanto checkpoints requires `optimum-quanto` to be installed)
model = AutoModelForCausalLM.from_pretrained(
    "Sambhavnoobcoder/gpt2-test-quantization-Quanto-int8",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Sambhavnoobcoder/gpt2-test-quantization-Quanto-int8")

# Generate text
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)  # max_new_tokens avoids the deprecated max_length
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## πŸ”§ Quantization Details
- **Method:** [Quanto](https://github.com/huggingface/optimum-quanto) (HuggingFace native)
- **Precision:** int8 (8-bit integer weights)
- **Quality:** typically retains 99%+ of FP16 quality
- **Memory:** roughly 2x smaller than the FP16 original
- **Speed:** up to 2-4x faster inference, depending on hardware and batch size
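For reference, here is a minimal sketch of how an equivalent int8 quantization can be produced with transformers' built-in Quanto integration. It illustrates the method in general; it is not necessarily the exact pipeline the service runs:

```python
from transformers import AutoModelForCausalLM, QuantoConfig

# Quantize the base model's weights to int8 on the fly at load time.
# This mirrors the int8 setting described on this card; the service's
# actual pipeline may differ.
quant_config = QuantoConfig(weights="int8")
model = AutoModelForCausalLM.from_pretrained(
    "Sambhavnoobcoder/gpt2-test-quantization",
    quantization_config=quant_config,
)
```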
## πŸ“ˆ Performance
| Metric | Value |
|--------|-------|
| Memory Reduction | ~50% |
| Quality Retention | 99%+ |
| Inference Speed | 2-4x faster |
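
These figures are approximate and will vary with hardware, sequence length, and batch size. A quick way to check the memory side yourself, using only the repos named on this card:

```python
from transformers import AutoModelForCausalLM

# Compare in-memory footprints of the original and quantized checkpoints.
# get_memory_footprint() reports the size of the loaded parameters and buffers.
for repo in (
    "Sambhavnoobcoder/gpt2-test-quantization",
    "Sambhavnoobcoder/gpt2-test-quantization-Quanto-int8",
):
    model = AutoModelForCausalLM.from_pretrained(repo)
    print(f"{repo}: {model.get_memory_footprint() / 1e6:.1f} MB")
```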
## πŸ€– Automatic Quantization
This model was automatically quantized by the [Auto-Quantization Service](https://huggingface.co/spaces/Sambhavnoobcoder/quantization-mvp).
**Want your models automatically quantized?**
1. Set up a webhook in your [HuggingFace settings](https://huggingface.co/settings/webhooks)
2. Point to: `https://Sambhavnoobcoder-quantization-mvp.hf.space/webhook`
3. Upload a model - it will be automatically quantized! (A minimal way to smoke-test the endpoint is sketched below.)
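
If you want to verify the endpoint is reachable before wiring up a webhook, a hedged smoke test follows. The payload below is a simplified stand-in; the payload Hugging Face actually sends is richer (see the [webhooks docs](https://huggingface.co/docs/hub/webhooks)):

```python
import requests

# Simplified, hypothetical payload for a reachability check; real Hugging
# Face webhook payloads carry more fields (event, repo, updatedRefs, ...).
payload = {
    "event": {"action": "create", "scope": "repo"},
    "repo": {"type": "model", "name": "your-username/your-model"},
}
resp = requests.post(
    "https://Sambhavnoobcoder-quantization-mvp.hf.space/webhook",
    json=payload,
    timeout=30,
)
print(resp.status_code, resp.text)
```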
## πŸ“š Learn More
- **Original Model:** [Sambhavnoobcoder/gpt2-test-quantization](https://huggingface.co/Sambhavnoobcoder/gpt2-test-quantization)
- **Quantization Method:** [Quanto Documentation](https://huggingface.co/docs/optimum/quanto/index)
- **Service Code:** [GitHub Repository](https://github.com/Sambhavnoobcoder/auto-quantization-mvp)
## πŸ“ Citation
```bibtex
@software{quanto_quantization,
  title  = {Quanto: PyTorch Quantization Toolkit},
  author = {HuggingFace Team},
  year   = {2024},
  url    = {https://github.com/huggingface/optimum-quanto}
}
```
---
*Generated on 2026-01-10 21:37:02 by [Auto-Quantization MVP](https://huggingface.co/spaces/Sambhavnoobcoder/quantization-mvp)*