---
tags:
- quantized
- quanto
- int8
- automatic-quantization
base_model: Sambhavnoobcoder/gpt2-test-quantization
license: apache-2.0
---
# gpt2-test-quantization - Quanto int8
This is an **automatically quantized** version of [Sambhavnoobcoder/gpt2-test-quantization](https://huggingface.co/Sambhavnoobcoder/gpt2-test-quantization) using [Quanto](https://github.com/huggingface/optimum-quanto) int8 quantization.
## ⚡ Quick Start
```python
# Requires: pip install optimum-quanto
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the quantized model
model = AutoModelForCausalLM.from_pretrained(
    "Sambhavnoobcoder/gpt2-test-quantization-Quanto-int8",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Sambhavnoobcoder/gpt2-test-quantization-Quanto-int8")

# Generate text
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## 🔧 Quantization Details
- **Method:** [Quanto](https://github.com/huggingface/optimum-quanto) (HuggingFace native)
- **Precision:** int8 (8-bit integer weights)
- **Quality:** typically 99%+ retention vs FP16 on standard benchmarks
- **Memory:** ~2x smaller than the FP16 weights
- **Speed:** up to 2-4x faster inference, depending on hardware
## 📊 Performance
| Metric | Typical value |
|--------|---------------|
| Memory Reduction | ~50% vs FP16 |
| Quality Retention | 99%+ |
| Inference Speed | up to 2-4x faster |
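The ~50% memory figure follows directly from storage sizes: an int8 weight takes 1 byte versus 2 bytes for FP16. A quick sanity check (the 124M parameter count for GPT-2 small is an approximation):

```python
# Back-of-envelope check of the ~50% memory-reduction figure:
# int8 stores 1 byte per weight, FP16 stores 2.
params = 124_000_000  # approximate GPT-2 small parameter count

fp16_mb = params * 2 / 1e6   # 2 bytes per FP16 weight
int8_mb = params * 1 / 1e6   # 1 byte per int8 weight

print(f"FP16: {fp16_mb:.0f} MB, int8: {int8_mb:.0f} MB "
      f"({int8_mb / fp16_mb:.0%} of FP16 size)")
```

Real checkpoints carry some overhead (embeddings, norms, and scales may stay in higher precision), so measured savings land near, not exactly at, 50%.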
## 🤖 Automatic Quantization
This model was automatically quantized by the [Auto-Quantization Service](https://huggingface.co/spaces/Sambhavnoobcoder/quantization-mvp).
**Want your models automatically quantized?**
1. Set up a webhook in your [HuggingFace settings](https://huggingface.co/settings/webhooks)
2. Point to: `https://Sambhavnoobcoder-quantization-mvp.hf.space/webhook`
3. Upload a model and it will be quantized automatically!
## 📚 Learn More
- **Original Model:** [Sambhavnoobcoder/gpt2-test-quantization](https://huggingface.co/Sambhavnoobcoder/gpt2-test-quantization)
- **Quantization Method:** [Quanto Documentation](https://huggingface.co/docs/optimum/quanto/index)
- **Service Code:** [GitHub Repository](https://github.com/Sambhavnoobcoder/auto-quantization-mvp)
## 📝 Citation
```bibtex
@software{quanto_quantization,
title = {Quanto: PyTorch Quantization Toolkit},
author = {HuggingFace Team},
year = {2024},
url = {https://github.com/huggingface/optimum-quanto}
}
```
---
*Generated on 2026-01-10 21:37:02 by [Auto-Quantization MVP](https://huggingface.co/spaces/Sambhavnoobcoder/quantization-mvp)*