---
tags:
- quantized
- quanto
- int8
- automatic-quantization
base_model: Sambhavnoobcoder/gpt2-test-quantization
license: apache-2.0
---

# gpt2-test-quantization - Quanto int8

This is an **automatically quantized** version of [Sambhavnoobcoder/gpt2-test-quantization](https://huggingface.co/Sambhavnoobcoder/gpt2-test-quantization) using [Quanto](https://github.com/huggingface/optimum-quanto) int8 quantization.

## ⚡ Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the quantized model
model = AutoModelForCausalLM.from_pretrained(
    "Sambhavnoobcoder/gpt2-test-quantization-Quanto-int8",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Sambhavnoobcoder/gpt2-test-quantization-Quanto-int8")

# Generate text
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## 🔧 Quantization Details

- **Method:** [Quanto](https://github.com/huggingface/optimum-quanto) (Hugging Face native)
- **Precision:** int8 (8-bit integer weights)
- **Quality:** 99%+ retention vs. FP16
- **Memory:** ~2x smaller than the original
- **Speed:** 2-4x faster inference

## 📈 Performance

| Metric | Value |
|--------|-------|
| Memory Reduction | ~50% |
| Quality Retention | 99%+ |
| Inference Speed | 2-4x faster |

## 🤖 Automatic Quantization

This model was automatically quantized by the [Auto-Quantization Service](https://huggingface.co/spaces/Sambhavnoobcoder/quantization-mvp).

**Want your models automatically quantized?**

1. Set up a webhook in your [Hugging Face settings](https://huggingface.co/settings/webhooks)
2. Point it to: `https://Sambhavnoobcoder-quantization-mvp.hf.space/webhook`
3. Upload a model - it will be quantized automatically!
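To make the int8 figures above concrete, here is a minimal, dependency-free sketch of *symmetric per-tensor* int8 weight quantization — the general idea behind 8-bit weight schemes like Quanto's. This is an illustration only: Quanto's actual implementation (per-channel scales, calibration, activation handling) differs, and the function names below are hypothetical.

```python
def quantize_int8(weights):
    """Map float weights onto int8 range [-127, 127] with one shared scale.

    Storing 8-bit integers plus a single float scale is what yields the
    memory reduction described in the card (vs. 16/32-bit floats).
    """
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid scale == 0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float weights for computation."""
    return [v * scale for v in q]


weights = [0.42, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Each restored weight is within half a quantization step of the original,
# which is why quality retention stays high for well-behaved weight tensors.
```

Note the trade-off visible even in this toy version: very small weights (like `0.003`) round to zero when one large outlier sets the scale, which is why production schemes often use finer-grained (e.g. per-channel) scales.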
## 📚 Learn More

- **Original Model:** [Sambhavnoobcoder/gpt2-test-quantization](https://huggingface.co/Sambhavnoobcoder/gpt2-test-quantization)
- **Quantization Method:** [Quanto Documentation](https://huggingface.co/docs/optimum/quanto/index)
- **Service Code:** [GitHub Repository](https://github.com/Sambhavnoobcoder/auto-quantization-mvp)

## 📝 Citation

```bibtex
@software{quanto_quantization,
  title  = {Quanto: PyTorch Quantization Toolkit},
  author = {HuggingFace Team},
  year   = {2024},
  url    = {https://github.com/huggingface/optimum-quanto}
}
```

---

*Generated on 2026-01-10 21:37:02 by [Auto-Quantization MVP](https://huggingface.co/spaces/Sambhavnoobcoder/quantization-mvp)*