---
tags:
- quantized
- quanto
- int8
- automatic-quantization
base_model: Sambhavnoobcoder/gpt2-test-quantization
license: apache-2.0
---

# gpt2-test-quantization - Quanto int8

This is an **automatically quantized** version of [Sambhavnoobcoder/gpt2-test-quantization](https://huggingface.co/Sambhavnoobcoder/gpt2-test-quantization) using [Quanto](https://github.com/huggingface/optimum-quanto) int8 quantization.

## ⚡ Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the quantized model (requires the `optimum-quanto` package to be installed)
model = AutoModelForCausalLM.from_pretrained(
    "Sambhavnoobcoder/gpt2-test-quantization-Quanto-int8",
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained("Sambhavnoobcoder/gpt2-test-quantization-Quanto-int8")

# Generate text
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## 🔧 Quantization Details

- **Method:** [Quanto](https://github.com/huggingface/optimum-quanto) (HuggingFace-native quantization backend)
- **Precision:** int8 (8-bit integer weights)
- **Quality:** typically 99%+ retention vs. FP16
- **Memory:** ~2x smaller than the original weights
- **Speed:** up to 2-4x faster inference, depending on hardware
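As a rough sketch of the underlying idea (not Quanto's actual implementation, which supports several schemes and per-axis scales), symmetric per-tensor int8 quantization maps each float weight to an 8-bit integer plus one shared scale:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ~= scale * q."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Rounding error per weight is bounded by half the scale step
print(float(np.abs(w - w_hat).max()) <= scale / 2 + 1e-6)
```

Each weight now occupies 1 byte instead of 2 (FP16) or 4 (FP32), which is where the memory savings in the table below come from.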



## 📈 Performance

| Metric | Value |
|--------|-------|
| Memory Reduction | ~50% |
| Quality Retention | 99%+ |
| Inference Speed | 2-4x faster |
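The ~50% memory figure follows directly from byte widths: FP16 stores each weight in 2 bytes, int8 in 1. A back-of-envelope check, assuming GPT-2 small's ~124M parameters (and ignoring activations, buffers, and any layers kept in higher precision):

```python
# Back-of-envelope weight-memory estimate for GPT-2 small (~124M parameters)
params = 124_000_000
fp16_bytes = params * 2   # 2 bytes per FP16 weight
int8_bytes = params * 1   # 1 byte per int8 weight
reduction = 1 - int8_bytes / fp16_bytes
print(f"fp16: {fp16_bytes / 1e6:.0f} MB, int8: {int8_bytes / 1e6:.0f} MB, saved {reduction:.0%}")
# -> fp16: 248 MB, int8: 124 MB, saved 50%
```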

## 🤖 Automatic Quantization

This model was automatically quantized by the [Auto-Quantization Service](https://huggingface.co/spaces/Sambhavnoobcoder/quantization-mvp).

**Want your models automatically quantized?**

1. Set up a webhook in your [HuggingFace settings](https://huggingface.co/settings/webhooks)
2. Point to: `https://Sambhavnoobcoder-quantization-mvp.hf.space/webhook`
3. Upload a model - it will be automatically quantized!

## 📚 Learn More

- **Original Model:** [Sambhavnoobcoder/gpt2-test-quantization](https://huggingface.co/Sambhavnoobcoder/gpt2-test-quantization)
- **Quantization Method:** [Quanto Documentation](https://huggingface.co/docs/optimum/quanto/index)
- **Service Code:** [GitHub Repository](https://github.com/Sambhavnoobcoder/auto-quantization-mvp)

## 📝 Citation

```bibtex
@software{quanto_quantization,
  title = {Quanto: PyTorch Quantization Toolkit},
  author = {HuggingFace Team},
  year = {2024},
  url = {https://github.com/huggingface/optimum-quanto}
}
```

---

*Generated on 2026-01-10 21:37:02 by [Auto-Quantization MVP](https://huggingface.co/spaces/Sambhavnoobcoder/quantization-mvp)*