gpt2-test-quantization - Quanto int8
This is an automatically quantized version of Sambhavnoobcoder/gpt2-test-quantization using Quanto int8 quantization.
β‘ Quick Start
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load quantized model
model = AutoModelForCausalLM.from_pretrained(
"Sambhavnoobcoder/gpt2-test-quantization-Quanto-int8",
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Sambhavnoobcoder/gpt2-test-quantization-Quanto-int8")
# Generate text
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0]))
π§ Quantization Details
- Method: Quanto (HuggingFace native)
- Precision: int8 (8-bit integer weights)
- Quality: 99%+ retention vs FP16
- Memory: ~2x smaller than original
- Speed: 2-4x faster inference
π Performance
| Metric | Value |
|---|---|
| Memory Reduction | ~50% |
| Quality Retention | 99%+ |
| Inference Speed | 2-4x faster |
π€ Automatic Quantization
This model was automatically quantized by the Auto-Quantization Service.
Want your models automatically quantized?
- Set up a webhook in your HuggingFace settings
- Point to:
https://Sambhavnoobcoder-quantization-mvp.hf.space/webhook - Upload a model - it will be automatically quantized!
π Learn More
- Original Model: Sambhavnoobcoder/gpt2-test-quantization
- Quantization Method: Quanto Documentation
- Service Code: GitHub Repository
π Citation
@software{quanto_quantization,
title = {Quanto: PyTorch Quantization Toolkit},
author = {HuggingFace Team},
year = {2024},
url = {https://github.com/huggingface/optimum-quanto}
}
Generated on 2026-01-10 21:37:02 by Auto-Quantization MVP
- Downloads last month
- 32
Inference Providers
NEW
This model isn't deployed by any Inference Provider.
π
Ask for provider support