---
tags:
- quantized
- quanto
- int8
- automatic-quantization
base_model: Sambhavnoobcoder/gpt2-test-quantization
license: apache-2.0
---

# gpt2-test-quantization - Quanto int8

This is an **automatically quantized** version of [Sambhavnoobcoder/gpt2-test-quantization](https://huggingface.co/Sambhavnoobcoder/gpt2-test-quantization) using [Quanto](https://github.com/huggingface/optimum-quanto) int8 quantization.

## ⚡ Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the quantized model
model = AutoModelForCausalLM.from_pretrained(
    "Sambhavnoobcoder/gpt2-test-quantization-Quanto-int8",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Sambhavnoobcoder/gpt2-test-quantization-Quanto-int8")

# Generate text
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## 🔧 Quantization Details

- **Method:** [Quanto](https://github.com/huggingface/optimum-quanto) (Hugging Face native)
- **Precision:** int8 (8-bit integer weights)
- **Quality:** 99%+ retention vs. FP16
- **Memory:** ~2x smaller than the original
- **Speed:** 2-4x faster inference

## 📈 Performance

| Metric | Value |
|--------|-------|
| Memory Reduction | ~50% |
| Quality Retention | 99%+ |
| Inference Speed | 2-4x faster |

## 🤖 Automatic Quantization

This model was automatically quantized by the [Auto-Quantization Service](https://huggingface.co/spaces/Sambhavnoobcoder/quantization-mvp).

**Want your models automatically quantized?**

1. Set up a webhook in your [Hugging Face settings](https://huggingface.co/settings/webhooks)
2. Point it to: `https://Sambhavnoobcoder-quantization-mvp.hf.space/webhook`
3. Upload a model - it will be quantized automatically!
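To make the int8 figures above concrete, here is a minimal, dependency-free sketch of *symmetric per-tensor* int8 weight quantization — the general idea behind 8-bit weight schemes like Quanto's. This is an illustration only: Quanto's actual implementation (per-channel scales, calibration, activation handling) differs, and the function names below are hypothetical.

```python
def quantize_int8(weights):
    """Map float weights onto int8 range [-127, 127] with one shared scale.

    Storing 8-bit integers plus a single float scale is what yields the
    memory reduction described in the card (vs. 16/32-bit floats).
    """
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid scale == 0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float weights for computation."""
    return [v * scale for v in q]


weights = [0.42, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Each restored weight is within half a quantization step of the original,
# which is why quality retention stays high for well-behaved weight tensors.
```

Note the trade-off visible even in this toy version: very small weights (like `0.003`) round to zero when one large outlier sets the scale, which is why production schemes often use finer-grained (e.g. per-channel) scales.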
## 📚 Learn More

- **Original Model:** [Sambhavnoobcoder/gpt2-test-quantization](https://huggingface.co/Sambhavnoobcoder/gpt2-test-quantization)
- **Quantization Method:** [Quanto Documentation](https://huggingface.co/docs/optimum/quanto/index)
- **Service Code:** [GitHub Repository](https://github.com/Sambhavnoobcoder/auto-quantization-mvp)

## 📝 Citation

```bibtex
@software{quanto_quantization,
  title  = {Quanto: PyTorch Quantization Toolkit},
  author = {HuggingFace Team},
  year   = {2024},
  url    = {https://github.com/huggingface/optimum-quanto}
}
```

---

*Generated on 2026-01-10 21:37:02 by [Auto-Quantization MVP](https://huggingface.co/spaces/Sambhavnoobcoder/quantization-mvp)*