gpt2-test-quantization - Quanto int8

This is an automatically quantized version of Sambhavnoobcoder/gpt2-test-quantization using Quanto int8 quantization.

⚑ Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load quantized model
model = AutoModelForCausalLM.from_pretrained(
    "Sambhavnoobcoder/gpt2-test-quantization-Quanto-int8",
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained("Sambhavnoobcoder/gpt2-test-quantization-Quanto-int8")

# Generate text
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0]))

πŸ”§ Quantization Details

  • Method: Quanto (HuggingFace native)
  • Precision: int8 (8-bit integer weights)
  • Quality: 99%+ retention vs FP16
  • Memory: ~2x smaller than original
  • Speed: 2-4x faster inference

πŸ“ˆ Performance

Metric Value
Memory Reduction ~50%
Quality Retention 99%+
Inference Speed 2-4x faster

πŸ€– Automatic Quantization

This model was automatically quantized by the Auto-Quantization Service.

Want your models automatically quantized?

  1. Set up a webhook in your HuggingFace settings
  2. Point to: https://Sambhavnoobcoder-quantization-mvp.hf.space/webhook
  3. Upload a model - it will be automatically quantized!

πŸ“š Learn More

πŸ“ Citation

@software{quanto_quantization,
  title = {Quanto: PyTorch Quantization Toolkit},
  author = {HuggingFace Team},
  year = {2024},
  url = {https://github.com/huggingface/optimum-quanto}
}

Generated on 2026-01-10 21:37:02 by Auto-Quantization MVP

Downloads last month
32
Safetensors
Model size
0.2B params
Tensor type
F16
Β·
I8
Β·
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for Sambhavnoobcoder/gpt2-test-quantization-Quanto-int8

Finetuned
(1)
this model
Finetunes
1 model