Gemma-2-quantized collection: Gemma-2-2B and Gemma-2-9B quantized in low-bit formats.
This repository contains a 4-bit (NF4) quantized version of the Gemma-2-9B model, designed to reduce memory consumption and speed up inference.
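For intuition, NF4 stores each weight as a 4-bit index into 16 fixed code values plus a per-block absmax scale. Below is a minimal pure-Python sketch of this idea; the level values are the (approximate) NormalFloat4 codes from the QLoRA paper, and the sketch is an illustration only, not the repository's actual bitsandbytes packing:

```python
# Approximate NormalFloat4 (NF4) code values from the QLoRA paper.
NF4_LEVELS = [
    -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
    0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0,
]

def quantize_block(block):
    """Map a block of floats to 4-bit indices plus one absmax scale."""
    scale = max(abs(x) for x in block) or 1.0
    idx = [min(range(16), key=lambda i: abs(x / scale - NF4_LEVELS[i]))
           for x in block]
    return idx, scale

def dequantize_block(idx, scale):
    """Recover approximate weights from indices and the block scale."""
    return [NF4_LEVELS[i] * scale for i in idx]

# Toy weight block (hypothetical values, for illustration only).
weights = [0.12, -0.5, 0.33, -0.07, 0.9, -0.21, 0.0, 0.44]
idx, scale = quantize_block(weights)
restored = dequantize_block(idx, scale)
```

Each original float is replaced by a 4-bit index, so storage drops to roughly a quarter of fp16, at the cost of the small reconstruction error visible in `restored`.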
| Gemma-2-9B | Wiki (ppl, 0-shot) | C4 (ppl, 0-shot) | PIQA (0-shot) | ARC-E (0-shot) | ARC-C (25-shot) | HellaSwag (0-shot) | Wino (0-shot) | Avg. |
|---|---|---|---|---|---|---|---|---|
| Unquantized | 6.88 | 10.12 | 81.39 | 87.25 | 64.33 | 61.27 | 74.11 | 73.67 |
| NF4 | 7.05 | 11.04 | 81.45 | 86.78 | 64.62 | 60.87 | 74.51 | 73.65 |
Benchmark scores are computed with lm-evaluation-harness.
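The accuracy cost of quantization is small; the main payoff is memory. A back-of-envelope estimate, assuming roughly 9 billion parameters and one fp32 absmax scale per 64-weight block (approximations, not exact figures for this checkpoint):

```python
# Rough memory estimate; parameter count and overhead are approximations.
N = 9_000_000_000            # ~9B parameters (rough)
fp16_gib = N * 2 / 2**30     # 16-bit weights: 2 bytes each
# NF4: 4 bits per weight plus one fp32 absmax scale per 64-weight block
nf4_gib = N * (0.5 + 4 / 64) / 2**30
print(f"fp16: {fp16_gib:.1f} GiB, NF4: {nf4_gib:.1f} GiB")
# → fp16: 16.8 GiB, NF4: 4.7 GiB
```

That is roughly a 3.5x reduction in weight storage, before activation and KV-cache memory.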
```python
# Load the quantized model directly from the Hub
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("StoyanGanchev/gemma-2-9b-nf4")
model = AutoModelForCausalLM.from_pretrained("StoyanGanchev/gemma-2-9b-nf4")
```
All files can be accessed in this repository.