Gemma-2-quantized

Gemma-2-2B and Gemma-2-9B quantized in low-bit
This repository contains a 4-bit quantized version of the Gemma-2-2B model, designed to minimize memory consumption and speed up inference.
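To illustrate what 4-bit weight quantization means in practice, here is a minimal sketch of symmetric round-to-nearest quantization with per-group scales. This is an illustrative toy, not the exact scheme used to produce this checkpoint:

```python
import numpy as np

def quantize_int4(w, group_size=32):
    """Symmetric round-to-nearest 4-bit quantization (illustrative sketch).

    int4 values span [-8, 7]; each group of `group_size` weights shares one
    floating-point scale chosen so the group's largest magnitude maps to 7.
    """
    groups = w.reshape(-1, group_size)
    scales = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_int4(q, scales, shape):
    """Reconstruct approximate float weights from int4 codes and scales."""
    return (q * scales).reshape(shape)

w = np.random.default_rng(0).normal(size=(4, 64)).astype(np.float32)
q, scales = quantize_int4(w)
w_hat = dequantize_int4(q, scales, w.shape)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```

The reconstruction error per weight is bounded by half a quantization step (half the group's scale), which is why group-wise scales degrade quality far less than a single scale for the whole tensor.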
| Gemma-2-2b | Wiki | C4 | PIQA | ARC-E | ARC-C | HellaSwag | Wino | Avg. |
|---|---|---|---|---|---|---|---|---|
| Shots | 0-shot | 0-shot | 0-shot | 0-shot | 25-shot | 0-shot | 0-shot | |
| Unquantized | 8.76 | 12.54 | 78.40 | 80.18 | 50.58 | 54.98 | 68.90 | 66.66 |
| Int4 | 9.81 | 13.80 | 77.48 | 78.24 | 43.43 | 51.62 | 67.80 | 63.72 |
Benchmark scores were computed with `lm-evaluation-harness`.
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("StoyanGanchev/gemma-2-2b-int4")
model = AutoModelForCausalLM.from_pretrained("StoyanGanchev/gemma-2-2b-int4")
```
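One motivation for the int4 checkpoint is the smaller download and memory footprint. A back-of-the-envelope estimate for the weights alone, assuming roughly 2.6B parameters for Gemma-2-2B (an approximation; real deployments often keep embeddings and some layers in higher precision):

```python
def weight_gigabytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return n_params * bits_per_weight / 8 / 1e9

N = 2.6e9  # rough parameter count for Gemma-2-2B (approximation)
fp16 = weight_gigabytes(N, 16)
int4 = weight_gigabytes(N, 4)
print(f"fp16: {fp16:.1f} GB, int4: {int4:.1f} GB ({fp16 / int4:.0f}x smaller)")
# fp16: 5.2 GB, int4: 1.3 GB (4x smaller)
```

This 4x reduction in weight storage is what lets the quantized model fit on GPUs where the fp16 checkpoint would not.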
All files can be accessed in this repository.