# Gemma-2-2B quantized in NormalFloat4

## Description

This repository contains a 4-bit (NormalFloat4) quantized version of the Gemma-2-2B model, designed to reduce memory consumption and speed up inference.
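As a hedged sketch of how such a quantization can be produced (the exact recipe used for this repository is not documented here, so the settings below are assumptions), the bitsandbytes integration in transformers supports NF4 out of the box:

```python
# Sketch: producing an NF4-quantized Gemma-2-2B with bitsandbytes.
# The compute dtype below is an assumption, not necessarily the
# setting used for this repository.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store weights in 4 bits
    bnb_4bit_quant_type="nf4",             # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.float16,  # dtype used for matmuls
)

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b",
    quantization_config=quant_config,
    device_map="auto",
)
```

The quantized weights can then be uploaded with `model.push_to_hub(...)`.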

## Benchmark results

| Gemma-2-2b  | Wiki   | C4     | PIQA   | ARC-E  | ARC-C   | HellaSwag | Wino   | Avg.  |
|-------------|--------|--------|--------|--------|---------|-----------|--------|-------|
| Shots       | 0-shot | 0-shot | 0-shot | 0-shot | 25-shot | 0-shot    | 0-shot |       |
| Unquantized | 8.76   | 12.54  | 78.40  | 80.18  | 50.58   | 54.98     | 68.90  | 66.66 |
| NF4         | 9.15   | 13.05  | 78.02  | 78.96  | 49.15   | 53.53     | 69.14  | 65.76 |

Benchmark scores are computed with lm-evaluation-harness. Wiki and C4 report perplexity (lower is better); the remaining columns report accuracy (higher is better).
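As a minimal sketch of how to reproduce the accuracy numbers (assuming the v0.4+ Python API of lm-evaluation-harness; task names may differ across harness versions):

```python
# Sketch: re-running the accuracy benchmarks with lm-evaluation-harness.
import lm_eval

# 0-shot tasks from the table above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=StoyanGanchev/gemma-2-2b-nf4",
    tasks=["piqa", "arc_easy", "hellaswag", "winogrande"],
    num_fewshot=0,
)
print(results["results"])

# ARC-C in the table above used 25 shots, so it is a separate run.
arc_c = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=StoyanGanchev/gemma-2-2b-nf4",
    tasks=["arc_challenge"],
    num_fewshot=25,
)
print(arc_c["results"])
```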

## How to use

```python
# Load the quantized model directly from the Hub
# (bitsandbytes must be installed to deserialize the 4-bit weights).
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("StoyanGanchev/gemma-2-2b-nf4")
model = AutoModelForCausalLM.from_pretrained("StoyanGanchev/gemma-2-2b-nf4")
```
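Once loaded, the model behaves like any other transformers causal LM. A short, illustrative generation example (the prompt is arbitrary):

```python
# Illustrative generation with the quantized model.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```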

All model files can be accessed in this repository.
