
Gemma-2-2B Quantized to Int4

Description

This repository contains a 4-bit quantized version of the Gemma-2-2B model, designed to minimize memory consumption and speed up inference.
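
As a rough, back-of-the-envelope illustration of the memory savings (assuming roughly 2.6B parameters for Gemma-2-2B; the exact count and per-tensor dtypes vary, and real quantized checkpoints carry extra overhead such as scales):

# Rough estimate of weight memory for FP16 vs. Int4 storage.
# Assumption: ~2.6B parameters; actual checkpoints mix dtypes.
num_params = 2.6e9

fp16_gb = num_params * 2 / 1024**3    # 2 bytes per weight in FP16
int4_gb = num_params * 0.5 / 1024**3  # 0.5 bytes per weight in Int4

print(f"FP16: ~{fp16_gb:.1f} GB, Int4: ~{int4_gb:.1f} GB")
# FP16: ~4.8 GB, Int4: ~1.2 GB -- about a 4x reduction before overhead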

Benchmark results

| Gemma-2-2B | Wiki (0-shot) | C4 (0-shot) | PIQA (0-shot) | ARC-E (0-shot) | ARC-C (25-shot) | HellaSwag (0-shot) | Wino (0-shot) | Avg. |
|---|---|---|---|---|---|---|---|---|
| Unquantized | 8.76 | 12.54 | 78.40 | 80.18 | 50.58 | 54.98 | 68.90 | 66.66 |
| Int4 | 9.81 | 13.80 | 77.48 | 78.24 | 43.43 | 51.62 | 67.80 | 63.72 |

Benchmark scores are computed with lm-evaluation-harness. Wiki and C4 report perplexity (lower is better); the remaining tasks report accuracy (higher is better).
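
A sketch of how such scores could be reproduced through lm-evaluation-harness's Python API (the repo id and task list mirror the table above; this assumes the 0.4.x API, and exact arguments may differ between versions):

# Sketch of evaluating with lm-evaluation-harness (pip install lm-eval).
import lm_eval

# 0-shot tasks from the table above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=StoyanGanchev/gemma-2-2b-int4",
    tasks=["piqa", "arc_easy", "hellaswag", "winogrande"],
    num_fewshot=0,
)
print(results["results"])

# ARC-C is reported at 25 shots, so it needs a separate run
# because num_fewshot applies to all listed tasks.
arc_c = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=StoyanGanchev/gemma-2-2b-int4",
    tasks=["arc_challenge"],
    num_fewshot=25,
)
print(arc_c["results"])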

How to use:

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("StoyanGanchev/gemma-2-2b-int4")
model = AutoModelForCausalLM.from_pretrained("StoyanGanchev/gemma-2-2b-int4")
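
A minimal generation example with the loaded model (the prompt and decoding settings below are illustrative, not from the model card):

# Tokenize a prompt, generate a continuation, and decode it.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))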

All files can be accessed in this repository.
