Gemma-2-quantized

Gemma-2-2B and Gemma-2-9B quantized in low-bit
This repository contains a 4-bit quantized version of the Gemma-2-2B model, designed to minimize memory consumption and speed up inference.
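To illustrate what 4-bit weight quantization means in practice, here is a minimal sketch of symmetric round-to-nearest quantization with per-group scales. This is an illustrative toy, not the exact scheme used to produce this checkpoint:

```python
import numpy as np

def quantize_int4(w, group_size=32):
    """Symmetric round-to-nearest 4-bit quantization (illustrative sketch).

    int4 values span [-8, 7]; each group of `group_size` weights shares one
    floating-point scale chosen so the group's largest magnitude maps to 7.
    """
    groups = w.reshape(-1, group_size)
    scales = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_int4(q, scales, shape):
    """Reconstruct approximate float weights from int4 codes and scales."""
    return (q * scales).reshape(shape)

w = np.random.default_rng(0).normal(size=(4, 64)).astype(np.float32)
q, scales = quantize_int4(w)
w_hat = dequantize_int4(q, scales, w.shape)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```

The reconstruction error per weight is bounded by half a quantization step (half the group's scale), which is why group-wise scales degrade quality far less than a single scale for the whole tensor.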
| Gemma-2-2b | Wiki | C4 | PIQA | ARC-E | ARC-C | HellaSwag | Wino | Avg. |
|---|---|---|---|---|---|---|---|---|
| Shots | 0-shot | 0-shot | 0-shot | 0-shot | 25-shot | 0-shot | 0-shot | |
| Unquantized | 8.76 | 12.54 | 78.40 | 80.18 | 50.58 | 54.98 | 68.90 | 66.66 |
| Int4 | 9.81 | 13.80 | 77.48 | 78.24 | 43.43 | 51.62 | 67.80 | 63.72 |
Benchmark scores were computed with `lm-evaluation-harness`.
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("StoyanGanchev/gemma-2-2b-int4")
model = AutoModelForCausalLM.from_pretrained("StoyanGanchev/gemma-2-2b-int4")
```
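One motivation for the int4 checkpoint is the smaller download and memory footprint. A back-of-the-envelope estimate for the weights alone, assuming roughly 2.6B parameters for Gemma-2-2B (an approximation; real deployments often keep embeddings and some layers in higher precision):

```python
def weight_gigabytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return n_params * bits_per_weight / 8 / 1e9

N = 2.6e9  # rough parameter count for Gemma-2-2B (approximation)
fp16 = weight_gigabytes(N, 16)
int4 = weight_gigabytes(N, 4)
print(f"fp16: {fp16:.1f} GB, int4: {int4:.1f} GB ({fp16 / int4:.0f}x smaller)")
# fp16: 5.2 GB, int4: 1.3 GB (4x smaller)
```

This 4x reduction in weight storage is what lets the quantized model fit on GPUs where the fp16 checkpoint would not.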
All files can be accessed in this repository.