AWQ-quantized package (W4G128: 4-bit weights, group size 128) of google/gemma-2-2b. Gemma2 support in AutoAWQ is proposed in pull request #562. To use the model, follow the AutoAWQ examples with the source from that pull request.
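To make the W4G128 label concrete, here is a minimal sketch of group-wise 4-bit quantization in plain Python. It assumes asymmetric round-to-nearest quantization with one scale and zero point per group of 128 weights. It is not AWQ itself, which additionally searches for activation-aware per-channel scales before quantizing, and it is not the AutoAWQ implementation. The function names `quantize_row` and `dequantize_row` are hypothetical.

```python
import random


def quantize_row(row, bits=4, group_size=128):
    """Quantize one weight row group by group (illustrative, not AutoAWQ code).

    Returns a list of (codes, scale, zero) tuples, one per group.
    """
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    qmax = (1 << bits) - 1  # 15 for 4-bit codes
    groups = []
    for start in range(0, len(row), group_size):
        values = row[start:start + group_size]
        lo, hi = min(values), max(values)
        # One scale per group; fall back to 1.0 for a constant group.
        scale = (hi - lo) / qmax or 1.0
        zero = round(-lo / scale)  # integer zero point
        codes = [min(qmax, max(0, round(v / scale) + zero)) for v in values]
        groups.append((codes, scale, zero))
    return groups


def dequantize_row(groups):
    """Map integer codes back to approximate float weights."""
    return [(c - zero) * scale for codes, scale, zero in groups for c in codes]


random.seed(0)
row = [random.gauss(0.0, 1.0) for _ in range(256)]  # toy weights, 2 groups
groups = quantize_row(row)
restored = dequantize_row(groups)
max_error = max(abs(a - b) for a, b in zip(row, restored))
max_scale = max(scale for _, scale, _ in groups)
```

Each group stores 128 four-bit codes plus its own scale and zero point, so a smaller group size lowers the rounding error at the cost of more per-group metadata.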

Evaluation
WikiText-2 PPL: 11.05
C4 PPL: 12.99
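
For readers unfamiliar with the metric, perplexity (PPL) is the exponential of the mean negative log-likelihood per token, so lower is better. The sketch below shows only that definition. The card does not state the evaluation setup (sequence length, stride, dataset split), so none of that is reproduced here, and the helper name `perplexity` is hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that assigns probability 1/4 to every token has a perplexity
# of approximately 4: it is as uncertain as a uniform choice among 4 tokens.
ppl_uniform = perplexity([math.log(0.25)] * 10)
```

In practice the log-probabilities would come from the model's output over the evaluation text, for example WikiText-2 or C4.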

Loading

model_path = "radi-cho/gemma-2-2b-AWQ"

# With transformers
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cuda:0")

# With transformers (fused)
from transformers import AutoModelForCausalLM, AwqConfig
quantization_config = AwqConfig(bits=4, fuse_max_seq_len=512, do_fuse=True)
model = AutoModelForCausalLM.from_pretrained(model_path, quantization_config=quantization_config).to(0)

# With AutoAWQ
from awq import AutoAWQForCausalLM
model = AutoAWQForCausalLM.from_quantized(model_path)

Safetensors
Model size: 3B params
Tensor type: I32, BF16

Paper for radi-cho/gemma-2-2b-AWQ

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration (arXiv:2306.00978, published Jun 1, 2023)