AWQ quantization
#1
by
dr-e
- opened
Could anybody run llm-compressor on this model to produce a 4-bit quantized version that runs on vLLM on memory-restricted hardware?
Thanks in advance.
from datasets import load_dataset
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier
import os

# Calibration data used to collect activation statistics for AWQ scaling
calibration_set = load_dataset("nvidia/Llama-Nemotron-Post-Training-Dataset", split="train")

recipe = [
    AWQModifier(
        ignore=["lm_head"],
        config_groups={
            "group_0": {
                "targets": ["Linear"],
                "weights": {
                    "num_bits": 4,
                    "type": "int",
                    "symmetric": False,
                    "strategy": "group",
                    "group_size": 32,
                },
            },
        },
    ),
]

oneshot(
    model="Qwen/Qwen3-Coder-Next",
    dataset=calibration_set,
    recipe=recipe,
    # expanduser so "~" resolves to the home directory instead of a literal "~" folder
    output_dir=os.path.expanduser("~/Qwen3-Coder-Next-AWQ-4bit"),
    max_seq_length=2048,
    num_calibration_samples=1024,
)
It's BitsAndBytes rather than AWQ, but unsloth already has a 4-bit quant: https://huggingface.co/unsloth/Qwen3-Next-80B-A3B-Instruct-bnb-4bit
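For serving an llm-compressor AWQ checkpoint like the one the script above would produce, vLLM should detect the quantization config saved in the model directory automatically; a minimal sketch (the path and length limit are illustrative assumptions):

```shell
# vLLM reads the quantization config from the checkpoint files,
# so no explicit --quantization flag is needed for llm-compressor output.
# --max-model-len kept modest to fit memory-restricted hardware.
vllm serve ~/Qwen3-Coder-Next-AWQ-4bit --max-model-len 8192
```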