AWQ quantization

#1
by dr-e - opened

Could anybody run llm-compressor on this model to produce a 4-bit quantized version that runs on vLLM on memory-restricted hardware?

Thanks in advance.
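For context on why 4-bit helps here, a rough back-of-envelope memory estimate (assuming an 80B-parameter model, which is an assumption on my part, not a confirmed spec for Qwen3-Coder-Next):

```python
def weight_gib(n_params, bits_per_weight):
    """Approximate weight storage in GiB."""
    return n_params * bits_per_weight / 8 / 2**30

n_params = 80e9  # assumed parameter count for illustration

# bf16: 16 bits per weight
bf16 = weight_gib(n_params, 16)

# int4 with group_size=32: each group of 32 weights also stores a
# 16-bit scale and 16-bit zero-point, i.e. (16 + 16) / 32 = 1 extra
# bit per weight of overhead
int4 = weight_gib(n_params, 4 + 1)

print(f"bf16: ~{bf16:.0f} GiB, int4 (g=32): ~{int4:.0f} GiB")
```

That is roughly a 3x reduction in weight memory, before accounting for KV cache and activations.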

```python
import os

from datasets import load_dataset
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

# Calibration data; num_calibration_samples below limits how much is used
calibration_set = load_dataset(
    "nvidia/Llama-Nemotron-Post-Training-Dataset", split="train"
)

recipe = [
    AWQModifier(
        ignore=["lm_head"],  # keep the output head in full precision
        config_groups={
            "group_0": {
                "targets": ["Linear"],
                "weights": {
                    "num_bits": 4,
                    "type": "int",
                    "symmetric": False,
                    "strategy": "group",
                    "group_size": 32,
                },
            }
        },
    ),
]

oneshot(
    model="Qwen/Qwen3-Coder-Next",
    dataset=calibration_set,
    recipe=recipe,
    # "~" is not expanded automatically, so expand it explicitly
    output_dir=os.path.expanduser("~/Qwen3-Coder-Next-AWQ-4bit"),
    max_seq_length=2048,
    num_calibration_samples=1024,
)
```
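To make the recipe concrete, here is a minimal sketch (pure Python, not llm-compressor internals) of what `num_bits=4, symmetric=False, strategy="group", group_size=32` means: each group of 32 weights is mapped asymmetrically onto the integer range [0, 15] with its own scale and zero-point:

```python
def quantize_group(weights, num_bits=4):
    """Asymmetric quantization of one group: map [min, max] onto [0, 2^b - 1]."""
    qmax = 2**num_bits - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax or 1.0  # avoid zero scale for a constant group
    zero_point = round(-lo / scale)
    q = [max(0, min(qmax, round(w / scale) + zero_point)) for w in weights]
    # Dequantize to inspect the reconstruction error
    deq = [(v - zero_point) * scale for v in q]
    return q, deq, scale, zero_point

group = [0.05 * i - 0.4 for i in range(32)]  # 32 toy weights
q, deq, scale, zp = quantize_group(group)
max_err = max(abs(w - d) for w, d in zip(group, deq))
print(f"scale={scale:.4f} zero_point={zp} max_abs_error={max_err:.4f}")
```

AWQ itself additionally searches for per-channel scaling factors that protect the most activation-salient weights before this rounding step; the sketch only shows the final group-wise integer mapping.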

Not AWQ, but unsloth already has a BitsAndBytes 4-bit quant: https://huggingface.co/unsloth/Qwen3-Next-80B-A3B-Instruct-bnb-4bit
