EuroLLM-22B-Instruct AWQ (W4A16)

This is a W4A16 AWQ quantized version of utter-project/EuroLLM-22B-Instruct-2512.

Model Details

| Attribute | Value |
|---|---|
| Original Model | EuroLLM-22B-Instruct-2512 |
| Quantization | W4A16_ASYM (4-bit weights, 16-bit activations) |
| Calibration Samples | 256 |
| Sequence Length | 512 |
| Calibration Dataset | HuggingFaceH4/ultrachat_200k |
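
For intuition, W4A16_ASYM stores each weight group as 4-bit integers with a scale and zero point while activations stay in 16-bit. A minimal sketch of asymmetric 4-bit quantize/dequantize (illustrative only, not the llmcompressor internals; names are hypothetical):

```python
import numpy as np

def quantize_w4_asym(w: np.ndarray):
    """Asymmetric 4-bit quantization of one weight group.

    Maps [w.min(), w.max()] onto the integer range [0, 15]
    using a scale and zero point, as in a W4A16_ASYM scheme.
    """
    qmin, qmax = 0, 15
    scale = (w.max() - w.min()) / (qmax - qmin)
    zero_point = round(-w.min() / scale)
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax)
    return q.astype(np.uint8), scale, zero_point

def dequantize(q, scale, zero_point):
    # Reconstruct approximate FP weights from the packed integers
    return (q.astype(np.float32) - zero_point) * scale

w = np.array([-0.8, -0.1, 0.0, 0.3, 0.7], dtype=np.float32)
q, s, z = quantize_w4_asym(w)
print(q)                  # integers in [0, 15]
print(dequantize(q, s, z))  # approximate reconstruction of w
```

AWQ improves on this naive scheme by rescaling weight channels (via the mappings below) before quantizing, so that salient channels lose less precision.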

Quantization Script

Dependencies

dependencies = [
    "llmcompressor>=0.10.0.1",
    "protobuf>=7.34.0",
    "sentencepiece>=0.2.1",
    "compressed-tensors>=0.12.2"
]

Code

from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.modifiers.awq import AWQModifier, AWQMapping
from llmcompressor import oneshot

from datasets import load_dataset

# Configuration
MODEL_PATH = "EuroLLM-22B-Instruct-2512"
OUTPUT_PATH = "EuroLLM-22B-Instruct-2512-awq"
NUM_CALIBRATION_SAMPLES = 256
MAX_SEQUENCE_LENGTH = 512

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    device_map="auto",
    dtype="auto",
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)

# Load and preprocess calibration dataset
calib_dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split=f"train_sft[:{NUM_CALIBRATION_SAMPLES}]")
calib_dataset = calib_dataset.shuffle(seed=42)

def preprocess(example):
    return {
        "text": tokenizer.apply_chat_template(
            example["messages"],
            tokenize=False,
        )
    }

calib_dataset = calib_dataset.map(preprocess)

# Tokenize calibration dataset
def tokenize(example):
    return tokenizer(
        example["text"],
        padding=False,
        max_length=MAX_SEQUENCE_LENGTH,
        truncation=True,
        add_special_tokens=False,
    )

calib_dataset = calib_dataset.map(tokenize, remove_columns=calib_dataset.column_names)

# Define AWQ mappings for the Llama-style decoder architecture
mappings_llama = [
    AWQMapping(
        "re:.*input_layernorm",
        ["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"],
    ),
    AWQMapping("re:.*v_proj", ["re:.*o_proj"]),
    AWQMapping(
        "re:.*post_attention_layernorm",
        ["re:.*gate_proj", "re:.*up_proj"],
    ),
    AWQMapping(
        "re:.*up_proj",
        ["re:.*down_proj"],
    ),
]

# Define AWQ quantization recipe
recipe = AWQModifier(
    targets="Linear",
    scheme="W4A16_ASYM",
    ignore=["lm_head"],
    mappings=mappings_llama
)

# Run quantization with calibration
oneshot(
    model=model,
    tokenizer=tokenizer,
    dataset=calib_dataset,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
    output_dir=OUTPUT_PATH,
)

Languages Supported

EuroLLM supports all 24 official EU languages plus additional European languages: Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish, Turkish, and Ukrainian.

Chat Template

This model uses the ChatML format:

<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant
{assistant_message}<|im_end|>
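
In practice, tokenizer.apply_chat_template produces this format from a list of messages. A minimal sketch of assembling the same ChatML string by hand (the helper name and messages are illustrative):

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {role, content} dicts in ChatML format."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn for the model to complete
        parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are EuroLLM, a helpful assistant."},
    {"role": "user", "content": "Translate 'good morning' to Portuguese."},
])
print(prompt)
```

Prefer the tokenizer's own template for real inference, since it also handles special-token ids and any model-specific defaults.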

License

Apache 2.0 (same as base model)
