Qwen3.5-9B-GPTQ-INT4

This model is a GPTQ-quantized version of Qwen/Qwen3.5-9B.

Quantization

  • Method: GPTQ
  • Bits: 4
  • Group size: 128
  • desc_act: False
  • damp_percent: 0.1
  • Calibration preset: math_qa_cot
  • Calibration dataset: zwhe99/DeepMath-103K split train
  • Max calibration samples: 128
  • Max sequence length: 16384
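These settings are typically recorded in the checkpoint's config.json under quantization_config. A minimal sketch of sanity-checking a loaded config against the values above (the dict here is written out by hand; field names follow the common GPTQ config layout, and the exact keys in this repository may differ):

```python
# Hand-written stand-in for the quantization_config section of config.json.
# In practice you would load it with json.load(open("config.json")).
quantization_config = {
    "quant_method": "gptq",
    "bits": 4,
    "group_size": 128,
    "desc_act": False,
    "damp_percent": 0.1,
}

def matches_card(qc: dict) -> bool:
    """Check that a quantization_config dict matches the settings
    documented on this model card."""
    return (
        qc.get("quant_method") == "gptq"
        and qc.get("bits") == 4
        and qc.get("group_size") == 128
        and qc.get("desc_act") is False
        and qc.get("damp_percent") == 0.1
    )

print(matches_card(quantization_config))  # True if the checkpoint matches the card
```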

Intended Use

This checkpoint was created to measure whether quantization degrades math reasoning quality, especially chain-of-thought stability.

Reproduction

This model was produced with the scripts in https://github.com/mssfj/lowbit-math-reasoning.git, using the following command:

uv run python scripts/quantize_qwen35_9b_gptq.py \
  --model-name Qwen/Qwen3.5-9B \
  --output-dir /workspace/lowbit-math-reasoning/model/Qwen3.5-9B-GPTQ-INT4 \
  --dataset-name zwhe99/DeepMath-103K \
  --dataset-config '' \
  --dataset-split train \
  --calibration-preset math_qa_cot \
  --question-column question \
  --answer-column r1_solution_1 \
  --text-column r1_solution_1 \
  --max-calibration-samples 128 \
  --max-seq-len 16384 \
  --bits 4 \
  --group-size 128 \
  --damp-percent 0.1
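The math_qa_cot preset pairs the question column with the CoT solution column to build each calibration text. A rough sketch of what such a preset might do (format_math_qa_cot is a hypothetical helper; the actual formatting lives in scripts/quantize_qwen35_9b_gptq.py):

```python
def format_math_qa_cot(question: str, solution: str) -> str:
    """Join a question and its chain-of-thought solution into a single
    calibration text. Hypothetical illustration of a math-QA CoT preset;
    the real template may differ."""
    return f"Question: {question}\n\nSolution: {solution}"

# Example with columns matching --question-column / --answer-column above.
sample = format_math_qa_cot(
    "What is 2 + 2?",
    "2 + 2 = 4, so the answer is 4.",
)
print(sample)
```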

Loading

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "mssfj/Qwen3.5-9B-GPTQ-INT4"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    trust_remote_code=True,
)

Notes

  • This repository contains quantized weights only.
  • Use a recent transformers build that supports Qwen3.5 GPTQ checkpoints.
  • Evaluation should be performed on math benchmarks such as GSM8K or MATH-500 to check answer accuracy and CoT-format failures.
  • Long math CoT calibration samples are heavily truncated when --max-seq-len is set below 8192, which is why this checkpoint uses 16384.
  • Stop vLLM or other GPU-heavy processes before quantization to avoid OOM during GPTQ.
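For the GSM8K-style evaluation mentioned above, a common scoring heuristic is to extract the last number in the model's chain-of-thought completion and compare it to the reference answer. A minimal sketch (this assumes the model states the numeric answer last, which is exactly the kind of CoT-format failure such an evaluation should surface when it does not hold):

```python
import re
from typing import Optional

def extract_final_answer(completion: str) -> Optional[str]:
    """Return the last number appearing in a chain-of-thought completion,
    or None if no number is found. Thousands separators are stripped first."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    return matches[-1] if matches else None

print(extract_final_answer("12 + 30 = 42. The answer is 42."))  # 42
print(extract_final_answer("I cannot solve this."))             # None
```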