# Qwen3.5-9B-GPTQ-INT4

This model is a GPTQ-quantized version of `Qwen/Qwen3.5-9B`.
## Quantization
- Method: GPTQ
- Bits: 4
- Group size: 128
- desc_act: False
- damp_percent: 0.1
- Calibration preset: math_qa_cot
- Calibration dataset: zwhe99/DeepMath-103K (split: train)
- Max calibration samples: 128
- Max sequence length: 16384
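For intuition, `Bits: 4` with `Group size: 128` means each contiguous group of 128 weights shares one floating-point scale (and zero point). The sketch below is a plain round-to-nearest illustration of 4-bit group quantization, not GPTQ itself, which additionally error-corrects weights using second-order (Hessian) information; the function names here are illustrative, not from the quantization script.

```python
import numpy as np

def quantize_group(w, bits=4):
    """Round-to-nearest asymmetric quantization of one weight group."""
    qmax = 2**bits - 1                  # 15 for INT4
    scale = (w.max() - w.min()) / qmax  # one FP scale per group
    zero = np.round(-w.min() / scale)   # one zero point per group
    q = np.clip(np.round(w / scale + zero), 0, qmax)
    return q.astype(np.uint8), scale, zero

def dequantize_group(q, scale, zero):
    """Map 4-bit codes back to approximate FP weights."""
    return (q.astype(np.float32) - zero) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=128).astype(np.float32)   # one group of 128 weights
q, scale, zero = quantize_group(w)
w_hat = dequantize_group(q, scale, zero)
err = np.abs(w - w_hat).max()                 # bounded by roughly scale / 2
print(f"max abs error: {err:.4f}")
```

With `group_size=128` every group pays the storage cost of one scale per 128 weights, which is why smaller groups trade a little extra memory for lower quantization error.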
## Intended Use
This checkpoint was created to measure whether quantization degrades math reasoning quality, especially chain-of-thought stability.
## Reproduction
This model was produced with the scripts in https://github.com/mssfj/lowbit-math-reasoning.git. The command used was:
```bash
uv run python scripts/quantize_qwen35_9b_gptq.py \
    --model-name Qwen/Qwen3.5-9B \
    --output-dir /workspace/lowbit-math-reasoning/model/Qwen3.5-9B-GPTQ-INT4 \
    --dataset-name zwhe99/DeepMath-103K \
    --dataset-config '' \
    --dataset-split train \
    --calibration-preset math_qa_cot \
    --question-column question \
    --answer-column r1_solution_1 \
    --text-column r1_solution_1 \
    --max-calibration-samples 128 \
    --max-seq-len 16384 \
    --bits 4 \
    --group-size 128 \
    --damp-percent 0.1
```
## Loading
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "mssfj/Qwen3.5-9B-GPTQ-INT4"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    trust_remote_code=True,
)
```
## Notes
- This repository contains quantized weights only.
- Use a recent `transformers` build that supports Qwen3.5 GPTQ checkpoints.
- Evaluation should be performed on math benchmarks such as GSM8K or MATH-500 to check answer accuracy and CoT-format failures.
- Long math CoT calibration samples are heavily truncated when `--max-seq-len` is below 8192.
- Stop vLLM and other GPU-heavy processes before quantization to avoid OOM during GPTQ.
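When scoring GSM8K or MATH-500 outputs, a common failure mode for quantized models is a correct derivation that ends without a parseable final answer. A minimal extraction sketch (the `#### x` marker is GSM8K's reference format; `\boxed{}` is the usual MATH convention; treating an unmatched output as a CoT-format failure is this sketch's assumption):

```python
import re

def extract_final_answer(text):
    """Pull the final answer from GSM8K-style '#### x' or LaTeX \\boxed{x}."""
    m = re.search(r"####\s*([^\n]+)", text)
    if m:
        return m.group(1).strip()
    boxed = re.findall(r"\\boxed\{([^{}]*)\}", text)
    if boxed:
        return boxed[-1].strip()  # the last \boxed{} is usually the answer
    return None  # no recognizable marker: count as a CoT-format failure

print(extract_final_answer("... so the total is 42.\n#### 42"))   # 42
print(extract_final_answer(r"Thus the answer is \boxed{7}."))     # 7
print(extract_final_answer("the model rambled and never answered"))  # None
```

Comparing extraction-failure rates between the FP16 base model and this INT4 checkpoint gives a direct measure of CoT-format degradation, separate from raw answer accuracy.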