Qwen3.6-27B-int4-AutoRound

This is an Int4 AutoRound quantization of Qwen/Qwen3.6-27B, produced using spark-auto-round.

Quantization Details

| Parameter | Value |
| --- | --- |
| Original Model | Qwen/Qwen3.6-27B |
| Quantization Method | AutoRound (W4A16, symmetric) |
| Bits | 4 |
| Group Size | 128 |
| Calibration Dataset | opencode-instruct |
| Calibration Samples | 512 |
| Calibration Sequence Length | 2048 |
| Tuning Iterations | 1000 |
| Batch Size | 8 |
| Packing Format | auto_round:auto_gptq |
| AutoRound Version | 0.14.2 |
| Model Size | ~19 GB |
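
To make "W4A16, symmetric, group size 128" concrete, here is a minimal sketch of symmetric group-wise 4-bit quantization in plain Python. It uses simple round-to-nearest, which is not what AutoRound does (AutoRound tunes the rounding over many iterations); the function names are illustrative only and the random weights are made up.

```python
import random

def quantize_group(values, qmax=7):
    """Symmetric round-to-nearest quantization of one group to signed 4-bit integers."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Signed int4 covers [-8, 7]; the symmetric scale maps max_abs onto +/-7.
    q = [max(-8, min(7, round(v / scale))) for v in values]
    return q, scale

def quantize_row(row, group_size=128):
    """Split one weight row into groups and quantize each with its own scale."""
    assert len(row) % group_size == 0, "row length must be a multiple of group_size"
    return [quantize_group(row[i:i + group_size]) for i in range(0, len(row), group_size)]

def dequantize_row(groups):
    """Rebuild approximate floating-point weights from integers and scales."""
    return [q * scale for qs, scale in groups for q in qs]

random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(256)]  # stand-in for real weights
groups = quantize_row(row)
restored = dequantize_row(groups)
max_error = max(abs(a - b) for a, b in zip(row, restored))
```

Each group of 128 weights shares one scale, so the per-weight rounding error is bounded by half of that group's scale. Activations stay in 16-bit (the "A16" part) and are not touched by this step.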

Layers Kept in FP16

The linear_attn.in_proj_a and linear_attn.in_proj_b projections across all DeltaNet layers, as well as mtp.fc, are kept at FP16 precision for quality preservation.
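
As a rough illustration of how such an exclusion list can be applied, the sketch below matches module names by suffix. The three suffixes come from the paragraph above; the full module paths in `names` are hypothetical, and this is not necessarily how spark-auto-round selects layers.

```python
# Suffixes taken from the model card; the full module paths below are hypothetical.
FP16_SUFFIXES = ("linear_attn.in_proj_a", "linear_attn.in_proj_b", "mtp.fc")

def kept_in_fp16(module_name: str) -> bool:
    """Return True if a module name matches one of the unquantized projections."""
    return module_name.endswith(FP16_SUFFIXES)

names = [
    "model.layers.0.linear_attn.in_proj_a",
    "model.layers.0.linear_attn.in_proj_b",
    "model.layers.0.mlp.down_proj",
    "mtp.fc",
]
fp16_names = [n for n in names if kept_in_fp16(n)]
```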

Quantization Report

Sensitivity analysis covered all 64 transformer blocks: 63 were rated PASS and 1 (layer 58) was rated WARN.

| Layer Range | Cosine Similarity | PSNR (dB) |
| --- | --- | --- |
| Layers 0-10 | 0.9999 - 1.0000 | 80.7 - 84.0 |
| Layers 11-20 | 0.9995 - 0.9999 | 74.9 - 81.5 |
| Layers 21-30 | 0.9988 - 0.9995 | 73.6 - 78.7 |
| Layers 31-40 | 0.9976 - 0.9986 | 69.4 - 73.2 |
| Layers 41-50 | 0.9943 - 0.9976 | 60.2 - 69.2 |
| Layers 51-63 | 0.9883 - 0.9934 | 53.4 - 66.5 |

Full per-layer reports are available in the repository: quantization-report.txt and quantization-report.csv.
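
For readers unfamiliar with the two metrics, the sketch below shows one common way to compute cosine similarity and PSNR between a reference output and a quantized output. The exact definitions used for the report are not stated here; in particular, taking the peak absolute reference value as the PSNR signal peak is an assumption, and the sample vectors are made up.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def psnr_db(reference, test):
    """PSNR in dB; the peak is the largest absolute reference value (an assumption)."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf
    peak = max(abs(r) for r in reference)
    return 10.0 * math.log10(peak * peak / mse)

reference = [0.5, -1.0, 0.25, 2.0]   # e.g. FP16 block output (toy values)
quantized = [0.5, -1.0, 0.25, 1.9]   # e.g. int4 block output (toy values)
cos = cosine_similarity(reference, quantized)
psnr = psnr_db(reference, quantized)
```

Higher is better for both: a cosine similarity near 1.0 means the quantized output points in almost the same direction as the reference, and a higher PSNR means a smaller mean squared error relative to the signal peak.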

How to Use

With vLLM

vllm serve coder3101/Qwen3.6-27B-int4-AutoRound
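
Once the server is running, it can be queried over vLLM's OpenAI-compatible HTTP API. The following is a minimal standard-library sketch; it assumes the server listens on the default address `http://localhost:8000`, and the helper names `build_request` and `chat` are illustrative.

```python
import json
import urllib.request

# Assumes the server was started with `vllm serve` on its default host and port.
BASE_URL = "http://localhost:8000/v1"
MODEL = "coder3101/Qwen3.6-27B-int4-AutoRound"

def build_request(prompt: str, max_tokens: int = 512) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(prompt: str) -> str:
    """Send the prompt to the running server and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(prompt)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `chat("Write a function that reverses a linked list.")` returns the generated reply as a string, provided the server is reachable.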

With Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "coder3101/Qwen3.6-27B-int4-AutoRound"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" lets the library place the weights on the available devices.
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Wrap the prompt in the model's chat template and append the generation prompt.
prompt = "Explain the theory of relativity in simple terms."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# Generate up to 512 new tokens; the decoded text includes the prompt as well.
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Acknowledgments


Original Model Card -- Qwen3.6-27B

Below is the original model card from Qwen/Qwen3.6-27B.

Qwen3.6-27B

Highlights

Qwen3.6-27B follows the Qwen3.5 series with key upgrades:

  • Agentic Coding: Handles frontend workflows and repo-level reasoning with greater fluency.
  • Thinking Preservation: New option to retain reasoning context from historical messages, reducing overhead in iterative development.

Model Architecture

| Property | Value |
| --- | --- |
| Type | Causal Language Model with Vision Encoder |
| Parameters | 27B |
| Hidden Dimension | 5120 |
| Token Embedding | 248320 (Padded) |
| Number of Layers | 64 |
| Hidden Layout | 16 x (3 x (Gated DeltaNet -> FFN) -> 1 x (Gated Attention -> FFN)) |
| FFN Intermediate Dimension | 17408 |
| Context Length | 262,144 (natively), up to 1,010,000 with YaRN |

  • Gated DeltaNet: 48 linear attention heads for V, 16 for QK (head dim: 128)
  • Gated Attention: 24 heads for Q, 4 for KV (head dim: 256, RoPE dim: 64)
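
The hidden layout can be expanded into a per-layer list to see how the 64 layers split between the two mixer types. This is a small sketch under the assumption that layers are indexed from 0 and follow the pattern exactly as written (three Gated DeltaNet layers, then one Gated Attention layer, repeated 16 times); the type labels are illustrative strings.

```python
def layer_types(num_blocks: int = 16, deltanet_per_block: int = 3):
    """Expand the repeating pattern into one mixer type per layer."""
    pattern = ["gated_deltanet"] * deltanet_per_block + ["gated_attention"]
    return pattern * num_blocks

layers = layer_types()
deltanet_count = layers.count("gated_deltanet")
attention_indices = [i for i, kind in enumerate(layers) if kind == "gated_attention"]
```

Under these assumptions the model has 48 Gated DeltaNet layers and 16 Gated Attention layers, with the attention layers at every fourth position.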

Benchmark Results -- Language

| Benchmark | Qwen3.5-27B | Qwen3.5-397B-A17B | Gemma4-31B | Claude 4.5 Opus | Qwen3.6-35B-A3B | Qwen3.6-27B |
| --- | --- | --- | --- | --- | --- | --- |
| SWE-bench Verified | 75.0 | 76.2 | 52.0 | 80.9 | 73.4 | 77.2 |
| SWE-bench Pro | 51.2 | 50.9 | 35.7 | 57.1 | 49.5 | 53.5 |
| SWE-bench Multilingual | 69.3 | 69.3 | 51.7 | 77.5 | 67.2 | 71.3 |
| Terminal-Bench 2.0 | 41.6 | 52.5 | 42.9 | 59.3 | 51.5 | 59.3 |
| SkillsBench Avg5 | 27.2 | 30.0 | 23.6 | 45.3 | 28.7 | 48.2 |
| QwenWebBench | 1068 | 1186 | 1197 | 1536 | 1397 | 1487 |
| NL2Repo | 27.3 | 32.2 | 15.5 | 43.2 | 29.4 | 36.2 |
| Claw-Eval Avg | 64.3 | 70.7 | 48.5 | 76.6 | 68.7 | 72.4 |
| Claw-Eval Pass^3 | 46.2 | 48.1 | 25.0 | 59.6 | 50.0 | 60.6 |
| QwenClawBench | 52.2 | 51.8 | 41.7 | 52.3 | 52.6 | 53.4 |
| MMLU-Pro | 86.1 | 87.8 | 85.2 | 89.5 | 85.2 | 86.2 |
| MMLU-Redux | 93.2 | 94.9 | 93.7 | 95.6 | 93.3 | 93.5 |
| SuperGPQA | 65.6 | 70.4 | 65.7 | 70.6 | 64.7 | 66.0 |
| C-Eval | 90.5 | 93.0 | 82.6 | 92.2 | 90.0 | 91.4 |
| GPQA Diamond | 85.5 | 88.4 | 84.3 | 87.0 | 86.0 | 87.8 |
| HLE | 24.3 | 28.7 | 19.5 | 30.8 | 21.4 | 24.0 |
| LiveCodeBench v6 | 80.7 | 83.6 | 80.0 | 84.8 | 80.4 | 83.9 |
| HMMT Feb 25 | 92.0 | 94.8 | 88.7 | 92.9 | 90.7 | 93.8 |
| HMMT Nov 25 | 89.8 | 92.7 | 87.5 | 93.3 | 89.1 | 90.7 |
| HMMT Feb 26 | 84.3 | 87.9 | 77.2 | 85.3 | 83.6 | 84.3 |
| IMOAnswerBench | 79.9 | 80.9 | 74.5 | 84.0 | 78.9 | 80.8 |
| AIME26 | 92.6 | 93.3 | 89.2 | 95.1 | 92.7 | 94.1 |

Benchmark Results -- Vision Language

| Benchmark | Qwen3.5-27B | Qwen3.5-397B-A17B | Gemma4-31B | Claude 4.5 Opus | Qwen3.6-35B-A3B | Qwen3.6-27B |
| --- | --- | --- | --- | --- | --- | --- |
| MMMU | 82.3 | 85.0 | 80.4 | 80.7 | 81.7 | 82.9 |
| MMMU-Pro | 75.0 | 79.0 | 76.9 | 70.6 | 75.3 | 75.8 |
| MathVista mini | 87.8 | -- | 79.3 | -- | 86.4 | 87.4 |
| DynaMath | 87.7 | 86.3 | 79.5 | 79.7 | 82.8 | 85.6 |
| VlmsAreBlind | 96.9 | -- | 87.2 | -- | 96.6 | 97.0 |
| RealWorldQA | 83.7 | 83.9 | 72.3 | 77.0 | 85.3 | 84.1 |
| MMStar | 81.0 | 83.8 | 77.3 | 73.2 | 80.7 | 81.4 |
| MMBenchEN-DEV-v1.1 | 92.6 | -- | 90.9 | -- | 92.8 | 92.3 |
| SimpleVQA | 56.0 | 67.1 | 52.9 | 65.7 | 58.9 | 56.1 |
| CharXiv RQ | 79.5 | 80.8 | 67.9 | 68.5 | 78.0 | 78.4 |
| CC-OCR | 81.0 | 82.0 | 75.7 | 76.9 | 81.9 | 81.2 |
| OCRBench | 89.4 | -- | 86.1 | -- | 90.0 | 89.4 |
| ERQA | 60.5 | 67.5 | 57.5 | 46.8 | 61.8 | 62.5 |
| CountBench | 97.8 | 97.2 | 96.1 | 90.6 | 96.1 | 97.8 |
| RefCOCO avg | 90.9 | 92.3 | -- | -- | 92.0 | 92.5 |
| EmbSpatialBench | 84.5 | -- | -- | -- | 84.3 | 84.6 |
| RefSpatialBench | 67.7 | -- | 4.7 | -- | 64.3 | 70.0 |
| VideoMME (w sub.) | 87.0 | 87.5 | -- | 77.7 | 86.6 | 87.7 |
| VideoMMMU | 82.3 | 84.7 | 81.6 | 84.4 | 83.7 | 84.4 |
| MLVU | 85.9 | 86.7 | -- | 81.7 | 86.2 | 86.6 |
| MVBench | 74.6 | 77.6 | -- | 67.2 | 74.6 | 75.5 |
| V* | 93.7 | 95.8 | -- | 67.0 | 90.1 | 94.7 |
| AndroidWorld | 64.2 | -- | -- | -- | -- | 70.3 |

Serving Frameworks

  • SGLang (>=0.5.10)
  • vLLM (>=0.19.0)
  • KTransformers
  • HuggingFace Transformers

Sampling Parameters

| Mode | Temperature | top_p | top_k | min_p | presence_penalty |
| --- | --- | --- | --- | --- | --- |
| Thinking (general) | 1.0 | 0.95 | 20 | 0.0 | 0.0 |
| Thinking (precise coding/WebDev) | 0.6 | 0.95 | 20 | -- | -- |
| Non-thinking / Instruct | 0.7 | 0.80 | 20 | -- | 1.5 |
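
The recommended values above can be kept as presets and merged into a generation request. The sketch below is one way to organize them; the preset keys (`thinking_general` and so on) are hypothetical names, and values shown as "--" in the table are deliberately left unset so the serving framework's defaults apply.

```python
# Values copied from the table; entries shown as "--" there are left unset.
SAMPLING_PRESETS = {
    "thinking_general": {
        "temperature": 1.0, "top_p": 0.95, "top_k": 20,
        "min_p": 0.0, "presence_penalty": 0.0,
    },
    "thinking_coding": {"temperature": 0.6, "top_p": 0.95, "top_k": 20},
    "non_thinking": {
        "temperature": 0.7, "top_p": 0.80, "top_k": 20,
        "presence_penalty": 1.5,
    },
}

def sampling_params(mode: str) -> dict:
    """Return a copy of the preset for the given mode (mode names are hypothetical)."""
    if mode not in SAMPLING_PRESETS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(SAMPLING_PRESETS)}")
    return dict(SAMPLING_PRESETS[mode])

params = sampling_params("non_thinking")
```

With an OpenAI-compatible server, `temperature`, `top_p`, and `presence_penalty` are standard request fields; `top_k` and `min_p` are extensions that vLLM accepts in the request body, and other frameworks may handle them differently.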

Key Features

  • Thinking mode is on by default; can be disabled via enable_thinking: False.
  • Does not support soft switch (/think and /nothink from Qwen3).
  • Preserve Thinking: preserve_thinking: True retains reasoning traces from history.
  • Supports text, image, and video inputs.
  • Multi-Token Prediction (MTP) supported.
  • Native context length: 262,144 tokens; extensible to 1,010,000 tokens with YaRN RoPE scaling.
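
The two thinking switches in the list above are chat-template options. A minimal sketch of attaching them to a chat-completions request body follows; it assumes the server forwards a `chat_template_kwargs` field to the chat template (as vLLM's OpenAI-compatible server does), and passing `preserve_thinking` this way is an assumption based on how `enable_thinking` is handled. The helper names are illustrative.

```python
def chat_template_kwargs(enable_thinking: bool = True, preserve_thinking: bool = False) -> dict:
    """Collect the two switches named in the feature list into one dict."""
    return {"enable_thinking": enable_thinking, "preserve_thinking": preserve_thinking}

def with_template_kwargs(payload: dict, **switches) -> dict:
    """Return a copy of a chat-completions payload with the switches attached."""
    updated = dict(payload)
    updated["chat_template_kwargs"] = chat_template_kwargs(**switches)
    return updated

base = {
    "model": "coder3101/Qwen3.6-27B-int4-AutoRound",
    "messages": [{"role": "user", "content": "Hello"}],
}
# Thinking is on by default, so only the non-thinking case needs an explicit flag.
request_body = with_template_kwargs(base, enable_thinking=False)
```

With Transformers, the same options would typically be passed as keyword arguments to `tokenizer.apply_chat_template`, which forwards extra keyword arguments to the template.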

Citation

@misc{qwen3.6-27b,
    title  = {{Qwen3.6-27B}: Flagship-Level Coding in a {27B} Dense Model},
    author = {{Qwen Team}},
    month  = {April},
    year   = {2026},
    url    = {https://qwen.ai/blog?id=qwen3.6-27b}
}