Chunity/Qwen3.6-35B-A3B-AutoRound-AWQ-4bit

AutoRound 4-bit AWQ quantization of Qwen/Qwen3.6-35B-A3B.

Quantization Summary

  • Base model: Qwen/Qwen3.6-35B-A3B
  • Quantization: AutoRound -> AWQ
  • Scheme: W4A16
  • Bits: 4
  • Group size: 128
  • Iterations: 500
  • Output format: auto_awq
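To get a rough sense of what W4A16 with group size 128 means for storage: each group of 128 weights shares per-group quantization metadata, so the effective bits per weight land slightly above 4. A minimal arithmetic sketch, assuming one FP16 scale and one INT4 zero point per group (a typical AWQ packing, not a dump of this checkpoint's exact on-disk layout):

```python
# Effective bits per weight for INT4 weights with group size 128,
# assuming one FP16 scale and one INT4 zero point per group (typical AWQ layout).
bits = 4
group_size = 128
scale_bits = 16   # FP16 scale per group (assumption)
zero_bits = 4     # INT4 zero point per group (assumption)

effective_bits = bits + (scale_bits + zero_bits) / group_size
print(f"{effective_bits:.3f} bits/weight")  # -> 4.156 bits/weight
```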

What Was Quantized

This checkpoint keeps the multimodal stack intact and focuses AWQ quantization on the language-model blocks.

Quantized:

  • model.language_model.layers

Left unquantized for runtime compatibility:

  • lm_head
  • linear_attn.*
  • self_attn.* on the full-attention layers
  • mlp.shared_expert.*
  • mlp.shared_expert_gate
  • visual tower and merger modules
  • MTP tensors (preserved in model_extra_tensors.safetensors)
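A split like the one above can be expressed through AutoRound's per-layer configuration. The following is a sketch of what such a recipe might look like, not the actual export script: the exclusion patterns, the `layer_config` entries, and the `quantize_and_save` call reflect the general auto-round API and are assumptions about how this checkpoint was produced.

```python
# Sketch of a possible AutoRound -> AWQ recipe (assumed, not the actual export script).
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/Qwen3.6-35B-A3B"
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Keep the modules listed above in full precision; patterns are illustrative.
layer_config = {
    "lm_head": {"bits": 16},
    ".*linear_attn.*": {"bits": 16},
    ".*self_attn.*": {"bits": 16},
    ".*shared_expert.*": {"bits": 16},
    ".*visual.*": {"bits": 16},
}

autoround = AutoRound(
    model, tokenizer,
    bits=4, group_size=128, iters=500,
    layer_config=layer_config,
)
autoround.quantize_and_save("Qwen3.6-35B-A3B-AutoRound-AWQ-4bit", format="auto_awq")
```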

Runtime Notes

This checkpoint was validated on a recent vLLM build that loads it through the awq_marlin path.

Environment used for validation:

export VLLM_USE_DEEP_GEMM=0
export VLLM_USE_FLASHINFER_MOE_FP16=1
export VLLM_USE_FLASHINFER_SAMPLER=0
export OMP_NUM_THREADS=4

Example:

from vllm import LLM

llm = LLM(
    model="Chunity/Qwen3.6-35B-A3B-AutoRound-AWQ-4bit",
    trust_remote_code=True,
    max_model_len=256,
    gpu_memory_utilization=0.95,
    max_num_seqs=1,
    language_model_only=True,
)
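A generation call on top of the `llm` object above might look like this; the prompt and SamplingParams values are illustrative, not the settings used for validation:

```python
from vllm import SamplingParams

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Briefly explain what AWQ quantization is."], params)
print(outputs[0].outputs[0].text)
```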

Validation

The checkpoint was loaded and exercised with vLLM 0.19.1.

Observed:

  • loads successfully as AWQ (awq_marlin)
  • generates coherent, factual text
  • this model family may still emit reasoning-style <think> output depending on prompt formatting and runtime settings
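Because the model may wrap reasoning in <think>...</think> blocks, downstream code often strips them before displaying the answer. A minimal sketch; the helper name is ours, not part of the model or vLLM:

```python
import re

def strip_think(text: str) -> str:
    """Remove <think>...</think> reasoning blocks, including an unclosed trailing one."""
    # Remove complete <think>...</think> blocks.
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    # Drop an unterminated <think> block if generation was cut off mid-reasoning.
    text = re.sub(r"<think>.*", "", text, flags=re.DOTALL)
    return text.strip()

print(strip_think("<think>chain of thought</think>The answer is 4."))  # -> The answer is 4.
```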

Files

The repo includes:

  • AWQ weight shards
  • config.json
  • quantization_config.json
  • tokenizer and processor files
  • model_extra_tensors.safetensors, holding preserved tensors (e.g. MTP) that are not part of the AWQ export

Caveat

This is a mixed FP/AWQ export tailored to Qwen3.6's hybrid-attention MoE architecture. The quantization intentionally does not compress every submodule.
