Qwen3.6-35B-A3B for hipfire
Pre-quantized Qwen3.6-35B-A3B (MoE, 35B total / 3B activated) for
hipfire, a Rust-native LLM
inference engine for AMD RDNA GPUs.
Quantized from Qwen/Qwen3.6-35B-A3B.
This is Qwen3.6's April 2026 refresh of the A3B line, with a coding/agentic
fine-tune recipe. Architecture is unchanged from Qwen3.5-35B-A3B (256 experts
top-8, hybrid DeltaNet + Full Attention at a 3:1 ratio, head_dim=256 with
partial_rotary_factor=0.25, shared expert, tied embeddings), so
hipfire's arch_id=6 path loads it without any engine changes.
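The figures above imply a few derived quantities, sketched below as plain arithmetic. Only `head_dim` and `partial_rotary_factor` are names from this README; the other dictionary keys and the helper functions are illustrative, not hipfire or Qwen config fields. The layer layout (every fourth layer being full attention) is an assumed reading of the 3:1 ratio.

```python
# Illustrative arithmetic only. Keys other than head_dim and
# partial_rotary_factor are hypothetical names, not real config fields.
ARCH = {
    "num_experts": 256,
    "experts_per_token": 8,        # top-8 routing
    "head_dim": 256,
    "partial_rotary_factor": 0.25,
    "linear_to_full_ratio": 3,     # 3 DeltaNet layers per full-attention layer
}

def rotary_dims(arch):
    """Number of head dimensions that receive rotary position embedding."""
    return int(arch["head_dim"] * arch["partial_rotary_factor"])

def active_expert_fraction(arch):
    """Fraction of routed experts that run for each token."""
    return arch["experts_per_token"] / arch["num_experts"]

def layer_kinds(num_layers, arch):
    """Assumed layout: every (ratio + 1)-th layer is full attention."""
    period = arch["linear_to_full_ratio"] + 1
    return ["full" if (i + 1) % period == 0 else "deltanet"
            for i in range(num_layers)]

print(rotary_dims(ARCH))             # 64
print(active_expert_fraction(ARCH))  # 0.03125
print(layer_kinds(8, ARCH))
```

So 64 of the 256 head dimensions are rotated, and about 3% of the routed experts run per token, which is in line with the 35B total / 3B activated split.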
⚠️ 2026-05-07 release: Q8 router fix
This release replaces the prior .mq4 with a re-quantized version that
fixes issue #171, a structural attractor on agentic prompts that appeared
when the MoE router was quantized to 4-bit. PR #180, by contributor @fivetide,
promotes mlp.gate.weight and mlp.shared_expert_gate.weight to Q8F16, costing
~10 MB of additional model size. Empirical recovery on the 3.6-A3B
code-review reproducer:
| Variant | Unique-word ratio | Verdict |
| --- | --- | --- |
| MQ4 4-bit router (pre-fix, deprecated) | 14% | ATTRACTOR |
| MQ4 + Q8 router (this release) | 46% | CLEAN |
| HFQ6 reference | 70% | CLEAN |
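This README does not define the unique-word ratio. A minimal sketch, assuming it is the number of distinct lowercased, whitespace-separated words divided by the total word count, might look like the following. The function name and the sample strings are illustrative, and the threshold hipfire uses to label an output ATTRACTOR is not stated here, so none is encoded.

```python
def unique_word_ratio(text: str) -> float:
    """Distinct words divided by total words (assumed definition)."""
    words = text.lower().split()
    if not words:
        return 0.0
    return len(set(words)) / len(words)

# A looping output reuses the same few words; a healthy one does not.
looping = "the fix is the fix is the fix is the fix is"
varied = "the parser rejects malformed offsets before allocating"

print(f"{unique_word_ratio(looping):.2f}")  # 0.25
print(f"{unique_word_ratio(varied):.2f}")   # 1.00
```

Under this definition a low ratio indicates output stuck in a repeating loop, which matches the direction of the verdicts in the table.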
The 3.6-A3B family was the model class most exposed to this quantization
cliff (see issue #171 and the investigation log at
docs/investigations/2026-05-06-moe-quant-cliff-survey);
3.5-A3B was less visibly affected but has also been re-quantized for parity.
If you previously downloaded qwen3.6-35b-a3b.mq4, re-pull it to pick
up the fix. The .hermes.triattn.bin sidecar from the prior release is
calibrated against the broken-router weights and is currently
deprecated; re-calibration on the new .mq4 is in flight.
Files
| File | Quant | Size | Min VRAM | RX 7900 XTX decode | Status |
| --- | --- | --- | --- | --- | --- |
| qwen3.6-35b-a3b.mq4 ⭐ | MQ4 + Q8 router | 19 GB | 22 GB | ~148 tok/s | 2026-05-07 fixed release |
| qwen3.6-35b-a3b.mq3 | MQ3 + Q8 router | 19 GB | 22 GB | TBD | Smaller-bit variant for memory-constrained hosts |
⭐ MQ4 is FWHT-rotated 4-bit with the routing tensors (mlp.gate.weight,
mlp.shared_expert_gate.weight) pinned at Q8F16. Quality-gated against
the Q8 reference on the hipfire coherence battery.
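One way to see why router precision can matter: top-k expert selection is discrete, so a small error in the gate weights can change which experts run. The sketch below is a toy illustration of that mechanism only. It uses a generic symmetric per-row uniform quantizer (not MQ4 or Q8F16), random weights, and a made-up hidden size, and every name in it is hypothetical.

```python
import random

def quantize_roundtrip(row, bits):
    """Symmetric per-row uniform quantization followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in row) / qmax
    if scale == 0.0:
        return list(row)
    return [round(v / scale) * scale for v in row]

def router_logits(gate, hidden):
    """One logit per expert: dot product of its gate row with the hidden state."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in gate]

def top_k(scores, k):
    """Indices of the k largest scores."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

rng = random.Random(0)
NUM_EXPERTS, HIDDEN, K = 256, 64, 8   # HIDDEN is a toy size, not the model's
gate = [[rng.gauss(0.0, 1.0) for _ in range(HIDDEN)] for _ in range(NUM_EXPERTS)]
hidden = [rng.gauss(0.0, 1.0) for _ in range(HIDDEN)]
reference = set(top_k(router_logits(gate, hidden), K))

def experts_kept(bits):
    """How many of the reference top-K experts survive quantizing the gate."""
    quantized = [quantize_roundtrip(row, bits) for row in gate]
    return len(reference & set(top_k(router_logits(quantized, hidden), K)))

for bits in (4, 8):
    print(f"{bits}-bit gate: {experts_kept(bits)}/{K} reference experts kept")
```

This does not reproduce issue #171. It only shows how a coarser gate can change the selected expert set even when each individual weight error is small.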
Usage
curl -L https://raw.githubusercontent.com/Kaden-Schutt/hipfire/master/scripts/install.sh | bash
hipfire pull qwen3.6:35b-a3b
hipfire run qwen3.6:35b-a3b "Write a Rust function that parses an ISO-8601 date."
To pull the MQ3 variant explicitly:
hf download schuttdev/hipfire-qwen3.6-35b-a3b qwen3.6-35b-a3b.mq3 \
--local-dir ~/.hipfire/models
Configuration notes
Quantization format
- MQ4 (MagnumQuant-4): FWHT-rotated 4-bit with asym3 KV cache default.
Routing tensors at Q8F16. Matches Q8 output quality at ~Q4 bandwidth on
hipfire's WMMA/dot2 fused kernel paths.
- MQ3 (MagnumQuant-3): same FWHT-rotated approach at 3-bit for the
bulk weights, Q8F16 for routing/embed/lm_head. Useful when MQ4 doesn't
fit on the target host.
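The per-tensor precision rule described in the list above can be sketched as a small lookup. This is not hipfire's quantizer. The two routing suffixes come from this README, while the embedding and output-head tensor names, the `model.layers.N.` prefixes, and the function itself are hypothetical.

```python
# Routing tensor suffixes as named in this README.
ROUTING_SUFFIXES = ("mlp.gate.weight", "mlp.shared_expert_gate.weight")
# Hypothetical names for the embedding and output-head tensors.
EMBED_HEAD_SUFFIXES = ("embed_tokens.weight", "lm_head.weight")

def pick_format(tensor_name: str, bulk: str) -> str:
    """Return the storage format for one tensor given the bulk format."""
    if tensor_name.endswith(ROUTING_SUFFIXES):
        return "Q8F16"                      # routers stay at Q8F16 in both variants
    if bulk == "MQ3" and tensor_name.endswith(EMBED_HEAD_SUFFIXES):
        return "Q8F16"                      # MQ3 also keeps embed/lm_head at Q8F16
    return bulk                             # everything else uses the bulk format

for name in (
    "model.layers.0.mlp.gate.weight",
    "model.layers.0.mlp.experts.3.down_proj.weight",
    "lm_head.weight",
):
    print(name, pick_format(name, "MQ4"), pick_format(name, "MQ3"))
```

Whether MQ4 also keeps the embedding and output head at Q8F16 is not stated here, so the sketch leaves them at the bulk format for MQ4.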
See docs/QUANTIZATION.md
for details on the rotation invariance property and the quality gate.
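One plausible reading of the rotation invariance property is that an orthonormal Walsh-Hadamard transform preserves dot products, so rotating both weights and activations leaves a matrix-vector product unchanged. The sketch below demonstrates that identity in pure Python. It is a textbook FWHT, not hipfire's kernel, and the assumption that hipfire rotates activations in the same way is not confirmed by this README.

```python
import math

def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform; length must be a power of two."""
    n = len(vec)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(vec)
    h = 1
    while h < n:
        # Butterfly pass: combine pairs that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    norm = 1.0 / math.sqrt(n)   # makes the transform orthonormal
    return [v * norm for v in out]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

weights = [0.1, -2.0, 0.3, 0.05, 4.0, -0.2, 0.0, 1.5]
activations = [1.0, 0.5, -1.0, 2.0, 0.25, -0.75, 3.0, -2.0]

# The two dot products agree up to floating-point error.
print(dot(weights, activations))
print(dot(fwht(weights), fwht(activations)))
```

Because the normalized transform is its own inverse, applying `fwht` twice returns the original vector, which gives a cheap sanity check for any implementation.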
License
Apache 2.0, following the upstream Qwen/Qwen3.6-35B-A3B license.