# Qwen3.5-Creative-26B-A3B
A creative-writing-optimized pruning of Qwen/Qwen3.5-35B-A3B using REAP.
25% of MoE experts pruned (256 → 192) using a creative-writing calibration dataset. A lighter prune that preserves more reasoning capability while still significantly reducing model size.
## What is this?
| | Base Model | This Model | 50% Prune |
|---|---|---|---|
| Total params | ~35B | ~26B | ~18B |
| Active params/token | ~3B | ~3B | ~3B |
| MoE experts | 256 | 192 | 128 |
| Q4_K_M GGUF | ~21GB | ~15GB | ~10GB |
| Target VRAM | 24GB+ | 24GB | 16-24GB |
## How it was made
- Calibration dataset: 3000 samples, 1000 each from WritingPrompts, Project Gutenberg, and roleplay scenarios (Timersofc/creative-writing-reap-calibration)
- REAP profiling: Router-weighted expert activation norms recorded across all 40 MoE layers
- Pruning: Bottom 25% of experts by REAP score removed globally
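The scoring-and-pruning steps above can be sketched as follows. This is a minimal illustration of the idea, not the actual REAP implementation: the array shapes, random data, and variable names are all hypothetical, and it only shows how a router-weighted saliency score would rank experts for removal.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, n_experts, d = 512, 8, 16

# Hypothetical calibration traces: per-token router gate weights (softmaxed)
# and the norm of each expert's output activation.
gates = rng.random((n_tokens, n_experts))
gates /= gates.sum(axis=1, keepdims=True)
out_norms = np.linalg.norm(rng.standard_normal((n_tokens, n_experts, d)), axis=-1)

# REAP-style saliency: average router-weighted activation norm per expert.
scores = (gates * out_norms).mean(axis=0)

# Remove the bottom 25% of experts by score.
k = int(n_experts * 0.25)
pruned = np.argsort(scores)[:k]
keep = np.setdiff1d(np.arange(n_experts), pruned)
```

In the real pipeline this ranking is computed per MoE layer from recorded activations across the whole calibration set, and the surviving experts' weights are copied into the smaller checkpoint.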
## Why this over the 50% prune?
The 50% version (Timersofc/Qwen3.5-Creative-18B-A3B) is smaller and faster, but its chain-of-thought reasoning is less stable. This 25% version retains more of the original model's reasoning experts, making it better for:
- Long-form storytelling requiring plot consistency
- Complex character work needing CoT planning
- Any task where you want reliable thinking mode
If you just need short-form creative output and want maximum compression, the 50% version is better value.
## Usage notes
- Works with standard Qwen3.5 chat templates
- For thinking mode: the model should produce `<think>...</think>` blocks naturally
- Prefilling the assistant turn with `<think>\nOkay,` can help ensure CoT engagement
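The prefill trick above can be sketched with plain string construction. This assumes Qwen-style ChatML markers (`<|im_start|>`/`<|im_end|>`); the exact template your inference stack renders may differ, so treat the helper below as a hypothetical illustration rather than the canonical template.

```python
# Hypothetical helper: build a prompt that ends with an open <think> block,
# so generation continues the chain of thought instead of skipping it.
def build_prompt(user_msg: str, prefill_thinking: bool = True) -> str:
    prompt = f"<|im_start|>user\n{user_msg}<|im_end|>\n<|im_start|>assistant\n"
    if prefill_thinking:
        # Prefill the start of the thinking block; the model completes it.
        prompt += "<think>\nOkay,"
    return prompt

p = build_prompt("Write a short ghost story.")
```

With most APIs the same effect is achieved by passing a partial assistant message and letting the model continue it.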
## GGUF quantizations
Available in Timersofc/Qwen3.5-Creative-26B-A3B-GGUF:
- `Q4_K_M` (imatrix): ~15GB, recommended for 24GB VRAM
- `Q6_K` (imatrix): ~19GB, higher quality
- `f16`: full-precision GGUF for custom quantization
All quantizations use an importance matrix generated from the same creative-writing calibration dataset used for REAP profiling. This means bit allocation within each tensor is optimized for creative writing: weights that matter most for prose quality get higher precision.
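For reference, an imatrix-based quantization of this kind is typically produced with llama.cpp's tools roughly as follows. File names here are placeholders, and this is a sketch of the workflow, not the exact commands used for this release.

```shell
# Profile weight importance on the calibration text (placeholder file names).
./llama-imatrix -m model-f16.gguf -f creative_calibration.txt -o imatrix.dat

# Quantize using the importance matrix to steer per-tensor bit allocation.
./llama-quantize --imatrix imatrix.dat model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```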
## Credits
- Qwen team for the base model
- Cerebras Research for the REAP method
- REAP fork with Qwen3.5 patches: janmts/reap
## License
Same as the base model. This is an unofficial community variant, not affiliated with Alibaba or Cerebras.