# Qwen3.5-Creative-26B-A3B
A creative-writing-optimized pruning of Qwen/Qwen3.5-35B-A3B using REAP.
25% of MoE experts pruned (256 → 192) using a creative-writing calibration dataset. A lighter prune that preserves more reasoning capability while still significantly reducing model size.
## What is this?
| | Base Model | This Model | 50% Prune |
|---|---|---|---|
| Total params | ~35B | ~26B | ~18B |
| Active params/token | ~3B | ~3B | ~3B |
| MoE experts | 256 | 192 | 128 |
| Q4_K_M GGUF | ~21GB | ~15GB | ~10GB |
| Target VRAM | 24GB+ | 24GB | 16-24GB |
## How it was made
- Calibration dataset: 3000 samples, 1000 each from WritingPrompts, Project Gutenberg, and roleplay scenarios (Timersofc/creative-writing-reap-calibration)
- REAP profiling: Router-weighted expert activation norms recorded across all 40 MoE layers
- Pruning: Bottom 25% of experts by REAP score removed globally
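The scoring-and-pruning steps above can be sketched as follows. This is a minimal illustration of the idea, not the actual REAP implementation: the array shapes, random data, and variable names are all hypothetical, and it only shows how a router-weighted saliency score would rank experts for removal.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, n_experts, d = 512, 8, 16

# Hypothetical calibration traces: per-token router gate weights (softmaxed)
# and the norm of each expert's output activation.
gates = rng.random((n_tokens, n_experts))
gates /= gates.sum(axis=1, keepdims=True)
out_norms = np.linalg.norm(rng.standard_normal((n_tokens, n_experts, d)), axis=-1)

# REAP-style saliency: average router-weighted activation norm per expert.
scores = (gates * out_norms).mean(axis=0)

# Remove the bottom 25% of experts by score.
k = int(n_experts * 0.25)
pruned = np.argsort(scores)[:k]
keep = np.setdiff1d(np.arange(n_experts), pruned)
```

In the real pipeline this ranking is computed per MoE layer from recorded activations across the whole calibration set, and the surviving experts' weights are copied into the smaller checkpoint.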
## Why this over the 50% prune?
The 50% version (Timersofc/Qwen3.5-Creative-18B-A3B) is smaller and faster, but its chain-of-thought reasoning is less stable. This 25% version retains more of the original model's reasoning experts, making it better for:
- Long-form storytelling requiring plot consistency
- Complex character work needing CoT planning
- Any task where you want reliable thinking mode
If you just need short-form creative output and want maximum compression, the 50% version is better value.
## Usage notes
- Works with standard Qwen3.5 chat templates
- For thinking mode: the model should produce `<think>...</think>` blocks naturally
- Prefilling the assistant turn with `<think>\nOkay,` can help ensure CoT engagement
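The prefill trick above can be sketched with plain string construction. This assumes Qwen-style ChatML markers (`<|im_start|>`/`<|im_end|>`); the exact template your inference stack renders may differ, so treat the helper below as a hypothetical illustration rather than the canonical template.

```python
# Hypothetical helper: build a prompt that ends with an open <think> block,
# so generation continues the chain of thought instead of skipping it.
def build_prompt(user_msg: str, prefill_thinking: bool = True) -> str:
    prompt = f"<|im_start|>user\n{user_msg}<|im_end|>\n<|im_start|>assistant\n"
    if prefill_thinking:
        # Prefill the start of the thinking block; the model completes it.
        prompt += "<think>\nOkay,"
    return prompt

p = build_prompt("Write a short ghost story.")
```

With most APIs the same effect is achieved by passing a partial assistant message and letting the model continue it.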
## GGUF quantizations
Available in Timersofc/Qwen3.5-Creative-26B-A3B-GGUF:
- `Q4_K_M` (imatrix): ~15GB, recommended for 24GB VRAM
- `Q6_K` (imatrix): ~19GB, higher quality
- `f16`: full-precision GGUF for custom quantization
All quantizations use an importance matrix generated from the same creative-writing calibration dataset used for REAP profiling. This means bit allocation within each tensor is optimized for creative writing: weights that matter most for prose quality get higher precision.
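For reference, an imatrix-based quantization of this kind is typically produced with llama.cpp's tools roughly as follows. File names here are placeholders, and this is a sketch of the workflow, not the exact commands used for this release.

```shell
# Profile weight importance on the calibration text (placeholder file names).
./llama-imatrix -m model-f16.gguf -f creative_calibration.txt -o imatrix.dat

# Quantize using the importance matrix to steer per-tensor bit allocation.
./llama-quantize --imatrix imatrix.dat model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```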
## Credits
- Qwen team for the base model
- Cerebras Research for the REAP method
- REAP fork with Qwen3.5 patches: janmts/reap
## License
Same as the base model. This is an unofficial community variant, not affiliated with Alibaba or Cerebras.