Qwen3-Coder-Next-REAM

This model is a compressed version of Qwen/Qwen3-Coder-Next, obtained by reducing the number of experts in each MoE layer from 512 to 384 (a 25% reduction). The reduction is performed with the REAM method described in https://bknyaz.github.io/blog/2026/moe/.

Compared to other models in this collection, more code data is used in the calibration set during pruning/merging to better preserve the original model's coding abilities. Specifically, the ratio of c4, math and coding data (see https://bknyaz.github.io/blog/2026/moe/) is 0.0, 0.7, 0.3. The calibration data is the same as for our Qwen3-Coder-Next-REAP. Unlike other REAM models, this one uses C=32 (the number of experts in each group) instead of C=16, which we found to work better.
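To make the calibration mix concrete, here is a minimal sketch of how a fixed sample budget could be split according to the 0.0 / 0.7 / 0.3 ratio. The function name `split_calibration_budget` and the budget of 1000 samples are hypothetical and chosen only for illustration; the actual sampling procedure is the one described in the blog post.

```python
# Hypothetical helper: split a calibration budget by source ratio.
# The ratios come from the model card; the total of 1000 samples is an assumption.
CALIBRATION_RATIOS = {"c4": 0.0, "math": 0.7, "code": 0.3}


def split_calibration_budget(total_samples: int, ratios: dict) -> dict:
    """Return the number of samples to draw from each source."""
    if abs(sum(ratios.values()) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    counts = {name: round(total_samples * r) for name, r in ratios.items()}
    # Rounding can drift by a sample or two; give the difference
    # to the source with the largest ratio so the total is exact.
    largest = max(ratios, key=ratios.get)
    counts[largest] += total_samples - sum(counts.values())
    return counts


calibration_counts = split_calibration_budget(1000, CALIBRATION_RATIOS)
print(calibration_counts)
```

With this ratio no c4 samples are drawn at all, so the calibration set consists only of math and code data.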

The compressed model has 60B parameters (120GB) instead of the original model's 80B (160GB), reducing storage and GPU memory requirements by roughly 25%. At the same time, it retains close to 100% of the original model's performance on a variety of benchmarks (see the Results section below). Additional efficiency optimizations (e.g., quantization) can be applied in the same way as for the original model.
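The size figures above can be checked with a few lines of arithmetic. This sketch assumes 2 bytes per parameter (BF16 weights), which is consistent with the 120GB and 160GB figures, and uses the rounded parameter counts quoted above; the helper name `checkpoint_size_gb` is illustrative.

```python
# Assumption: weights are stored in BF16, i.e. 2 bytes per parameter.
BYTES_PER_PARAM_BF16 = 2


def checkpoint_size_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Approximate checkpoint size in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


original_gb = checkpoint_size_gb(80e9)    # original model, ~80B parameters
compressed_gb = checkpoint_size_gb(60e9)  # REAM model, ~60B parameters

memory_saving = 1 - compressed_gb / original_gb
expert_reduction = 1 - 384 / 512  # experts per MoE layer: 512 -> 384

print(f"original: {original_gb:.0f} GB, compressed: {compressed_gb:.0f} GB")
print(f"memory saving: {memory_saving:.0%}, expert reduction: {expert_reduction:.0%}")
```

Both ratios come out at about 25% with these rounded figures.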

See additional details at Qwen3-30B-A3B-Instruct-2507-REAM.

Results

| Model | IFEval | AIME25 | GSM8K | GPQA-D | HumanEval | LiveCodeBench | AVG |
|---|---|---|---|---|---|---|---|
| Qwen3-Coder-Next | 89.6 | 80.0 | 85.4 | 42.4 | 92.7 | 47.5 | 72.9 |
| Qwen3-Coder-Next-REAM | 89.3 | 80.0 | 85.3 | 40.4 | 94.5 | 48.0 | 72.9 |
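As a sanity check on the AVG column, the sketch below recomputes the averages and the per-benchmark differences from the scores in the table. It assumes AVG is the unweighted mean of the six benchmarks, which matches the reported values; the variable names are illustrative.

```python
# Scores copied from the results table; AVG is assumed to be an unweighted mean.
BENCHMARKS = ["IFEval", "AIME25", "GSM8K", "GPQA-D", "HumanEval", "LiveCodeBench"]
SCORES = {
    "Qwen3-Coder-Next": [89.6, 80.0, 85.4, 42.4, 92.7, 47.5],
    "Qwen3-Coder-Next-REAM": [89.3, 80.0, 85.3, 40.4, 94.5, 48.0],
}

averages = {model: sum(vals) / len(vals) for model, vals in SCORES.items()}

# Fraction of the original model's average score kept after compression.
retention = averages["Qwen3-Coder-Next-REAM"] / averages["Qwen3-Coder-Next"]

# Per-benchmark difference (REAM minus original), in points.
deltas = {
    name: round(ream - orig, 1)
    for name, orig, ream in zip(
        BENCHMARKS, SCORES["Qwen3-Coder-Next"], SCORES["Qwen3-Coder-Next-REAM"]
    )
}

for model, avg in averages.items():
    print(f"{model}: {avg:.1f}")
print(f"retention: {retention:.2%}")
print(deltas)
```

The differences show that the compressed model loses the most on GPQA-D (2.0 points) and gains on HumanEval and LiveCodeBench, so the averages end up equal at one decimal place.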

License

Please refer to the license of the original model Qwen/Qwen3-Coder-Next.
