# Qwen3-21B Pruned from 30B (90 Experts)

A pruned version of Qwen3-30B-A3B-Instruct with 38 of 128 experts removed per layer through expert pruning, reducing the model from ~30B to approximately 21B parameters and shrinking the FP8 checkpoint from 56.9 GB to 40.8 GB.
## Model Details
- Base Model: Qwen/Qwen3-30B-A3B-Instruct-2507-FP8
- Architecture: Mixture of Experts (MoE) Transformer
- Original Parameters: ~30B
- Pruned Parameters: ~21B
- Original Experts: 128 per layer
- Pruned Experts: 90 per layer (38 removed)
- Size Reduction: 28.2% parameter reduction
- Quality Impact: +7.36% evaluation loss (see Quality Impact below)
## Pruning Methodology

### Expert Usage Analysis
Used real-time router logit analysis to identify the least utilized experts across the model:
- Analyzed expert routing patterns with `output_router_logits=True`
- Tracked expert selection frequency across multiple inference samples
- Identified the 38 least-used experts for removal based on actual usage statistics
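The counting step above can be sketched as follows. This is a minimal sketch, not the card author's actual script: the logits here are simulated rather than taken from a real forward pass, and the `(num_tokens, num_experts)` shape is an assumption about how per-layer router logits returned with `output_router_logits=True` would be flattened for analysis.

```python
import numpy as np

def expert_usage_counts(router_logits, top_k=8):
    """Count how often each expert lands in a token's top-k selection.

    router_logits: array of shape (num_tokens, num_experts), e.g. one
    layer's router logits collected with output_router_logits=True.
    """
    num_experts = router_logits.shape[1]
    # Indices of the top-k experts for every token (order within top-k irrelevant)
    top_experts = np.argsort(router_logits, axis=1)[:, -top_k:]
    return np.bincount(top_experts.ravel(), minlength=num_experts)

# Simulated routing for 10,000 tokens over 128 experts
rng = np.random.default_rng(0)
logits = rng.normal(size=(10_000, 128))

counts = expert_usage_counts(logits)
# The 38 least-used experts become removal candidates
prune_candidates = np.argsort(counts)[:38]
keep = np.setdiff1d(np.arange(128), prune_candidates)
print(len(keep))  # 90 experts remain
```

In practice the counts would be accumulated per layer and across many inference samples before ranking, as the methodology above describes.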
### True Architectural Pruning
Unlike weight masking approaches, this model features genuine architectural changes:
- In-place expert removal: Deleted the least-used expert modules from every MoE layer
- Router adjustment: Reduced router dimensions from 128→90 outputs
- Weight remapping: Preserved routing weights for remaining experts
- Config updates: Model configuration reflects new expert count
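The router adjustment and weight-remapping steps can be sketched as below. This assumes the router gate is a linear projection with a `(num_experts, hidden_size)` weight matrix, which is typical for Hub MoE implementations but should be checked against the actual model code; the snippet uses NumPy stand-ins rather than real modules.

```python
import numpy as np

def prune_router(router_weight, keep_ids):
    """Shrink a router's output dimension to only the kept experts.

    router_weight: (num_experts, hidden_size) gating matrix. Rows for
    pruned experts are dropped, so routing weights for the remaining
    experts are preserved exactly.
    """
    return router_weight[np.sort(keep_ids), :]

hidden_size = 2048
old = np.random.default_rng(1).normal(size=(128, hidden_size))
keep_ids = np.arange(38, 128)        # e.g. the 90 most-used experts
new = prune_router(old, keep_ids)
print(new.shape)                     # (90, 2048): router now has 90 outputs

# Expert modules are removed the same way: each layer keeps only the
# modules whose indices are in keep_ids, so the change is architectural
# rather than a weight mask.
experts = [f"expert_{i}" for i in range(128)]
experts = [experts[i] for i in keep_ids]
assert len(experts) == 90
```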
## Quality Impact
- Performance Impact: 7.36% increase in evaluation loss relative to the base model
- Note: Performance may vary across different task types
- Efficiency Gains: Faster inference due to reduced expert overhead
## Technical Specifications
Architecture:
- Layers: 48
- Hidden Size: 2048
- Attention Heads: 32
- Experts per Layer: 90 (reduced from 128)
- Active Experts per Token: 8
- Context Length: 128K
- Effective Parameters: ~21B (reduced from ~30B)
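The "config updates" mentioned above would surface in the shipped `config.json` roughly as in this excerpt. The field names follow the usual Qwen MoE config conventions on the Hub (`num_experts`, `num_experts_per_tok`), but they are an assumption here and should be verified against the actual file:

```json
{
  "num_hidden_layers": 48,
  "hidden_size": 2048,
  "num_attention_heads": 32,
  "num_experts": 90,
  "num_experts_per_tok": 8,
  "max_position_embeddings": 131072
}
```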
Optimizations:
- FP8 quantization preserved
- SafeTensors format
- Flash Attention compatible
- Efficient expert routing
- True architectural pruning
| Metric | Original | Pruned | Change |
|--------|----------|--------|--------|
| Total Parameters | ~30B | ~21B | -28.2% |
| Model Size | 56.9 GB | 40.8 GB | -16.1 GB |
| Experts per Layer | 128 | 90 | -38 |
| Evaluation Loss | Baseline | +7.36% | Degradation |
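A quick sanity check on the table, using only its own numbers: the on-disk saving should roughly track the parameter cut, since FP8 weights occupy about one byte per parameter.

```python
# Consistency check on the comparison table above.
size_old_gb, size_new_gb = 56.9, 40.8

saved_gb = size_old_gb - size_new_gb
frac = saved_gb / size_old_gb
print(f"{saved_gb:.1f} GB saved ({frac:.1%})")  # 16.1 GB saved (28.3%)
# ~28.3% size reduction, in line with the 28.2% parameter reduction,
# as expected for FP8 weights at roughly one byte per parameter.
```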
## Citation

```bibtex
@misc{qwen3-pruned-90,
  title={Qwen3-21B Pruned Architecture with 90 Experts},
  author={Expert Pruning},
  year={2025},
  note={Pruned version of Qwen3-30B-A3B-Instruct with 38 experts removed}
}
```