# Qwen3-21B Pruned from 30B (90 Experts)

A pruned version of Qwen3-30B-A3B-Instruct with 38 of 128 experts removed per layer through expert pruning, reducing the model from ~30B to approximately 21B parameters and shrinking the FP8 checkpoint from 56.9 GB to 40.8 GB.
## Model Details
- Base Model: Qwen/Qwen3-30B-A3B-Instruct-2507-FP8
- Architecture: Mixture of Experts (MoE) Transformer
- Original Parameters: ~30B
- Pruned Parameters: ~21B
- Original Experts: 128 per layer
- Pruned Experts: 90 per layer (38 removed)
- Size Reduction: 28.2% parameter reduction
- Quality Impact: +7.36% evaluation loss (see Quality Impact below)
## Pruning Methodology

### Expert Usage Analysis
Used real-time router logit analysis to identify the least utilized experts across the model:
- Analyzed expert routing patterns with `output_router_logits=True`
- Tracked expert selection frequency across multiple inference samples
- Identified the 38 least-used experts for removal based on actual usage statistics
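The counting step above can be sketched as follows. This is a minimal sketch, not the card author's actual script: the logits here are simulated rather than taken from a real forward pass, and the `(num_tokens, num_experts)` shape is an assumption about how per-layer router logits returned with `output_router_logits=True` would be flattened for analysis.

```python
import numpy as np

def expert_usage_counts(router_logits, top_k=8):
    """Count how often each expert lands in a token's top-k selection.

    router_logits: array of shape (num_tokens, num_experts), e.g. one
    layer's router logits collected with output_router_logits=True.
    """
    num_experts = router_logits.shape[1]
    # Indices of the top-k experts for every token (order within top-k irrelevant)
    top_experts = np.argsort(router_logits, axis=1)[:, -top_k:]
    return np.bincount(top_experts.ravel(), minlength=num_experts)

# Simulated routing for 10,000 tokens over 128 experts
rng = np.random.default_rng(0)
logits = rng.normal(size=(10_000, 128))

counts = expert_usage_counts(logits)
# The 38 least-used experts become removal candidates
prune_candidates = np.argsort(counts)[:38]
keep = np.setdiff1d(np.arange(128), prune_candidates)
print(len(keep))  # 90 experts remain
```

In practice the counts would be accumulated per layer and across many inference samples before ranking, as the methodology above describes.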
### True Architectural Pruning
Unlike weight masking approaches, this model features genuine architectural changes:
- In-place expert removal: Deleted the least-used expert modules from every MoE layer
- Router adjustment: Reduced router dimensions from 128→90 outputs
- Weight remapping: Preserved routing weights for remaining experts
- Config updates: Model configuration reflects new expert count
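The router adjustment and weight-remapping steps can be sketched as below. This assumes the router gate is a linear projection with a `(num_experts, hidden_size)` weight matrix, which is typical for Hub MoE implementations but should be checked against the actual model code; the snippet uses NumPy stand-ins rather than real modules.

```python
import numpy as np

def prune_router(router_weight, keep_ids):
    """Shrink a router's output dimension to only the kept experts.

    router_weight: (num_experts, hidden_size) gating matrix. Rows for
    pruned experts are dropped, so routing weights for the remaining
    experts are preserved exactly.
    """
    return router_weight[np.sort(keep_ids), :]

hidden_size = 2048
old = np.random.default_rng(1).normal(size=(128, hidden_size))
keep_ids = np.arange(38, 128)        # e.g. the 90 most-used experts
new = prune_router(old, keep_ids)
print(new.shape)                     # (90, 2048): router now has 90 outputs

# Expert modules are removed the same way: each layer keeps only the
# modules whose indices are in keep_ids, so the change is architectural
# rather than a weight mask.
experts = [f"expert_{i}" for i in range(128)]
experts = [experts[i] for i in keep_ids]
assert len(experts) == 90
```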
## Quality Impact
- Performance Impact: 7.36% increase in evaluation loss relative to the base model
- Note: Performance may vary across different task types
- Efficiency Gains: Faster inference due to reduced expert overhead
## Technical Specifications
Architecture:
- Layers: 48
- Hidden Size: 2048
- Attention Heads: 32
- Experts per Layer: 90 (reduced from 128)
- Active Experts per Token: 8
- Context Length: 128K
- Effective Parameters: ~21B (reduced from ~30B)
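The "config updates" mentioned above would surface in the shipped `config.json` roughly as in this excerpt. The field names follow the usual Qwen MoE config conventions on the Hub (`num_experts`, `num_experts_per_tok`), but they are an assumption here and should be verified against the actual file:

```json
{
  "num_hidden_layers": 48,
  "hidden_size": 2048,
  "num_attention_heads": 32,
  "num_experts": 90,
  "num_experts_per_tok": 8,
  "max_position_embeddings": 131072
}
```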
Optimizations:
- FP8 quantization preserved
- SafeTensors format
- Flash Attention compatible
- Efficient expert routing
- True architectural pruning
| Metric | Original | Pruned | Change |
|--------|----------|--------|--------|
| Total Parameters | ~30B | ~21B | -28.2% |
| Model Size | 56.9 GB | 40.8 GB | -16.1 GB |
| Experts per Layer | 128 | 90 | -38 |
| Evaluation Loss | Baseline | +7.36% | Degradation |
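A quick sanity check on the table, using only its own numbers: the on-disk saving should roughly track the parameter cut, since FP8 weights occupy about one byte per parameter.

```python
# Consistency check on the comparison table above.
size_old_gb, size_new_gb = 56.9, 40.8

saved_gb = size_old_gb - size_new_gb
frac = saved_gb / size_old_gb
print(f"{saved_gb:.1f} GB saved ({frac:.1%})")  # 16.1 GB saved (28.3%)
# ~28.3% size reduction, in line with the 28.2% parameter reduction,
# as expected for FP8 weights at roughly one byte per parameter.
```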
## Citation

```bibtex
@misc{qwen3-pruned-90,
  title={Qwen3-21B Pruned Architecture with 90 Experts},
  author={Expert Pruning},
  year={2025},
  note={Pruned version of Qwen3-30B-A3B-Instruct with 38 experts removed}
}
```