---
license: apache-2.0
base_model: mahiatlinux/Phi-mini-MoE
tags:
- moe
- mixture-of-attention
- pruned
- specialized
- self-training
---

# Phi-mini-MoE + MoA + Pruning + Specialization

## What's Special

This model adds **Mixture of Attention (MoA)** routing to Phi-mini-MoE, then:

- ✂️ **Pruned 25% of attention heads** (kept only the most important ones)
- 🎯 **Forced expert specialization** (each expert focuses on specific tasks)
- ⚡ **~3x faster** than OLMoE-1B-7B
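The head-pruning step above amounts to ranking heads by an importance score and keeping the top 75%. A minimal PyTorch sketch, assuming a per-head importance vector is already available (the function name and random scores here are hypothetical, not this repo's actual implementation):

```python
import torch

def prune_heads(head_importance: torch.Tensor, keep_ratio: float = 0.75) -> torch.Tensor:
    """Return the sorted indices of the top `keep_ratio` fraction of heads."""
    n_heads = head_importance.numel()
    n_keep = int(n_heads * keep_ratio)
    # Keep the highest-importance heads; sort indices for stable layout
    return torch.topk(head_importance, n_keep).indices.sort().values

# 32 heads at keep_ratio=0.75 leaves 24, matching the 25% pruning above
kept = prune_heads(torch.rand(32))
print(len(kept))  # 24
```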

## Stats

- Base: Phi-mini-MoE (7.6B total parameters, 2.4B active)
- Attention heads: 32 → 24 (25% pruned)
- Training iterations: 10
- Expert specialization: 16.7%

## Files

- `moa_router.pt` - Trained and pruned MoA router weights
- `training_data.json` - Self-play training examples
- `expert_stats.json` - Expert specialization profiles
- `pruning_stats.json` - Record of which heads were pruned
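These artifacts can be read with standard `torch` and `json` calls. A minimal sketch, assuming `moa_router.pt` is a plain torch-serialized dict of tensors (the helper name is hypothetical; check the actual file contents before relying on this):

```python
import json
import torch

def load_artifacts(router_path, pruning_path, expert_path):
    # Router weights: assumed to be a torch-serialized dict of tensors
    router_state = torch.load(router_path, map_location="cpu")
    # JSON stats describing pruned heads and expert specialization
    with open(pruning_path) as f:
        pruning_stats = json.load(f)
    with open(expert_path) as f:
        expert_stats = json.load(f)
    return router_state, pruning_stats, expert_stats
```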

## Author

[maxie-12321](https://huggingface.co/maxie-12321)