maxie-12321 committed · Commit ddea8b6 · verified · 1 Parent(s): b8ab5f9

Upload README.md with huggingface_hub

Files changed (1): README.md (+37 -0)

README.md ADDED
---
license: apache-2.0
base_model: mahiatlinux/Phi-mini-MoE
tags:
- moe
- mixture-of-attention
- pruned
- specialized
- self-training
---

# Phi-mini-MoE + MoA + Pruning + Specialization

## What's Special

This model adds **Mixture of Attention (MoA)** routing to Phi-mini-MoE, then:
- ✂️ **Pruned 25% of attention heads**, keeping only the most important ones
- 🎯 **Forced expert specialization**, so each expert focuses on specific tasks
- ⚡ **~3x faster** than OLMoE-1B-7B
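A minimal sketch of the head-pruning step described above. The importance-scoring criterion here is an assumption (the model's actual criterion is recorded in `pruning_stats.json`); the function only illustrates keeping the top 75% of heads by score, which reproduces the 32 → 24 ratio:

```python
# Hedged sketch: prune the lowest-importance attention heads.
# How importance is scored is an assumption; this only shows the
# keep-top-fraction selection that yields 32 -> 24 heads (25% pruned).

def prune_heads(importance_scores, keep_fraction=0.75):
    """Return the sorted indices of the heads to keep."""
    n_keep = max(1, round(len(importance_scores) * keep_fraction))
    ranked = sorted(range(len(importance_scores)),
                    key=lambda i: importance_scores[i], reverse=True)
    return sorted(ranked[:n_keep])

# Example: 32 heads with stand-in importance scores -> 24 kept.
scores = [i / 32 for i in range(32)]
kept = prune_heads(scores)
print(len(kept))  # 24
```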

## Stats

- Base: Phi-mini-MoE (7.6B total parameters, 2.4B active)
- Attention heads: 32 → 24 (25% pruned)
- Training iterations: 10
- Expert specialization: 16.7%
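The specialization figure above is not defined in this README; `expert_stats.json` holds the actual profiles. As an assumption, one plausible metric is how far each expert's routing distribution over task categories deviates from uniform, averaged over experts:

```python
# Hedged sketch of ONE possible specialization metric (an assumption;
# the repo's expert_stats.json defines what was actually measured).
# Score per expert: (top-category share - uniform share) / (1 - uniform),
# so 0.0 means uniform routing and 1.0 means fully specialized.

def specialization(routing_counts):
    """routing_counts: per-expert lists of token counts per task category."""
    scores = []
    for counts in routing_counts:
        total = sum(counts)
        uniform = 1 / len(counts)
        top = max(counts) / total
        scores.append((top - uniform) / (1 - uniform))
    return sum(scores) / len(scores)

# An expert routing 50/25/25 over three categories scores 0.25.
print(specialization([[50, 25, 25]]))
```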

## Files

- `moa_router.pt` - Trained + pruned MoA router
- `training_data.json` - Self-play examples
- `expert_stats.json` - Expert specialization profiles
- `pruning_stats.json` - Which heads were pruned
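A sketch of consuming these artifacts locally. The file names match the list above, but the JSON schema shown is an assumption; the router weights themselves would be loaded separately with `torch.load("moa_router.pt")` (requires PyTorch):

```python
# Hedged sketch: read the pruning record. The JSON schema below is an
# assumed example, not the repo's documented format.
import json

def load_pruning_stats(path="pruning_stats.json"):
    with open(path) as f:
        return json.load(f)

# Write an example file with an assumed schema, then read it back.
example = {"total_heads": 32, "kept_heads": 24,
           "pruned_head_indices": [1, 5, 9, 13, 17, 21, 25, 29]}
with open("pruning_stats.json", "w") as f:
    json.dump(example, f)

stats = load_pruning_stats()
pruned_fraction = 1 - stats["kept_heads"] / stats["total_heads"]
print(f"{pruned_fraction:.0%}")  # 25%
```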

## By

[maxie-12321](https://huggingface.co/maxie-12321)