Upload README.md with huggingface_hub
40% faster total training and a 33% smaller final model.
### Head Mitosis
Pruning frees slots; mitosis fills them. When a head is overutilized, it is cloned into a pruned slot, and each copy starts at 50% of the original gate value so that together they reproduce the original head's output (output continuity). With continued training, the clones **diverge and specialize**, like cell differentiation after biological mitosis. The model thus grows new, specialized capacity exactly where it is needed.
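The sketch below shows one way the mitosis step could work. It is a minimal pure-Python illustration, not the project's implementation. The following are all assumptions made for the example:

- Each head is modelled as a single gated linear map, and head outputs are summed.
- "Utilization" is an externally supplied per-head score compared against a `threshold`.
- The clone goes into the first free slot.
- A small antisymmetric weight perturbation (`noise`) is added so the two copies can diverge.

The names `HeadSlot`, `layer_forward`, `outputs_match`, and `mitosis` are hypothetical. Halving the gate preserves the output because (g/2)·x(W+E) + (g/2)·x(W−E) = g·xW.

```python
import copy
import math
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HeadSlot:
    """Hypothetical head slot: a linear map W (d_in x d_out) scaled by a scalar gate."""
    weights: List[List[float]]
    gate: float = 1.0
    pruned: bool = False

    def forward(self, x: List[float]) -> List[float]:
        d_out = len(self.weights[0])
        if self.pruned:
            return [0.0] * d_out  # a pruned (free) slot contributes nothing
        # gate * (x @ W): output j sums x[i] * W[i][j] over input dims i
        return [self.gate * sum(xi * row[j] for xi, row in zip(x, self.weights))
                for j in range(d_out)]


def layer_forward(slots: List[HeadSlot], x: List[float]) -> List[float]:
    """Sum the gated outputs of all slots (assumed additive combination of heads)."""
    outputs = [slot.forward(x) for slot in slots]
    return [sum(column) for column in zip(*outputs)]


def outputs_match(a: List[float], b: List[float], tol: float = 1e-9) -> bool:
    """True if two output vectors agree up to floating-point tolerance."""
    return all(math.isclose(u, v, abs_tol=tol) for u, v in zip(a, b))


def mitosis(slots: List[HeadSlot], utilization: List[float], threshold: float,
            noise: float = 1e-3, rng: Optional[random.Random] = None) -> Optional[int]:
    """Clone the most-utilized active head into a free slot; return that slot's index or None."""
    if rng is None:
        rng = random.Random(0)
    free = [i for i, s in enumerate(slots) if s.pruned]
    active = [i for i, s in enumerate(slots) if not s.pruned]
    if not free or not active:
        return None  # nothing to split, or no slot freed by pruning
    src = max(active, key=lambda i: utilization[i])
    if utilization[src] < threshold:
        return None  # no head is overutilized enough to split
    parent, dst = slots[src], free[0]
    # Both copies get half the parent's gate, so their summed output equals the parent's.
    child = HeadSlot(weights=copy.deepcopy(parent.weights), gate=parent.gate / 2)
    parent.gate /= 2
    # Antisymmetric perturbation (+eps on parent, -eps on child) breaks symmetry
    # without changing the sum, because both copies now share the same gate.
    for p_row, c_row in zip(parent.weights, child.weights):
        for j in range(len(p_row)):
            eps = rng.uniform(-noise, noise)
            p_row[j] += eps
            c_row[j] -= eps
    slots[dst] = child
    return dst


# Usage: two active heads and one slot freed by pruning.
slots = [
    HeadSlot([[1.0, 0.5], [0.2, -1.0]]),
    HeadSlot([[0.3, 0.1], [0.0, 0.4]]),
    HeadSlot([[0.0, 0.0], [0.0, 0.0]], gate=0.0, pruned=True),
]
x = [0.7, -1.2]
before = layer_forward(slots, x)
dst = mitosis(slots, utilization=[0.9, 0.2, 0.0], threshold=0.5)
after = layer_forward(slots, x)
print("cloned into slot", dst, "| output preserved:", outputs_match(before, after))
```

Splitting the gate evenly is what keeps the layer's output unchanged at the moment of cloning. The perturbation matters because, in this simplified model, two identical clones with identical gates would receive identical gradient updates and never diverge. The perturbation is only one possible symmetry breaker and may differ from what the paper does.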
**Read the full paper**: [Experiential Plasticity: Transformers That Grow Their Own Architecture From Experience](https://github.com/CambrianTech/continuum/blob/main/docs/papers/EXPERIENTIAL-PLASTICITY.md)
## Output Samples