continuum-ai/qwen2.5-coder-14b-compacted

Text Generation

continuum:compacted

continuum:head-pruning


EnricoFermi committed 8 days ago

Commit 7afe2c9 · verified · 1 parent: 1484529

Upload README.md with huggingface_hub

Files changed (1)

README.md +4 -0

@@ -146,6 +146,10 @@ Cycle 3: train (batch=3, 22B, 14.5GB)  -> prune -> defrag                  (2.8x
 40% faster total training and a 33% smaller final model.
+### Head Mitosis
+Pruning frees slots. Mitosis fills them. When a head is overutilized, it gets cloned into a pruned slot — each copy at 50% gate value to maintain output continuity. After continued training, the clones **diverge and specialize**, like cell differentiation after biological mitosis. The model grows new specialized capacity exactly where it's needed.
 **Read the full paper**: [Experiential Plasticity: Transformers That Grow Their Own Architecture From Experience](https://github.com/CambrianTech/continuum/blob/main/docs/papers/EXPERIENTIAL-PLASTICITY.md)
 ## Output Samples
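
Review note on the added "Head Mitosis" paragraph: the claim that cloning at 50% gate value maintains output continuity can be checked with a toy model. The sketch below is illustrative only and is not the repository's implementation. It assumes the layer output is a gate-weighted sum of per-head outputs, that a pruned slot is one whose gate is zero, and that "50% gate value" means half of the source head's original gate. The names `head_output`, `gated_sum`, and `mitosis` are hypothetical, and how an "overutilized" head is detected is left to the caller because the README does not specify it.

```python
def head_output(weights, x):
    """Toy attention head: a single dot product stands in for the real head."""
    return sum(w * xi for w, xi in zip(weights, x))


def gated_sum(heads, gates, x):
    """Assumed layer output: each head's output scaled by its gate, then summed."""
    return sum(g * head_output(w, x) for w, g in zip(heads, gates))


def mitosis(heads, gates, source, slot):
    """Clone heads[source] into a pruned slot and split the gate evenly.

    Assumption: a slot counts as pruned when its gate is exactly 0.0.
    Returns new lists and leaves the inputs untouched.
    """
    if gates[slot] != 0.0:
        raise ValueError("target slot is not pruned")
    new_heads = [list(w) for w in heads]
    new_gates = list(gates)
    new_heads[slot] = list(heads[source])  # independent copy, free to diverge later
    half = 0.5 * gates[source]
    new_gates[source] = half
    new_gates[slot] = half
    return new_heads, new_gates


# Three slots: two live heads and one pruned slot (gate 0.0).
heads = [[0.5, -1.0], [2.0, 0.25], [0.0, 0.0]]
gates = [1.0, 1.0, 0.0]
x = [1.0, 2.0]

before = gated_sum(heads, gates, x)
new_heads, new_gates = mitosis(heads, gates, source=1, slot=2)
after = gated_sum(new_heads, new_gates, x)
```

Under these assumptions `before` and `after` are equal, because the two clones compute the same output and their gates sum to the original gate. Specialization then depends on continued training moving the two weight copies apart, which this sketch does not model.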