Text Generation
MLX
Safetensors
Rust
qwen3_5
27b
agentic-coding
alloy-backfilled
android
apple-silicon
attested
bash
c
chain-of-custody
chinese
code
code-completion
code-generation
code-infill
coder
coding
compacted
consumer-gpu
cpp
cryptographically-verified
css
edge-inference
efficient
embedded
english
forge-alloy
function-calling
go
head-pruning
html
iphone
java
javascript
kotlin
llama-cpp
lm-studio
local-inference
macbook
mobile
multilingual
ollama
on-device
optimized
php
programming
pruned
python
qwen
qwen3
qwen3.5
raspberry-pi
reproducible
ruby
software-engineering
sql
swift
typescript
conversational
Upload README.md with huggingface_hub
README.md CHANGED
```diff
@@ -138,6 +138,10 @@ Cycle 3: train (batch=3, 22B, 14.5GB) -> prune -> defrag (2.8x
 
 40% faster total training and a 33% smaller final model.
 
+### Head Mitosis
+
+Pruning frees slots. Mitosis fills them. When a head is overutilized, it gets cloned into a pruned slot — each copy at 50% gate value to maintain output continuity. After continued training, the clones **diverge and specialize**, like cell differentiation after biological mitosis. The model grows new specialized capacity exactly where it's needed.
+
 **Read the full paper**: [Experiential Plasticity: Transformers That Grow Their Own Architecture From Experience](https://github.com/CambrianTech/continuum/blob/main/docs/papers/EXPERIENTIAL-PLASTICITY.md)
 
 ## Output Samples
```
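The head-mitosis step described in the added section can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the function name `head_mitosis` is hypothetical, and it assumes per-head scalar gates plus a stacked per-head weight tensor, neither of which is specified in the card.

```python
import numpy as np

def head_mitosis(gates, head_weights, overused, pruned_slot):
    """Clone an overutilized attention head into a freed (pruned) slot.

    Hypothetical sketch: each copy gets half the original gate value,
    so the gated sum of the two clones initially equals the original
    head's contribution (output continuity). Continued training can
    then let the clones diverge and specialize.
    """
    head_weights[pruned_slot] = head_weights[overused].copy()
    half = gates[overused] / 2.0
    gates[pruned_slot] = half
    gates[overused] = half
    return gates, head_weights

# Toy example: 4 heads; head 0 was pruned (gate 0), head 2 is overutilized.
gates = np.array([0.0, 1.0, 1.0, 1.0])
W = np.random.randn(4, 8, 8)  # stand-in per-head weight tensor
gates, W = head_mitosis(gates, W, overused=2, pruned_slot=0)
# Continuity: 0.5 * W[0] + 0.5 * W[2] equals the old 1.0 * W[2],
# since both clones start as exact copies.
```

The halved gates are what make the operation output-preserving at the moment of cloning; divergence only appears once gradients push the two copies apart.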