Text Generation
MLX
Safetensors
Rust
qwen3_5
27b
agentic-coding
alloy-backfilled
android
apple-silicon
attested
bash
c
chain-of-custody
chinese
code
code-completion
code-generation
code-infill
coder
coding
compacted
consumer-gpu
cpp
cryptographically-verified
css
edge-inference
efficient
embedded
english
forge-alloy
function-calling
go
head-pruning
html
iphone
java
javascript
kotlin
llama-cpp
lm-studio
local-inference
macbook
mobile
multilingual
ollama
on-device
optimized
php
programming
pruned
python
qwen
qwen3
qwen3.5
raspberry-pi
reproducible
ruby
software-engineering
sql
swift
typescript
conversational
Upload README.md with huggingface_hub
README.md CHANGED
```diff
@@ -138,6 +138,10 @@ Cycle 3: train (batch=3, 22B, 14.5GB) -> prune -> defrag (2.8x
 
 40% faster total training and a 33% smaller final model.
 
+### Head Mitosis
+
+Pruning frees slots. Mitosis fills them. When a head is overutilized, it gets cloned into a pruned slot — each copy at 50% gate value to maintain output continuity. After continued training, the clones **diverge and specialize**, like cell differentiation after biological mitosis. The model grows new specialized capacity exactly where it's needed.
+
 **Read the full paper**: [Experiential Plasticity: Transformers That Grow Their Own Architecture From Experience](https://github.com/CambrianTech/continuum/blob/main/docs/papers/EXPERIENTIAL-PLASTICITY.md)
 
 ## Output Samples
```
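The head-mitosis step described in the added section can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the function name `head_mitosis` is hypothetical, and it assumes per-head scalar gates plus a stacked per-head weight tensor, neither of which is specified in the card.

```python
import numpy as np

def head_mitosis(gates, head_weights, overused, pruned_slot):
    """Clone an overutilized attention head into a freed (pruned) slot.

    Hypothetical sketch: each copy gets half the original gate value,
    so the gated sum of the two clones initially equals the original
    head's contribution (output continuity). Continued training can
    then let the clones diverge and specialize.
    """
    head_weights[pruned_slot] = head_weights[overused].copy()
    half = gates[overused] / 2.0
    gates[pruned_slot] = half
    gates[overused] = half
    return gates, head_weights

# Toy example: 4 heads; head 0 was pruned (gate 0), head 2 is overutilized.
gates = np.array([0.0, 1.0, 1.0, 1.0])
W = np.random.randn(4, 8, 8)  # stand-in per-head weight tensor
gates, W = head_mitosis(gates, W, overused=2, pruned_slot=0)
# Continuity: 0.5 * W[0] + 0.5 * W[2] equals the old 1.0 * W[2],
# since both clones start as exact copies.
```

The halved gates are what make the operation output-preserving at the moment of cloning; divergence only appears once gradients push the two copies apart.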