Experiential Plasticity: Transformers That Grow Their Own Architecture From Experience
Joel Teply, continuum-ai, Kansas City
Abstract
Iterative entropy-based pruning with domain-specific retraining produces transformers that are both smaller and more capable. Qwen3.5-4B achieves a +24% improvement on code when forged with CodeFeedback data. Qwen3.5-27B achieves a +3.5% improvement while compressing from 54GB to 15GB, running on a MacBook at 9 tok/s with Sonnet 4.6-level intelligence.
We discover a measurable transfer function (1.45·exp(−0.18·cycle) − 0.03) connecting transformer optimization to classical control theory, and introduce continuous defrag: structural head removal that accelerates training by 40%.
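The transfer function above can be read as a stopping criterion: as a minimal sketch (the constants come from the abstract; the function name and threshold are illustrative assumptions, not from the paper), predicted recovery per pruning cycle decays exponentially, so a controller can stop pruning once the expected gain falls below a threshold.

```python
import math

def predicted_recovery(cycle: int) -> float:
    """Reported transfer function: predicted quality recovery
    after pruning cycle `cycle` (constants from the abstract)."""
    return 1.45 * math.exp(-0.18 * cycle) - 0.03

# Illustrative self-directed pruning loop: continue while the
# predicted recovery stays above an assumed threshold.
THRESHOLD = 0.5  # hypothetical value, not from the paper
cycle = 0
while predicted_recovery(cycle) > THRESHOLD:
    cycle += 1
```

Because the decay is exponential, the loop terminates after a small, predictable number of cycles regardless of model size.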
Paper
PDF · Full paper on GitHub
Published Models
| Model | Improvement | Size | Hardware |
|---|---|---|---|
| qwen3.5-27b-code-forged-mlx-4bit | +3.5% | 15GB | MacBook |
| qwen3.5-27b-code-forged | +3.5% | 30GB | CUDA |
| qwen3.5-4b-code-forged | +24.0% | 8GB | Any |
Code
- sentinel-ai โ the forging framework
- continuum โ distributed AI on consumer hardware
Key Findings
- Domain-specific training amplifies plasticity: code data produces a 24% improvement vs 14.6% on generic text
- The transfer function is predictable: recovery follows an exponential decay, enabling self-directed pruning
- Continuous defrag accelerates training: structurally removing dead heads frees VRAM for larger batches
- 72% compression with quality improvement: 54GB → 15GB while being better at code
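The "dead head" selection behind continuous defrag can be sketched with entropy scoring: heads whose attention distributions are near-deterministic carry little information and are candidates for structural removal. This is a minimal sketch under assumptions; the function names, the pruning fraction, and the use of mean entropy as the score are illustrative, not the paper's exact method.

```python
import numpy as np

def head_entropy(attn: np.ndarray) -> np.ndarray:
    """Mean attention entropy per head.
    attn: (heads, queries, keys) softmax attention weights."""
    eps = 1e-12  # avoid log(0)
    ent = -(attn * np.log(attn + eps)).sum(axis=-1)  # (heads, queries)
    return ent.mean(axis=-1)  # (heads,)

def select_heads_to_prune(attn: np.ndarray, frac: float = 0.1) -> np.ndarray:
    """Indices of the lowest-entropy (near-deterministic) heads.
    `frac` is an assumed hyperparameter for the pruning fraction."""
    scores = head_entropy(attn)
    k = max(1, int(frac * len(scores)))
    return np.argsort(scores)[:k]
```

Structurally removing the selected heads (rather than masking them) is what frees VRAM for larger batches, which is where the reported 40% training speedup comes from.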