---
tags:
  - research
  - pruning
  - experiential-plasticity
  - neural-architecture
---
# Experiential Plasticity: Transformers That Grow Their Own Architecture From Experience

Joel Teply (continuum-ai, Kansas City)
## Abstract
Iterative entropy-based pruning with domain-specific retraining produces transformers that are both smaller and more capable. Qwen3.5-4B achieves a +24% improvement on code when forged with CodeFeedback data. Qwen3.5-27B achieves a +3.5% improvement while compressing from 54GB to 15GB, running on a MacBook at 9 tok/s with Sonnet 4.6-level intelligence.
We discover a measurable transfer function, recovery ≈ 1.45·exp(−0.18·cycle) − 0.03, connecting transformer optimization to classical control theory, and introduce continuous defrag: structural removal of dead attention heads that accelerates training by 40%.
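The transfer function can be evaluated directly to predict recovery at each pruning cycle. A minimal sketch (the function name and the sampled cycles are illustrative, not from the paper):

```python
import math

def predicted_recovery(cycle: float) -> float:
    """Empirical transfer function from the abstract:
    recovery ≈ 1.45·exp(−0.18·cycle) − 0.03."""
    return 1.45 * math.exp(-0.18 * cycle) - 0.03

# Recovery is largest on early cycles and decays toward the −0.03 asymptote,
# which is what makes a self-directed stopping criterion possible.
for c in range(0, 6):
    print(f"cycle {c}: {predicted_recovery(c):+.3f}")
```

Because the curve eventually crosses zero, a forging loop can stop pruning once the predicted recovery for the next cycle is no longer positive.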
## Paper
## Published Models
| Model | Improvement | Size | Hardware |
|---|---|---|---|
| qwen3.5-27b-code-forged-mlx-4bit | +3.5% | 15GB | MacBook |
| qwen3.5-27b-code-forged | +3.5% | 30GB | CUDA |
| qwen3.5-4b-code-forged | +24.0% | 8GB | Any |
## Code
- sentinel-ai — the forging framework
- continuum — distributed AI on consumer hardware
## Key Findings
- Domain-specific training amplifies plasticity — code data produces 24% improvement vs 14.6% on generic text
- The transfer function is predictable — recovery follows an exponential decay, enabling self-directed pruning
- Continuous defrag accelerates training — structurally removing dead heads frees VRAM for larger batches
- 72% compression with quality improvement — 54GB → 15GB while being better at code
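The entropy scoring that drives head removal can be sketched as follows, assuming access to per-head attention probability matrices. All names here are illustrative and the actual sentinel-ai criterion may differ; this only shows the core idea of ranking heads by how diffuse their attention is:

```python
import numpy as np

def head_entropy(attn: np.ndarray) -> float:
    """Mean Shannon entropy of one head's attention distributions.
    attn has shape (queries, keys), with each row summing to 1.
    Near-uniform (diffuse) heads score high; focused heads score low."""
    eps = 1e-9  # avoid log(0)
    return float(-(attn * np.log(attn + eps)).sum(axis=-1).mean())

def prune_mask(head_attns: list, keep_frac: float = 0.75) -> np.ndarray:
    """Boolean mask over heads: keep the keep_frac lowest-entropy heads,
    flagging the most diffuse ("dead") heads for structural removal."""
    scores = np.array([head_entropy(a) for a in head_attns])
    k = max(1, int(len(scores) * keep_frac))
    mask = np.zeros(len(scores), dtype=bool)
    mask[np.argsort(scores)[:k]] = True
    return mask
```

Structurally removing the masked-out heads (rather than just zeroing them) is what frees VRAM for larger batches during retraining, per the continuous-defrag finding above.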