Experiential Plasticity: Transformers That Grow Their Own Architecture From Experience

Joel Teply, continuum-ai, Kansas City

Abstract

Iterative entropy-based pruning with domain-specific retraining produces transformers that are both smaller and more capable. Qwen3.5-4B achieves a +24% improvement on code when forged with CodeFeedback data. Qwen3.5-27B achieves a +3.5% improvement while compressing from 54GB to 15GB, running on a MacBook at 9 tok/s with Sonnet 4.6-level intelligence.
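The entropy-based pruning criterion can be sketched as follows. The scoring rule, threshold, and helper names here are illustrative assumptions for exposition, not the paper's exact implementation:

```python
import math

def entropy(probs):
    """Shannon entropy of one attention distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def prune_candidates(head_attn, threshold=0.1):
    """Flag heads whose mean attention entropy falls below a threshold.

    head_attn: list of heads, each a list of per-query attention rows.
    A low-entropy head always attends to roughly the same position, so it
    carries little information and is a candidate for structural removal.
    (Illustrative scoring rule; the paper's criterion may differ.)
    """
    scores = [sum(entropy(row) for row in rows) / len(rows)
              for rows in head_attn]
    return [i for i, s in enumerate(scores) if s < threshold]

# A near-deterministic head (low entropy) vs. a diffuse head (high entropy):
peaked = [[0.999, 0.001, 0.0]]
diffuse = [[1 / 3, 1 / 3, 1 / 3]]
print(prune_candidates([peaked, diffuse]))  # only the peaked head is flagged
```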

We discover a measurable transfer function, 1.45·exp(−0.18·cycle) − 0.03, connecting transformer optimization to classical control theory, and introduce continuous defrag: structural head removal that accelerates training by 40%.

Paper

📄 PDF · 📝 Full paper on GitHub

Published Models

| Model | Improvement | Size | Hardware |
|---|---|---|---|
| qwen3.5-27b-code-forged-mlx-4bit | +3.5% | 15GB | MacBook |
| qwen3.5-27b-code-forged | +3.5% | 30GB | CUDA |
| qwen3.5-4b-code-forged | +24.0% | 8GB | Any |

Code

Key Findings

  1. Domain-specific training amplifies plasticity: code data produces a 24% improvement vs. 14.6% on generic text
  2. The transfer function is predictable: recovery follows an exponential decay, enabling self-directed pruning
  3. Continuous defrag accelerates training: structurally removing dead heads frees VRAM for larger batches
  4. 72% compression with quality improvement: 54GB → 15GB while getting better at code
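One way a predictable transfer function enables self-directed pruning is as a stopping rule: keep running prune/retrain cycles while the predicted recovery stays above some floor, and stop once returns diminish. The threshold below is an illustrative assumption, not a value from the paper:

```python
import math

def predicted_recovery(cycle):
    # The abstract's fitted transfer function.
    return 1.45 * math.exp(-0.18 * cycle) - 0.03

def plan_cycles(min_recovery=0.05):
    """Self-directed schedule (sketch): count how many prune/retrain
    cycles are worth running before predicted recovery drops below
    min_recovery. (min_recovery is a hypothetical cutoff.)"""
    cycle = 0
    while predicted_recovery(cycle) > min_recovery:
        cycle += 1
    return cycle
```

Because the decay constants are fixed, this plan can be computed before training starts rather than probed empirically after each cycle.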