Experiential Plasticity: Transformers That Grow Their Own Architecture From Experience

Joel Teply, continuum-ai, Kansas City

Abstract

Iterative entropy-based pruning with domain-specific retraining produces transformers that are both smaller and more capable. Qwen3.5-4B achieves a +24% improvement on code when forged with CodeFeedback data. Qwen3.5-27B achieves a +3.5% improvement while compressing from 54GB to 15GB, running on a MacBook at 9 tok/s with Sonnet 4.6-level intelligence.
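The entropy-based pruning criterion can be sketched as follows. The scoring rule, threshold, and helper names here are illustrative assumptions for exposition, not the paper's exact implementation:

```python
import math

def entropy(probs):
    """Shannon entropy of one attention distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def prune_candidates(head_attn, threshold=0.1):
    """Flag heads whose mean attention entropy falls below a threshold.

    head_attn: list of heads, each a list of per-query attention rows.
    A low-entropy head always attends to roughly the same position, so it
    carries little information and is a candidate for structural removal.
    (Illustrative scoring rule; the paper's criterion may differ.)
    """
    scores = [sum(entropy(row) for row in rows) / len(rows)
              for rows in head_attn]
    return [i for i, s in enumerate(scores) if s < threshold]

# A near-deterministic head (low entropy) vs. a diffuse head (high entropy):
peaked = [[0.999, 0.001, 0.0]]
diffuse = [[1 / 3, 1 / 3, 1 / 3]]
print(prune_candidates([peaked, diffuse]))  # only the peaked head is flagged
```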

We discover a measurable transfer function, 1.45·exp(−0.18·cycle) − 0.03, connecting transformer optimization to classical control theory, and introduce continuous defrag: structural head removal that accelerates training by 40%.

Paper

📄 PDF · 📝 Full paper on GitHub

Published Models

| Model | Improvement | Size | Hardware |
|---|---|---|---|
| qwen3.5-27b-code-forged-mlx-4bit | +3.5% | 15GB | MacBook |
| qwen3.5-27b-code-forged | +3.5% | 30GB | CUDA |
| qwen3.5-4b-code-forged | +24.0% | 8GB | Any |

Code

Key Findings

  1. Domain-specific training amplifies plasticity: code data produces a 24% improvement vs. 14.6% on generic text
  2. The transfer function is predictable: recovery follows an exponential decay, enabling self-directed pruning
  3. Continuous defrag accelerates training: structurally removing dead heads frees VRAM for larger batches
  4. 72% compression with quality improvement: 54GB → 15GB while getting better at code
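One way a predictable transfer function enables self-directed pruning is as a stopping rule: keep running prune/retrain cycles while the predicted recovery stays above some floor, and stop once returns diminish. The threshold below is an illustrative assumption, not a value from the paper:

```python
import math

def predicted_recovery(cycle):
    # The abstract's fitted transfer function.
    return 1.45 * math.exp(-0.18 * cycle) - 0.03

def plan_cycles(min_recovery=0.05):
    """Self-directed schedule (sketch): count how many prune/retrain
    cycles are worth running before predicted recovery drops below
    min_recovery. (min_recovery is a hypothetical cutoff.)"""
    cycle = 0
    while predicted_recovery(cycle) > min_recovery:
        cycle += 1
    return cycle
```

Because the decay constants are fixed, this plan can be computed before training starts rather than probed empirically after each cycle.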