---
tags:
  - research
  - pruning
  - experiential-plasticity
  - neural-architecture
---

# Experiential Plasticity: Transformers That Grow Their Own Architecture From Experience

Joel Teply — continuum-ai, Kansas City

## Abstract

Iterative entropy-based pruning with domain-specific retraining produces transformers that are both smaller and more capable. Qwen3.5-4B achieves +24% improvement on code when forged with CodeFeedback data. Qwen3.5-27B achieves +3.5% improvement while compressing from 54GB to 15GB — running on a MacBook at 9 tok/s with Sonnet 4.6-level intelligence.

We discover a measurable transfer function, `1.45·exp(−0.18·cycle) − 0.03`, connecting transformer optimization to classical control theory, and introduce continuous defrag — structural head removal that accelerates training by 40%.
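The transfer function above can be evaluated directly. A minimal sketch, assuming `cycle` is the zero-indexed pruning-cycle number and the output is the predicted fractional recovery (both interpretations are assumptions, not stated here):

```python
import math

def predicted_recovery(cycle: int) -> float:
    """Predicted recovery per pruning cycle: 1.45*exp(-0.18*cycle) - 0.03.

    Assumes `cycle` starts at 0 and the return value is the fraction of
    quality recovered by retraining after that pruning cycle.
    """
    return 1.45 * math.exp(-0.18 * cycle) - 0.03

# Because recovery decays exponentially, a self-directed pruner can stop
# once the predicted recovery falls below a chosen threshold (0.1 here
# is an illustrative value, not one from the paper).
worthwhile_cycles = [c for c in range(30) if predicted_recovery(c) > 0.1]
```

The negative offset (−0.03) means the predicted recovery eventually goes below zero, so the curve itself encodes a natural stopping point for the pruning loop.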

## Paper

📄 PDF 📝 Full paper on GitHub

## Published Models

| Model | Improvement | Size | Hardware |
|---|---|---|---|
| qwen3.5-27b-code-forged-mlx-4bit | +3.5% | 15GB | MacBook |
| qwen3.5-27b-code-forged | +3.5% | 30GB | CUDA |
| qwen3.5-4b-code-forged | +24.0% | 8GB | Any |

## Code

## Key Findings

1. **Domain-specific training amplifies plasticity** — code data produces a 24% improvement vs. 14.6% on generic text
2. **The transfer function is predictable** — recovery follows an exponential decay, enabling self-directed pruning
3. **Continuous defrag accelerates training** — structurally removing dead heads frees VRAM for larger batches
4. **72% compression with quality improvement** — 54GB → 15GB while being better at code
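The findings above hinge on scoring attention heads by entropy to decide which are "dead". This README does not define the criterion, so the following is a minimal sketch of one common formulation (mean attention entropy per head, with near-uniform heads flagged); the threshold and the uniform-entropy test are assumptions for illustration:

```python
import numpy as np

def head_entropy(attn: np.ndarray) -> np.ndarray:
    """Mean attention entropy per head.

    attn: array of shape (batch, heads, query, key) whose last axis
    holds softmaxed attention weights (each row sums to 1).
    Returns one score per head, averaged over batch and query positions.
    """
    eps = 1e-9
    ent = -(attn * np.log(attn + eps)).sum(axis=-1)  # (batch, heads, query)
    return ent.mean(axis=(0, 2))                     # (heads,)

# Hypothetical usage on random weights (2 sequences, 8 heads, length 16).
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 8, 16, 16))
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
scores = head_entropy(attn)

# A head whose entropy sits near the uniform maximum log(seq_len)
# attends almost everywhere equally -- a "dead" candidate in this
# sketch, eligible for structural removal (continuous defrag).
dead = scores > 0.9 * np.log(16)
```

Because removal here is structural (the head's weight slices are deleted, not just masked), the freed VRAM is genuinely available for larger batches, which is where the reported training speedup comes from.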