Project Janus Part II: Mechanistic Validation of Orthogonal Regularization in Nano-Scale Language Models via Sparse Autoencoder Analysis
Abstract from the corresponding paper:
Prevailing scaling laws assume that parameter efficiency is an immutable architectural property, implying that small language models must accept a baseline level of representational redundancy. We challenge this assumption by addressing "Rank Collapse," a phenomenon where attention heads converge on overlapping features, severely constraining the expressive capacity of nano-scale models. While prior work established that Vector Space Homeostasis (VSM) improves logical coherence by 9.2% in 40M parameter models, the underlying mechanism remained opaque. This paper provides the first mechanistic validation of orthogonal regularization, utilizing Sparse Autoencoders (SAEs), topology analysis, and causal ablation to decompose the model's internal dynamics. We report three fundamental structural shifts induced by VSM: (1) a 60.3% reduction in inter-head correlation (Subspace Disentanglement), (2) a novel "sparsity crossover" phenomenon where early layers optimize for selectivity while deep layers maximize density, and (3) a 2.54x increase in the functional specialization of load-bearing heads. These findings not only validate the "Super-Chinchilla" hypothesis—that effective rank is a proxy for parameter count—but also establish a paradigm of prescriptive interpretability, where desirable internal structure is enforced by design rather than discovered post-hoc.
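To make the core technique concrete, the sketch below shows one common way to implement an orthogonality penalty over attention-head projection matrices. It is an illustrative example only, not the paper's VSM implementation: the function name, tensor layout, and the `lambda_orth` coefficient are assumptions introduced for exposition.

```python
# Illustrative sketch of an orthogonality (anti-overlap) penalty on attention heads.
# Assumed (hypothetical) layout: head_proj has shape (num_heads, d_model, d_head),
# one projection matrix per head. This is NOT the paper's exact VSM objective.
import torch
import torch.nn.functional as F


def orthogonality_penalty(head_proj: torch.Tensor) -> torch.Tensor:
    """Penalize overlap between per-head projection subspaces."""
    num_heads = head_proj.shape[0]
    if num_heads < 2:
        return head_proj.new_zeros(())

    # Flatten each head's projection into a single unit-norm vector.
    flat = F.normalize(head_proj.reshape(num_heads, -1), dim=-1)

    # Gram matrix of pairwise cosine similarities between heads.
    gram = flat @ flat.T

    # Drive off-diagonal entries toward zero; the identity matrix
    # corresponds to fully non-overlapping (orthogonal) heads.
    off_diag = gram - torch.eye(num_heads, device=gram.device, dtype=gram.dtype)
    return (off_diag ** 2).sum() / (num_heads * (num_heads - 1))


# Usage (hypothetical): add the penalty to the task loss with a small coefficient.
# loss = task_loss + lambda_orth * orthogonality_penalty(stacked_head_weights)
```

In this framing, reducing the off-diagonal mass of the Gram matrix is what the abstract refers to as lowering inter-head correlation; the paper's actual formulation and hyperparameters should be taken from the paper itself.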



