Loss Is Not Enough: The Golden Window in Neural Network Training

Song Yue · Independent Researcher · July 2026

Abstract

Training and fine-tuning neural networks proceed through three structurally distinct phases (Build, Collapse, and Rebuild) that the standard optimization metric, loss, cannot distinguish. During Collapse, structural order drops 49% while loss continues to improve. Phase Structure, one of four complementary diagnostics from FPP, locates the golden window where capability peaks before it is sacrificed for marginal loss improvements. Across 13 models spanning 7.5M–14B parameters and 5 architecture families, Green Symmetry (GS) varies 2× and Mutual Information (MI) 40× within the same family. Momentum injection produces three distinct response modes: mild (+19%, SwiGLU+MHA), strong (+32%, ReLU), and structurally dangerous (GeGLU, unsafe beyond β=0.02). All experiments were conducted on consumer hardware (GTX 1650 Ti 4GB + Intel Ultra 30GB). No cloud. No cluster.
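This card does not specify how Phase Structure is computed. As a rough illustration of the Build → Collapse → Rebuild idea only, the sketch below assumes you already log one positive scalar "structural order" value per checkpoint alongside loss. `label_phases`, `PhaseReport`, and the sign-of-change rule are hypothetical stand-ins, not the FPP implementation, and the toy numbers are invented for illustration rather than measured.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PhaseReport:
    labels: List[str]                # phase of each transition i-1 -> i
    golden_index: int                # checkpoint where order peaks before Collapse
    collapse_drop: float             # fractional order drop from that peak to the Collapse minimum
    loss_improved_in_collapse: bool  # True if loss fell at every Collapse step


def label_phases(loss: Sequence[float], order: Sequence[float]) -> PhaseReport:
    """Hypothetical sign-of-change heuristic, not the FPP Phase Structure metric.

    Build: order has not yet fallen. Collapse: order falls after its first peak.
    Rebuild: order rises again after Collapse (and stays labeled Rebuild).
    Assumes order values are positive.
    """
    if len(loss) != len(order) or len(order) < 2:
        raise ValueError("need two equal-length series with at least 2 checkpoints")

    labels: List[str] = []
    state = "Build"
    for i in range(1, len(order)):
        delta = order[i] - order[i - 1]
        if state == "Build" and delta < 0:
            state = "Collapse"
        elif state == "Collapse" and delta > 0:
            state = "Rebuild"
        labels.append(state)

    # Transition into checkpoint i is stored at labels[i - 1].
    collapse_steps = [i for i in range(1, len(order)) if labels[i - 1] == "Collapse"]
    if not collapse_steps:
        # No Collapse observed: fall back to the overall order peak.
        peak = max(range(len(order)), key=lambda i: order[i])
        return PhaseReport(labels, peak, 0.0, True)

    golden = collapse_steps[0] - 1  # last checkpoint before order first fell
    trough = min(order[i] for i in collapse_steps)
    drop = (order[golden] - trough) / order[golden]
    improved = all(loss[i] < loss[i - 1] for i in collapse_steps)
    return PhaseReport(labels, golden, drop, improved)


report = label_phases(
    loss=[3.0, 2.5, 2.2, 2.0, 1.9, 1.85, 1.8],
    order=[0.2, 0.5, 0.8, 0.6, 0.4, 0.5, 0.7],
)
print(report.labels)  # ['Build', 'Build', 'Collapse', 'Collapse', 'Rebuild', 'Rebuild']
print(report.golden_index, report.collapse_drop, report.loss_improved_in_collapse)  # 2 0.5 True
```

In this toy run, loss falls at every step while order drops by half during Collapse. That is the pattern the abstract describes: a loss curve alone would show steady progress.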

Quick Start

pip install torch transformers numpy scipy scikit-learn sentencepiece accelerate
python fpp_health.py --model Qwen/Qwen2.5-0.5B-Instruct

Key Findings

Three-phase lifecycle: Build → Collapse → Rebuild (loss sees one, FPP sees three)
GS is U-shaped across Qwen2.5 sizes: peaks at 1.5B, dips at 7B, recovers at 14B
MI varies 40×: TinyLlama (0.771) has 11× more information capacity than Qwen (0.069)
Phase is architecture-stable: 0.18–0.47 across all models
β safety is family-specific: SwiGLU+MHA [0.05,0.20], GeGLU [0.001,0.02], and no safe β for SmolLM2-1.7B
All on consumer hardware: GTX 1650 Ti (≤1.7B) + Intel Ultra (7–14B)
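The 11× figure above is a plain ratio of the two MI scores quoted. A minimal sketch of that arithmetic follows. `fold_spread` is a hypothetical helper, not part of FPP; it could equally be applied to the per-family GS or MI columns in the table below.

```python
from typing import Iterable


def fold_spread(values: Iterable[float]) -> float:
    """Ratio of the largest to the smallest of a set of positive scores."""
    vals = list(values)
    if not vals or min(vals) <= 0:
        raise ValueError("fold spread needs at least one value, all positive")
    return max(vals) / min(vals)


# MI scores quoted in Key Findings.
tinyllama_mi = 0.771
qwen_mi = 0.069
print(f"{fold_spread([tinyllama_mi, qwen_mi]):.1f}x")  # 11.2x
```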

Models Evaluated

Model	Family	GS (Green Symmetry)	MI (Mutual Information)	Phase	Safe β range
Qwen2.5-0.5B	SwiGLU+MHA	0.81	0.09	0.42	[0.05,0.20]
Qwen2.5-1.5B	SwiGLU+MHA	0.89	0.10	0.32	[0.05,0.20]
Qwen2.5-7B	SwiGLU+MHA	0.81	0.06	0.40	[0.01,0.50]
Qwen2.5-14B	SwiGLU+MHA	0.89	0.08	0.29	[0.01,0.20]
SmolLM2-360M	SwiGLU+GQA	0.89	0.07	0.31	[0.01,0.02]
SmolLM2-1.7B	SwiGLU+GQA	0.43	0.22	0.47	NONE
TinyLlama-1.1B	SwiGLU+GQA	0.80	0.77	0.39	[0.01,0.02]
Gemma-3-1B	GeGLU	0.28	0.10	0.31	[0.001,0.02]
Gemma-2-9B	GeGLU	0.91	0.05	0.06	[0.01,0.05]
OPT-1.3B	ReLU	0.60	0.14	0.21	[0.01,0.20]
Pythia-160M	GELU	0.46	0.20	0.35	[0.01,0.02]
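The Safe β column can serve as a lookup before a momentum-injection run. The sketch below transcribes the table's intervals, assumes they are closed (endpoints included), and treats NONE as "no safe value". `SAFE_BETA` and `is_safe_beta` are hypothetical helpers. The check says nothing about how β is applied during training.

```python
from typing import Dict, Optional, Tuple

# Safe β intervals transcribed from the table above; None means no safe value was found.
SAFE_BETA: Dict[str, Optional[Tuple[float, float]]] = {
    "Qwen2.5-0.5B": (0.05, 0.20),
    "Qwen2.5-1.5B": (0.05, 0.20),
    "Qwen2.5-7B": (0.01, 0.50),
    "Qwen2.5-14B": (0.01, 0.20),
    "SmolLM2-360M": (0.01, 0.02),
    "SmolLM2-1.7B": None,
    "TinyLlama-1.1B": (0.01, 0.02),
    "Gemma-3-1B": (0.001, 0.02),
    "Gemma-2-9B": (0.01, 0.05),
    "OPT-1.3B": (0.01, 0.20),
    "Pythia-160M": (0.01, 0.02),
}


def is_safe_beta(model: str, beta: float) -> bool:
    """Return True if beta lies inside the model's reported interval (assumed closed)."""
    interval = SAFE_BETA[model]  # raises KeyError for models not in the table
    if interval is None:
        return False
    low, high = interval
    return low <= beta <= high


print(is_safe_beta("Qwen2.5-0.5B", 0.10))  # True
print(is_safe_beta("Gemma-3-1B", 0.05))    # False
print(is_safe_beta("SmolLM2-1.7B", 0.01))  # False
```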

Citation

@article{yue2026loss,
  title={Loss Is Not Enough: The Golden Window in Neural Network Training},
  author={Yue, Song},
  year={2026},
  eprint={pending},
  archivePrefix={arXiv}
}
