Loss Is Not Enough: The Golden Window in Neural Network Training

Song Yue · Independent Researcher · July 2026

Abstract

Training and fine-tuning neural networks proceed through three structurally distinct phases (Build, Collapse, and Rebuild) that the standard optimization metric, loss, cannot distinguish. During Collapse, structural order drops 49% while loss continues to improve. Phase Structure, one of four complementary diagnostics from FPP, locates the golden window: the point where capability peaks before it is sacrificed for marginal gains. Across 13 models spanning 7.5M to 14B parameters and 5 architecture families, Green Symmetry (GS) varies 2× and Mutual Information (MI) varies 40× within the same family. Momentum injection produces three distinct response modes: mild (+19%, SwiGLU+MHA), strong (+32%, ReLU), and structurally dangerous (GeGLU, unsafe beyond β=0.02). All experiments were conducted on consumer hardware (GTX 1650 Ti 4GB + Intel Ultra 30GB). No cloud. No cluster.

Quick Start

pip install torch transformers numpy scipy scikit-learn sentencepiece accelerate
python fpp_health.py --model Qwen/Qwen2.5-0.5B-Instruct

Key Findings

  • Three-phase lifecycle: Build → Collapse → Rebuild (loss sees one, FPP sees three)
  • GS is U-shaped across Qwen2.5 scales: peaks at 1.5B, dips at 7B, recovers at 14B
  • MI varies 40×: TinyLlama (0.771) has 11× more information capacity than Qwen (0.069)
  • Phase is architecture-stable: 0.18–0.47 across all models
  • β safety is family-specific: SwiGLU+MHA [0.05,0.20], GeGLU [0.001,0.02], SmolLM2-1.7B NONE
  • All on consumer hardware: GTX 1650 Ti (≤1.7B) + Intel Ultra (7–14B)
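
The three-phase lifecycle in the first finding can be illustrated with a toy sketch. This is not the FPP Phase Structure diagnostic: it assumes some scalar "structural order" score is already available per checkpoint (higher is better), and the function name `golden_window` and the series below are hypothetical, synthetic values chosen only to show the idea that loss can keep improving while structure collapses.

```python
def golden_window(loss, order):
    """Split a run into Build / Collapse / Rebuild from two per-checkpoint series.

    loss  : list of loss values, one per checkpoint
    order : list of structural-order scores, one per checkpoint
            (assumption: a single scalar where higher means more structure)
    """
    if len(loss) != len(order) or len(loss) < 2:
        raise ValueError("loss and order must have equal length >= 2")

    # End of Build: the checkpoint where structural order is highest.
    peak = max(range(len(order)), key=lambda i: order[i])
    # End of Collapse: the lowest structural order at or after the peak.
    trough = min(range(peak, len(order)), key=lambda i: order[i])

    # Fraction of structural order lost between peak and trough.
    drop = (order[peak] - order[trough]) / order[peak] if order[peak] else 0.0

    return {
        "peak": peak,          # candidate golden-window checkpoint
        "trough": trough,      # checkpoints after this count as Rebuild
        "order_drop": drop,
        "loss_improved_during_collapse": loss[trough] < loss[peak],
    }


# Synthetic series, NOT measurements from the paper.
toy_loss = [2.0, 1.6, 1.3, 1.1, 1.0, 0.95, 0.9]
toy_order = [0.40, 0.70, 0.90, 0.60, 0.45, 0.55, 0.65]

result = golden_window(toy_loss, toy_order)
print(result)
```

In this toy run, loss falls at every checkpoint, so a loss-only view sees a single phase, while the structural score rises, falls, and partially recovers. A real analysis would need the actual FPP diagnostics and likely some smoothing before picking a peak.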

Models Evaluated

| Model          | Family     | GS   | MI   | Phase | Safe β       |
|----------------|------------|------|------|-------|--------------|
| Qwen2.5-0.5B   | SwiGLU+MHA | 0.81 | 0.09 | 0.42  | [0.05,0.20]  |
| Qwen2.5-1.5B   | SwiGLU+MHA | 0.89 | 0.10 | 0.32  | [0.05,0.20]  |
| Qwen2.5-7B     | SwiGLU+MHA | 0.81 | 0.06 | 0.40  | [0.01,0.50]  |
| Qwen2.5-14B    | SwiGLU+MHA | 0.89 | 0.08 | 0.29  | [0.01,0.20]  |
| SmolLM2-360M   | SwiGLU+GQA | 0.89 | 0.07 | 0.31  | [0.01,0.02]  |
| SmolLM2-1.7B   | SwiGLU+GQA | 0.43 | 0.22 | 0.47  | NONE         |
| TinyLlama-1.1B | SwiGLU+GQA | 0.80 | 0.77 | 0.39  | [0.01,0.02]  |
| Gemma-3-1B     | GeGLU      | 0.28 | 0.10 | 0.31  | [0.001,0.02] |
| Gemma-2-9B     | GeGLU      | 0.91 | 0.05 | 0.06  | [0.01,0.05]  |
| OPT-1.3B       | ReLU       | 0.60 | 0.14 | 0.21  | [0.01,0.20]  |
| Pythia-160M    | GELU       | 0.46 | 0.20 | 0.35  | [0.01,0.02]  |
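
For readers who want to work with these numbers programmatically, here is a minimal sketch that transcribes the table above into Python and computes two things: the max/min spread of a metric within each family, and whether a candidate β falls inside a model's listed safe range. The helper names `family_spread` and `is_beta_safe` are hypothetical and not part of `fpp_health.py`. Treating the ranges as closed intervals is an assumption.

```python
from collections import defaultdict

# (model, family, GS, MI, Phase, safe-beta range or None), copied from the table.
ROWS = [
    ("Qwen2.5-0.5B",   "SwiGLU+MHA", 0.81, 0.09, 0.42, (0.05, 0.20)),
    ("Qwen2.5-1.5B",   "SwiGLU+MHA", 0.89, 0.10, 0.32, (0.05, 0.20)),
    ("Qwen2.5-7B",     "SwiGLU+MHA", 0.81, 0.06, 0.40, (0.01, 0.50)),
    ("Qwen2.5-14B",    "SwiGLU+MHA", 0.89, 0.08, 0.29, (0.01, 0.20)),
    ("SmolLM2-360M",   "SwiGLU+GQA", 0.89, 0.07, 0.31, (0.01, 0.02)),
    ("SmolLM2-1.7B",   "SwiGLU+GQA", 0.43, 0.22, 0.47, None),
    ("TinyLlama-1.1B", "SwiGLU+GQA", 0.80, 0.77, 0.39, (0.01, 0.02)),
    ("Gemma-3-1B",     "GeGLU",      0.28, 0.10, 0.31, (0.001, 0.02)),
    ("Gemma-2-9B",     "GeGLU",      0.91, 0.05, 0.06, (0.01, 0.05)),
    ("OPT-1.3B",       "ReLU",       0.60, 0.14, 0.21, (0.01, 0.20)),
    ("Pythia-160M",    "GELU",       0.46, 0.20, 0.35, (0.01, 0.02)),
]

COLUMN_INDEX = {"GS": 2, "MI": 3, "Phase": 4}


def family_spread(rows, column):
    """Return {family: max/min ratio} for one metric, skipping single-model families."""
    idx = COLUMN_INDEX[column]
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row[idx])
    return {fam: max(vals) / min(vals) for fam, vals in groups.items() if len(vals) > 1}


def is_beta_safe(rows, model, beta):
    """True if beta lies in the model's listed range (assumed closed); NONE is never safe."""
    for row in rows:
        if row[0] == model:
            safe = row[5]
            return safe is not None and safe[0] <= beta <= safe[1]
    raise KeyError(f"unknown model: {model}")


gs_spread = family_spread(ROWS, "GS")
mi_spread = family_spread(ROWS, "MI")
for family in gs_spread:
    print(f"{family}: GS spread {gs_spread[family]:.2f}x, MI spread {mi_spread[family]:.2f}x")

print(is_beta_safe(ROWS, "SmolLM2-1.7B", 0.01))
print(is_beta_safe(ROWS, "Gemma-3-1B", 0.02))
```

The spreads are computed from the two-decimal values shown in the table, so they may differ from ratios derived from unrounded measurements.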

Citation

@article{yue2026loss,
  title={Loss Is Not Enough: The Golden Window in Neural Network Training},
  author={Yue, Song},
  year={2026},
  eprint={pending},
  archivePrefix={arXiv}
}