---
library_name: pytorch
tags:
- transformer
- language-model
- long-context
- agillm
- experimental
---

# AGILLM-4

AGILLM-4 is the next training target after AGILLM-3. The current code is a production-oriented starting point, copied from the proven single-file trainer and extended for:

- ~1.5B parameter main preset (`agillm4_main`)
- 100 tokens per parameter target ratio
- longer block-size work on 24GB, B200, and B300 class GPUs
- AR+SAT every step with sequential backward to reduce peak VRAM
- SDPA and experimental sublinear local+landmark attention backends
- exact M-fold expansion attention harvested from n1.py, with local verifier
- fused QKV projection harvested from n1.py, with legacy checkpoint loading
- profiling tools for memory, throughput, AR cost, SAT cost, and optimizer cost
- synthetic long-context curriculum generation for recall and multi-hop tests

Start with [AGILLM-4.md](AGILLM-4.md) for the training plan and command recipes. The current sublinear backend is intentionally experimental: profile it against SDPA before using it for a real run.

Current harvest status from n1.py is tracked in [N1_HARVEST.md](N1_HARVEST.md).
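
The 100 tokens-per-parameter target implies a concrete data budget for the main preset. As a back-of-envelope sketch (assuming the `agillm4_main` preset is exactly 1.5B parameters; the true count may differ slightly):

```python
def token_budget(n_params: float, tokens_per_param: float = 100.0) -> float:
    """Total training tokens implied by a tokens-per-parameter ratio."""
    return n_params * tokens_per_param

# Hypothetical numbers: ~1.5B parameters at the stated 100:1 ratio.
budget = token_budget(1.5e9)
print(f"target training tokens: {budget:.3e}")  # target training tokens: 1.500e+11
```

So the main preset targets on the order of 150B training tokens; the actual schedule and dataset mix are described in [AGILLM-4.md](AGILLM-4.md).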