---
library_name: pytorch
tags:
- transformer
- language-model
- long-context
- agillm
- experimental
---

# AGILLM-4

AGILLM-4 is the next training target after AGILLM-3. The current code is a production-oriented starting point, copied from the proven single-file trainer and extended for:

- ~1.5B parameter main preset (`agillm4_main`)
- 100 tokens per parameter target ratio
- longer block-size work on 24GB, B200, and B300 class GPUs
- AR+SAT every step with sequential backward to reduce peak VRAM
- SDPA and experimental sublinear local+landmark attention backends
- exact M-fold expansion attention harvested from n1.py, with local verifier
- fused QKV projection harvested from n1.py, with legacy checkpoint loading
- profiling tools for memory, throughput, AR cost, SAT cost, and optimizer cost
- synthetic long-context curriculum generation for recall and multi-hop tests

Start with [AGILLM-4.md](AGILLM-4.md) for the training plan and command recipes. The current sublinear backend is intentionally experimental: profile it against SDPA before using it for a real run.

Current harvest status from n1.py is tracked in [N1_HARVEST.md](N1_HARVEST.md).
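
The 100 tokens-per-parameter target implies a concrete data budget for the main preset. As a back-of-envelope sketch (assuming the `agillm4_main` preset is exactly 1.5B parameters; the true count may differ slightly):

```python
def token_budget(n_params: float, tokens_per_param: float = 100.0) -> float:
    """Total training tokens implied by a tokens-per-parameter ratio."""
    return n_params * tokens_per_param

# Hypothetical numbers: ~1.5B parameters at the stated 100:1 ratio.
budget = token_budget(1.5e9)
print(f"target training tokens: {budget:.3e}")  # target training tokens: 1.500e+11
```

So the main preset targets on the order of 150B training tokens; the actual schedule and dataset mix are described in [AGILLM-4.md](AGILLM-4.md).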