AGILLM-4

AGILLM-4 is the next training target after AGILLM-3. The current code is a production-oriented starting point, copied from the proven single-file trainer and extended for:

  • ~1.5B parameter main preset (agillm4_main)
  • 100 tokens per parameter target ratio (~150B training tokens for the main preset)
  • longer block-size work on 24GB-, B200-, and B300-class GPUs
  • AR+SAT every step, with sequential backward to reduce peak VRAM (sketched after this list)
  • SDPA and experimental sublinear local+landmark attention backends
  • exact M-fold expansion attention harvested from n1.py, with local verifier
  • fused QKV projection harvested from n1.py, with legacy checkpoint loading (sketched after this list)
  • profiling tools for memory, throughput, AR cost, SAT cost, and optimizer cost
  • synthetic long-context curriculum generation for recall and multi-hop tests (see the generator sketch at the end of this README)
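
The AR+SAT bullet relies on backpropagating each objective as soon as its loss is computed, so one objective's activation graph is freed before the next forward pass runs. A minimal sketch of that pattern is below; the model call signature, the batch fields, and the assumption that the forward returns a scalar loss when labels are passed are placeholders for illustration, not the trainer's actual API. Compared with summing the losses and calling backward once, this keeps at most one objective's activations alive at a time.

```python
import torch

def train_step(model, optimizer, ar_batch, sat_batch):
    """Hypothetical AR+SAT step with sequential backward (sketch only)."""
    optimizer.zero_grad(set_to_none=True)

    # Autoregressive objective: forward, then backward immediately so this
    # pass's activation graph is freed before the SAT forward runs.
    ar_loss = model(ar_batch["input_ids"], labels=ar_batch["labels"])
    ar_loss.backward()

    # SAT objective on its own forward pass; its gradients accumulate in
    # .grad on top of the AR gradients.
    sat_loss = model(sat_batch["input_ids"], labels=sat_batch["labels"])
    sat_loss.backward()

    optimizer.step()
    return ar_loss.detach(), sat_loss.detach()
```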

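The fused QKV item replaces three separate projections with a single matmul. A minimal sketch, assuming square projections and invented names (FusedQKV, load_legacy), is below; the real module and the checkpoint key mapping harvested from n1.py will differ.

```python
import torch
import torch.nn as nn

class FusedQKV(nn.Module):
    """Single projection producing Q, K and V in one matmul (sketch only)."""

    def __init__(self, d_model: int, bias: bool = False):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=bias)

    def forward(self, x: torch.Tensor):
        # Split the fused output back into Q, K, V along the last dim.
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        return q, k, v

    def load_legacy(self, q_w: torch.Tensor, k_w: torch.Tensor, v_w: torch.Tensor):
        """Build the fused weight from separate legacy projection weights."""
        with torch.no_grad():
            self.qkv.weight.copy_(torch.cat([q_w, k_w, v_w], dim=0))
```
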
Start with AGILLM-4.md for the training plan and command recipes. The current sublinear backend is intentionally experimental: profile it against SDPA before using it for a real run.
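
One low-effort way to run that comparison is a wall-clock timing harness over the same tensors. A rough sketch follows, assuming a CUDA device; the shapes, dtype, and the sublinear_local_landmark_attention name are placeholders, and a real comparison should also measure memory (torch.cuda.max_memory_allocated) and the backward pass.

```python
import time
import torch
import torch.nn.functional as F

def time_backend(attn_fn, q, k, v, iters: int = 20) -> float:
    """Average seconds per call for one attention backend (rough sketch)."""
    for _ in range(3):                       # warmup
        attn_fn(q, k, v)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        attn_fn(q, k, v)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

# Placeholder shapes: batch 1, 16 heads, 8k tokens, head dim 64.
B, H, T, D = 1, 16, 8192, 64
q = torch.randn(B, H, T, D, device="cuda", dtype=torch.bfloat16)
k, v = torch.randn_like(q), torch.randn_like(q)

sdpa = lambda q, k, v: F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(f"SDPA: {time_backend(sdpa, q, k, v) * 1e3:.2f} ms/call")
# print(f"sublinear: {time_backend(sublinear_local_landmark_attention, q, k, v) * 1e3:.2f} ms/call")
```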

Current harvest status from n1.py is tracked in N1_HARVEST.md.
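
Finally, as an illustration of the synthetic long-context curriculum item in the feature list, here is a toy single-needle recall generator; the filler text, prompt format, and key/value scheme are invented for the sketch and are not the generator shipped with this repo. A multi-hop variant would chain several key/value pairs so the answer requires following more than one link.

```python
import random

def make_recall_example(context_tokens: int = 4096, seed: int = 0) -> dict:
    """Toy single-needle recall example: hide a key/value pair in filler
    text and ask for the value back (sketch only; not the real generator)."""
    rng = random.Random(seed)
    key = f"key-{rng.randrange(10_000)}"
    value = f"value-{rng.randrange(10_000)}"
    filler = ["lorem"] * context_tokens          # placeholder filler tokens
    # Insert the needle at a random position inside the filler.
    pos = rng.randrange(len(filler))
    filler[pos:pos] = [f"The secret {key} is {value}."]
    prompt = " ".join(filler) + f"\nQuestion: what is the secret {key}?\nAnswer:"
    return {"prompt": prompt, "target": f" {value}", "needle_position": pos}

example = make_recall_example()
print(example["target"], example["needle_position"])
```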
