---
library_name: pytorch
tags:
- transformer
- language-model
- long-context
- agillm
- experimental
---
# AGILLM-4
AGILLM-4 is the next training target after AGILLM-3. The current code is a
production-oriented starting point, copied from the proven single-file trainer
and extended for:
- >1B parameter floor preset (`agillm4_floor`) and ~1.7B main preset (`agillm4_main`) with AR+SAT+NAT heads
- target ratio of 100 training tokens per parameter, higher than the AGILLM-3 training ratio
- longer block-size work on 24GB, B200, and B300 class GPUs
- AR+SAT+NAT training, with sequential backward to reduce peak VRAM
- SDPA and experimental sublinear local+landmark attention backends
- exact M-fold expansion attention harvested from n1.py, with local verifier
- fused QKV projection harvested from n1.py, with legacy checkpoint loading
- profiling tools for memory, throughput, AR cost, SAT cost, and optimizer cost
- synthetic long-context curriculum generation for recall and multi-hop tests
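
To make the 100 tokens-per-parameter target concrete, the arithmetic can be sketched as below. The parameter counts are placeholders taken from the ">1B" and "~1.7B" figures in the list above, not the exact preset sizes, and `token_budget` and `optimizer_steps` are hypothetical helper names that do not come from the trainer.

```python
# Placeholder parameter counts: the README only states ">1B" and "~1.7B".
PRESET_PARAMS = {
    "agillm4_floor": 1_000_000_000,  # lower bound; the real preset is larger
    "agillm4_main": 1_700_000_000,   # approximate
}
TOKENS_PER_PARAM = 100  # target ratio stated in the list above


def token_budget(n_params: int, tokens_per_param: int = TOKENS_PER_PARAM) -> int:
    """Return the number of training tokens implied by the target ratio."""
    return n_params * tokens_per_param


def optimizer_steps(total_tokens: int, block_size: int, batch_size: int) -> int:
    """Steps needed to consume total_tokens at block_size * batch_size tokens per step."""
    tokens_per_step = block_size * batch_size
    return -(-total_tokens // tokens_per_step)  # ceiling division


if __name__ == "__main__":
    for name, n_params in PRESET_PARAMS.items():
        print(f"{name}: {token_budget(n_params) / 1e9:.0f}B tokens")
```

Under these placeholder sizes the floor preset implies a budget of at least 100B tokens and the main preset roughly 170B tokens. The step count depends on the batch size and gradient accumulation actually used, which this README does not state.
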

Start with [AGILLM-4.md](AGILLM-4.md) for the training plan and command
recipes. The current sublinear backend is intentionally experimental: profile it
against SDPA before using it for a real run.

On RTX 4090-class 24GB cards, `run_agillm4_4090_longblock.sh` now defaults to
`agillm4_floor` instead of the AGILLM-3-sized `large` preset, starts at block
`1280`, and backs off in smaller 20% steps if VRAM is too tight.
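
The back-off described above can be illustrated with a small sketch. This is not the logic of `run_agillm4_4090_longblock.sh`: it assumes each step shrinks the block size by 20%, and the rounding to a multiple of 64, the floor of 256, and the names `backoff_schedule` and `first_fitting_block` are all assumptions made for illustration.

```python
from __future__ import annotations

from typing import Callable, Optional


def backoff_schedule(start: int = 1280, step_pct: int = 20,
                     multiple: int = 64, floor: int = 256) -> list[int]:
    """List candidate block sizes, shrinking by step_pct percent per step.

    Assumptions (not taken from the script): each size is rounded down to
    a multiple of `multiple`, and the search stops below `floor`.
    """
    sizes = []
    block = start
    while block >= floor:
        sizes.append(block)
        # Integer arithmetic avoids floating-point rounding surprises.
        shrunk = block * (100 - step_pct) // 100
        shrunk = shrunk // multiple * multiple
        if shrunk >= block:  # guard against a stalled schedule
            break
        block = shrunk
    return sizes


def first_fitting_block(fits: Callable[[int], bool],
                        sizes: list[int]) -> Optional[int]:
    """Return the largest candidate for which fits(block) is true, else None."""
    for block in sizes:
        if fits(block):
            return block
    return None


if __name__ == "__main__":
    print(backoff_schedule())
```

In a real run `fits` would attempt a training step at the given block size and report whether it ran out of VRAM; here it is only a callback placeholder.
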

For the current v47 seed, launch tmux with
`/workspace/agillm-4/launch_agillm4_4090_floor_from_v47.sh`; it writes
`/workspace/agillm4_floor_train.log`.

Checkpoint upload policy is intentionally bounded for the public HF storage
quota: status and log tails upload every 30 minutes, the latest multi-GB delta
uploads at most daily, and full checkpoints upload at most weekly with only two
current remote files retained. Local full saves default to daily and local
retention is one full plus one delta, so the 64GB Vast disk does not slowly fill.
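
The cadence and retention rules above can be modelled as a small policy object. This is a sketch of the stated intervals only, not the uploader used by the trainer: `UploadPolicy`, `is_due`, and `prune` are hypothetical names, and treating "two remote files retained" as a plain keep-newest count is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class UploadPolicy:
    status_every: timedelta = timedelta(minutes=30)  # status and log tails
    delta_every: timedelta = timedelta(days=1)       # latest multi-GB delta
    full_every: timedelta = timedelta(weeks=1)       # full checkpoints
    remote_keep: int = 2                             # remote files retained
    local_full_keep: int = 1                         # local retention: one full
    local_delta_keep: int = 1                        # ... plus one delta


def is_due(last: Optional[datetime], now: datetime, interval: timedelta) -> bool:
    """True if nothing was uploaded yet or at least `interval` has elapsed."""
    return last is None or now - last >= interval


def prune(names_oldest_first: list[str], keep: int) -> tuple[list[str], list[str]]:
    """Split names into (kept, deleted), keeping the newest `keep` entries."""
    if keep <= 0:
        return [], list(names_oldest_first)
    return names_oldest_first[-keep:], names_oldest_first[:-keep]


if __name__ == "__main__":
    policy = UploadPolicy()
    start = datetime(2025, 1, 1)
    print(is_due(start, start + timedelta(hours=12), policy.delta_every))
    # File names below are made up for the example.
    print(prune(["full_a.pt", "full_b.pt", "full_c.pt"], policy.remote_keep))
```

Keeping the intervals and retention counts in one immutable object makes it easy to check that worst-case remote and local usage stays within the storage quota and the 64GB disk.
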

Current harvest status from n1.py is tracked in [N1_HARVEST.md](N1_HARVEST.md).