AGILLM-4

AGILLM-4 is the next training target after AGILLM-3. The current code is a production-oriented starting point, copied from the proven single-file trainer and extended for:

  • ~1.5B parameter main preset (agillm4_main)
  • 100 tokens per parameter target ratio (~150B training tokens for the main preset)
  • longer block-size work on 24GB-, B200-, and B300-class GPUs
  • AR+SAT every step, with sequential backward to reduce peak VRAM (sketched after this list)
  • SDPA and experimental sublinear local+landmark attention backends
  • exact M-fold expansion attention harvested from n1.py, with local verifier
  • fused QKV projection harvested from n1.py, with legacy checkpoint loading (sketched after this list)
  • profiling tools for memory, throughput, AR cost, SAT cost, and optimizer cost
  • synthetic long-context curriculum generation for recall and multi-hop tests (see the generator sketch at the end of this README)
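
The AR+SAT bullet relies on backpropagating each objective as soon as its loss is computed, so one objective's activation graph is freed before the next forward pass runs. A minimal sketch of that pattern is below; the model call signature, the batch fields, and the assumption that the forward returns a scalar loss when labels are passed are placeholders for illustration, not the trainer's actual API. Compared with summing the losses and calling backward once, this keeps at most one objective's activations alive at a time.

```python
import torch

def train_step(model, optimizer, ar_batch, sat_batch):
    """Hypothetical AR+SAT step with sequential backward (sketch only)."""
    optimizer.zero_grad(set_to_none=True)

    # Autoregressive objective: forward, then backward immediately so this
    # pass's activation graph is freed before the SAT forward runs.
    ar_loss = model(ar_batch["input_ids"], labels=ar_batch["labels"])
    ar_loss.backward()

    # SAT objective on its own forward pass; its gradients accumulate in
    # .grad on top of the AR gradients.
    sat_loss = model(sat_batch["input_ids"], labels=sat_batch["labels"])
    sat_loss.backward()

    optimizer.step()
    return ar_loss.detach(), sat_loss.detach()
```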

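The fused QKV item replaces three separate projections with a single matmul. A minimal sketch, assuming square projections and invented names (FusedQKV, load_legacy), is below; the real module and the checkpoint key mapping harvested from n1.py will differ.

```python
import torch
import torch.nn as nn

class FusedQKV(nn.Module):
    """Single projection producing Q, K and V in one matmul (sketch only)."""

    def __init__(self, d_model: int, bias: bool = False):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=bias)

    def forward(self, x: torch.Tensor):
        # Split the fused output back into Q, K, V along the last dim.
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        return q, k, v

    def load_legacy(self, q_w: torch.Tensor, k_w: torch.Tensor, v_w: torch.Tensor):
        """Build the fused weight from separate legacy projection weights."""
        with torch.no_grad():
            self.qkv.weight.copy_(torch.cat([q_w, k_w, v_w], dim=0))
```
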
Start with AGILLM-4.md for the training plan and command recipes. The current sublinear backend is intentionally experimental: profile it against SDPA before using it for a real run.
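
One low-effort way to run that comparison is a wall-clock timing harness over the same tensors. A rough sketch follows, assuming a CUDA device; the shapes, dtype, and the sublinear_local_landmark_attention name are placeholders, and a real comparison should also measure memory (torch.cuda.max_memory_allocated) and the backward pass.

```python
import time
import torch
import torch.nn.functional as F

def time_backend(attn_fn, q, k, v, iters: int = 20) -> float:
    """Average seconds per call for one attention backend (rough sketch)."""
    for _ in range(3):                       # warmup
        attn_fn(q, k, v)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        attn_fn(q, k, v)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

# Placeholder shapes: batch 1, 16 heads, 8k tokens, head dim 64.
B, H, T, D = 1, 16, 8192, 64
q = torch.randn(B, H, T, D, device="cuda", dtype=torch.bfloat16)
k, v = torch.randn_like(q), torch.randn_like(q)

sdpa = lambda q, k, v: F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(f"SDPA: {time_backend(sdpa, q, k, v) * 1e3:.2f} ms/call")
# print(f"sublinear: {time_backend(sublinear_local_landmark_attention, q, k, v) * 1e3:.2f} ms/call")
```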

Current harvest status from n1.py is tracked in N1_HARVEST.md.
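
Finally, as an illustration of the synthetic long-context curriculum item in the feature list, here is a toy single-needle recall generator; the filler text, prompt format, and key/value scheme are invented for the sketch and are not the generator shipped with this repo. A multi-hop variant would chain several key/value pairs so the answer requires following more than one link.

```python
import random

def make_recall_example(context_tokens: int = 4096, seed: int = 0) -> dict:
    """Toy single-needle recall example: hide a key/value pair in filler
    text and ask for the value back (sketch only; not the real generator)."""
    rng = random.Random(seed)
    key = f"key-{rng.randrange(10_000)}"
    value = f"value-{rng.randrange(10_000)}"
    filler = ["lorem"] * context_tokens          # placeholder filler tokens
    # Insert the needle at a random position inside the filler.
    pos = rng.randrange(len(filler))
    filler[pos:pos] = [f"The secret {key} is {value}."]
    prompt = " ".join(filler) + f"\nQuestion: what is the secret {key}?\nAnswer:"
    return {"prompt": prompt, "target": f" {value}", "needle_position": pos}

example = make_recall_example()
print(example["target"], example["needle_position"])
```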
