
# AGILLM-3 Large (698M)

**AR+SAT joint training** — a novel architecture that trains an autoregressive (AR) head and a semi-autoregressive (SAT) head simultaneously, enabling faster parallel inference.
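
The repo does not publish the architecture internals, but the idea of joint AR+SAT training can be sketched as a shared trunk feeding two output heads whose losses are summed. This is a toy NumPy sketch: the shapes, the head layout (SAT predicting `k` future tokens per position), and the equal loss weighting are all my assumptions, not the actual `n.py` implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def cross_entropy(logits, targets):
    # Mean negative log-likelihood over positions.
    logits = logits - logits.max(axis=-1, keepdims=True)
    logp = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -logp[np.arange(len(targets)), targets].mean()

# Toy shapes: T positions, trunk width d, vocab V, SAT block size k.
T, d, V, k = 8, 16, 32, 4
h = rng.normal(size=(T, d))          # shared trunk hidden states
W_ar = rng.normal(size=(d, V))       # AR head: next token at each position
W_sat = rng.normal(size=(d, k * V))  # SAT head: k future tokens at once
tokens = rng.integers(0, V, size=T + k)

# AR loss: position t predicts token t+1.
ar_loss = cross_entropy(h @ W_ar, tokens[1:T + 1])

# SAT loss: position t predicts tokens t+1 .. t+k in parallel.
sat_logits = (h @ W_sat).reshape(T, k, V)
sat_loss = np.mean([
    cross_entropy(sat_logits[:, j, :], tokens[1 + j:T + 1 + j])
    for j in range(k)
])

joint_loss = ar_loss + sat_loss      # equal weighting (assumed)
```

At inference time the SAT head can emit a block of `k` tokens per forward pass, which is where the "faster parallel inference" claim comes from.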

## Model Details

| Parameter | Value |
|---|---|
| Parameters | 698M |
| Architecture | Transformer with Expansion Rank |
| `d_model` | 1024 |
| Layers | 24 |
| Heads | 16 |
| Expansion Rank | 128 (2x ratio) |
| Tokenizer | DeepSeek-V3.2 (128,815 vocab) |
| Training target | 35.76B tokens (51.2x Chinchilla) |
| Context length | 1122 tokens |
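
As a sanity check on the training target (my arithmetic, reading "51.2x Chinchilla" as 51.2 tokens per parameter, which is what makes the numbers line up):

```python
params = 698e6
tokens_per_param = 51.2             # assumed reading of "51.2x Chinchilla"
target_tokens = params * tokens_per_param
print(target_tokens / 1e9)          # ≈ 35.74B, close to the stated 35.76B
```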

## Training

```bash
# Minimal run (uses sane defaults)
python n.py train --preset large

# Resume from checkpoint
python n.py train --preset large --resume ckpts/latest.pt

# Inference
python n.py infer --mode ar --ckpt ckpts/pretrain_step00176907.pt --prompt "Hello" --max_new 100
```

## Defaults Baked In

- `--max_ckpts 3` — auto-prune old checkpoints
- `--chilla_max_double True` — double Chinchilla (51.2x tokens)
- `--after_sft_steps 80000` — 80K SFT steps with chat format
- Auto HF upload on each checkpoint save
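
A minimal sketch of how checkpoint auto-pruning like `--max_ckpts 3` typically works (hypothetical; the actual logic lives in `n.py`, and the `pretrain_step*.pt` glob pattern is inferred from the inference example above):

```python
import glob
import os

def prune_checkpoints(ckpt_dir: str, max_ckpts: int = 3) -> None:
    """Delete all but the newest `max_ckpts` checkpoints in ckpt_dir."""
    ckpts = sorted(
        glob.glob(os.path.join(ckpt_dir, "pretrain_step*.pt")),
        key=os.path.getmtime,  # oldest first
    )
    for stale in ckpts[:-max_ckpts]:
        os.remove(stale)
```

Pruning by modification time (rather than by step number parsed from the filename) is the simpler choice; both orderings agree as long as checkpoints are written in step order.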

## Hot Config

Edit `hot_config.json` mid-training without restarting:

```json
{"save_every_sec": 43200, "pause_training": false}
```
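
One way to support this kind of live reload is to re-read the file whenever its mtime changes, keeping the last good values if the file is missing or caught mid-write. This is a hypothetical sketch, not the trainer's actual mechanism:

```python
import json
import os

class HotConfig:
    """Reload a JSON config file whenever it changes on disk (sketch)."""

    def __init__(self, path="hot_config.json"):
        self.path = path
        self.mtime = 0.0
        self.values = {}

    def get(self, key, default=None):
        try:
            mtime = os.path.getmtime(self.path)
            if mtime != self.mtime:          # file changed since last read
                with open(self.path) as f:
                    self.values = json.load(f)
                self.mtime = mtime
        except (OSError, json.JSONDecodeError):
            pass  # keep last good values on missing/partial file
        return self.values.get(key, default)
```

The training loop would then call something like `hot.get("pause_training", False)` once per step, so edits take effect without a restart.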

## Files

- `n.py` — main trainer with AR+SAT joint training
- `rotating_log.py` — dual rotating log
- `hf_upload.py` — checkpoint uploader
- `tokenizer/` — DeepSeek-V3.2 tokenizer

## License

Apache 2.0

## Author

OpenTransformers Ltd (UK Company #16940923)