# AGILLM-3 Large (698M)

**AR+SAT Joint Training** — a novel architecture that trains an autoregressive (AR) head and a semi-autoregressive (SAT) head simultaneously, enabling faster parallel inference.

## Model Details

| Parameter | Value |
|-----------|-------|
| Parameters | 698M |
| Architecture | Transformer with Expansion Rank |
| d_model | 1024 |
| Layers | 24 |
| Heads | 16 |
| Expansion Rank | 128 (2x ratio) |
| Tokenizer | DeepSeek-V3.2 (128,815 vocab) |
| Training Target | 35.76B tokens (51.2x Chinchilla) |
| Context Length | 1122 tokens |

## Training

```bash
# Minimal run (uses sane defaults)
python n.py train --preset large

# Resume from checkpoint
python n.py train --preset large --resume ckpts/latest.pt

# Inference
python n.py infer --mode ar --ckpt ckpts/pretrain_step00176907.pt --prompt "Hello" --max_new 100
```

## Defaults Baked In

- `--max_ckpts 3` — Auto-prune old checkpoints
- `--chilla_max_double True` — Double Chinchilla (51.2x tokens)
- `--after_sft_steps 80000` — 80K SFT steps with chat format
- Auto HF upload on each checkpoint save

## Hot Config

Edit `hot_config.json` mid-training, without restarting:

```json
{"save_every_sec": 43200, "pause_training": false}
```

## Files

- `n.py` — Main trainer with AR+SAT joint training
- `rotating_log.py` — Dual rotating log
- `hf_upload.py` — Checkpoint uploader
- `tokenizer/` — DeepSeek-V3.2 tokenizer

## License

Apache 2.0

## Author

OpenTransformers Ltd (UK Company #16940923)
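The hot-config mechanism described above amounts to periodically re-reading `hot_config.json` inside the training loop. The trainer's actual implementation lives in `n.py` and is not shown here; the following is a minimal sketch under that assumption (the `load_hot_config` helper and `defaults` dict are hypothetical, while the `save_every_sec` and `pause_training` keys come from the README example):

```python
import json
import os
import tempfile

def load_hot_config(path, defaults):
    """Re-read the hot config file, falling back to defaults on any error.

    Returning the defaults when the file is missing or mid-edit means a
    half-written JSON file can never crash the training loop.
    """
    cfg = dict(defaults)
    try:
        with open(path) as f:
            cfg.update(json.load(f))
    except (OSError, json.JSONDecodeError):
        pass  # keep current values until the file is readable again
    return cfg

# Simulate an operator editing the file while training is running
defaults = {"save_every_sec": 43200, "pause_training": False}
path = os.path.join(tempfile.mkdtemp(), "hot_config.json")

cfg = load_hot_config(path, defaults)   # file absent -> pure defaults
with open(path, "w") as f:
    json.dump({"pause_training": True}, f)
cfg2 = load_hot_config(path, defaults)  # next poll picks up the edit
```

Calling the loader once per step (or once per N seconds) keeps the overhead negligible while letting any key take effect without a restart.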
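The `--max_ckpts 3` default prunes old checkpoints automatically. How `n.py` actually does this is not documented; one simple scheme, sketched below under the assumption that checkpoint filenames carry zero-padded step numbers (as in `pretrain_step00176907.pt`, so lexicographic order equals step order), is to delete all but the newest N files after each save (`prune_checkpoints` is a hypothetical name):

```python
import os
import tempfile

def prune_checkpoints(ckpt_dir, max_ckpts=3):
    """Delete all but the newest max_ckpts checkpoint files.

    Relies on zero-padded step numbers in the filenames so that plain
    string sorting yields chronological order.
    """
    ckpts = sorted(f for f in os.listdir(ckpt_dir) if f.endswith(".pt"))
    for old in ckpts[:-max_ckpts]:  # empty slice if we have <= max_ckpts
        os.remove(os.path.join(ckpt_dir, old))
    return ckpts[-max_ckpts:]

# Simulate five checkpoint saves, then prune down to three
ckpt_dir = tempfile.mkdtemp()
for step in (100, 200, 300, 400, 500):
    open(os.path.join(ckpt_dir, f"pretrain_step{step:08d}.pt"), "w").close()
kept = prune_checkpoints(ckpt_dir, max_ckpts=3)
```

Running the pruner immediately after each save bounds disk usage at roughly three checkpoints' worth of space regardless of how long training runs.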
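The 35.76B-token training target is consistent with reading the "51.2x Chinchilla" figure as 51.2 tokens per parameter: with the commonly cited Chinchilla-optimal ratio of roughly 20 tokens per parameter, 51.2 is about 2.56x optimal. This interpretation is an assumption, not something the README states; a quick arithmetic check:

```python
params = 698e6           # model size from the Model Details table
tokens_per_param = 51.2  # "51.2x Chinchilla" read as tokens per parameter

budget = params * tokens_per_param        # ~3.57e10, matching the 35.76B target
chinchilla_ratio = 20.0                   # rough Chinchilla-optimal tokens/param
over_optimal = tokens_per_param / chinchilla_ratio  # 2.56x the optimal ratio
```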