# AGILLM-3 Large (698M)
**AR+SAT Joint Training**: a novel architecture that trains an autoregressive (AR) head and a semi-autoregressive (SAT) head simultaneously, enabling faster parallel inference.
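The exact objective is not spelled out in this card; below is a minimal sketch of what a joint AR+SAT loss can look like, assuming the SAT head predicts a small block of future tokens from each position. Head names, shapes, and the block size are illustrative, not taken from `n.py`.

```python
import torch.nn.functional as F

def joint_ar_sat_loss(hidden, ar_head, sat_head, targets, sat_block=4):
    """hidden: (B, S, d_model) from the shared trunk; targets: (B, S) token ids.
    ar_head: Linear(d_model, vocab); sat_head: Linear(d_model, sat_block * vocab)."""
    # Autoregressive head: standard next-token cross-entropy.
    ar_logits = ar_head(hidden)                              # (B, S, vocab)
    ar_loss = F.cross_entropy(ar_logits[:, :-1].flatten(0, 1),
                              targets[:, 1:].flatten())
    # Semi-autoregressive head: predict a block of future tokens in parallel.
    B, S, _ = hidden.shape
    sat_logits = sat_head(hidden).view(B, S, sat_block, -1)  # (B, S, block, vocab)
    sat_loss = 0.0
    for k in range(sat_block):                               # offset k predicts token i+k+1
        valid = S - (k + 1)
        sat_loss = sat_loss + F.cross_entropy(
            sat_logits[:, :valid, k].flatten(0, 1),
            targets[:, k + 1 : k + 1 + valid].flatten())
    return ar_loss + sat_loss / sat_block                    # joint AR + SAT objective
```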
## Model Details
| Parameter | Value |
|-----------|-------|
| Parameters | 698M |
| Architecture | Transformer with Expansion Rank |
| d_model | 1024 |
| Layers | 24 |
| Heads | 16 |
| Expansion Rank | 128 (2x ratio) |
| Tokenizer | DeepSeek-V3.2 (128,815 vocab) |
| Training Target | 35.76B tokens (51.2x Chinchilla) |
| Context Length | 1122 tokens |
## Training
```bash
# Minimal run (uses sane defaults)
python n.py train --preset large
# Resume from checkpoint
python n.py train --preset large --resume ckpts/latest.pt
# Inference
python n.py infer --mode ar --ckpt ckpts/pretrain_step00176907.pt --prompt "Hello" --max_new 100
```
## Defaults Baked In
- `--max_ckpts 3`: Auto-prune old checkpoints
- `--chilla_max_double True`: Double Chinchilla (51.2x tokens; see the quick check after this list)
- `--after_sft_steps 80000`: 80K SFT steps with chat format
- Auto HF upload on each checkpoint save
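The 35.76B-token target and the 51.2x figure line up if 51.2 is read as tokens per parameter (roughly 2.56x the usual ~20 tokens/param Chinchilla rule). A rough check with the rounded 698M parameter count:

```python
# Rough check of the training-token target quoted above.
# 698M is the rounded parameter count, so the result is approximate.
params = 698e6
tokens_per_param = 51.2              # the "51.2x" ratio from this card
print(f"{params * tokens_per_param / 1e9:.2f}B tokens")  # ~35.74B vs. the stated 35.76B
```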
## Hot Config
Edit `hot_config.json` mid-training without restart:
```json
{"save_every_sec": 43200, "pause_training": false}
```
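How the trainer picks these edits up is not documented here; below is a minimal sketch of a hot-reload check, assuming the trainer polls the file's modification time each step. The real logic in `n.py` may differ.

```python
import json
import os

HOT_CONFIG_PATH = "hot_config.json"  # assumed location next to the trainer

def maybe_reload(last_mtime, cfg):
    """Re-read hot_config.json only when it changes; keep old cfg on a bad read."""
    try:
        mtime = os.path.getmtime(HOT_CONFIG_PATH)
        if mtime != last_mtime:
            with open(HOT_CONFIG_PATH) as f:
                cfg = json.load(f)
            last_mtime = mtime
    except (OSError, json.JSONDecodeError):
        pass
    return last_mtime, cfg

# In the training loop:
#   mtime, cfg = maybe_reload(mtime, cfg)
#   if cfg.get("pause_training"):   # skip steps while paused
#       continue
```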
## Files
- `n.py`: Main trainer with AR+SAT joint training
- `rotating_log.py`: Dual rotating log
- `hf_upload.py`: Checkpoint uploader (see the sketch after this list)
- `tokenizer/`: DeepSeek-V3.2 tokenizer
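The defaults above mention auto-upload to Hugging Face on every checkpoint save. A hypothetical sketch of what `hf_upload.py` might do with `huggingface_hub`; the repo id and destination path are assumptions, not taken from this card:

```python
from huggingface_hub import HfApi

def upload_checkpoint(ckpt_path: str, repo_id: str = "OpenTransformer/AGILLM-3-large"):
    # Push one checkpoint file into the model repo under ckpts/.
    api = HfApi()
    api.upload_file(
        path_or_fileobj=ckpt_path,
        path_in_repo=f"ckpts/{ckpt_path.rsplit('/', 1)[-1]}",
        repo_id=repo_id,
        repo_type="model",
    )

# upload_checkpoint("ckpts/pretrain_step00176907.pt")
```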
## License
Apache 2.0
## Author
OpenTransformers Ltd (UK Company #16940923)