# AGILLM-3 Large (698M)

**AR+SAT Joint Training** — a novel architecture that trains an autoregressive (AR) head and a semi-autoregressive (SAT) head simultaneously, enabling faster parallel inference.

## Model Details

| Parameter | Value |
|-----------|-------|
| Parameters | 698M |
| Architecture | Transformer with Expansion Rank |
| d_model | 1024 |
| Layers | 24 |
| Heads | 16 |
| Expansion Rank | 128 (2x ratio) |
| Tokenizer | DeepSeek-V3.2 (128,815 vocab) |
| Training Target | 35.76B tokens (51.2x Chinchilla) |
| Context Length | 1122 tokens |

## Training

```bash
# Minimal run (uses sane defaults)
python n.py train --preset large

# Resume from checkpoint
python n.py train --preset large --resume ckpts/latest.pt

# Inference
python n.py infer --mode ar --ckpt ckpts/pretrain_step00176907.pt --prompt "Hello" --max_new 100
```

## Defaults Baked In

- `--max_ckpts 3` — Auto-prune old checkpoints
- `--chilla_max_double True` — Double Chinchilla (51.2x tokens)
- `--after_sft_steps 80000` — 80K SFT steps with chat format
- Auto HF upload on each checkpoint save

## Hot Config

Edit `hot_config.json` mid-training, without restarting:

```json
{"save_every_sec": 43200, "pause_training": false}
```

## Files

- `n.py` — Main trainer with AR+SAT joint training
- `rotating_log.py` — Dual rotating log
- `hf_upload.py` — Checkpoint uploader
- `tokenizer/` — DeepSeek-V3.2 tokenizer

## License

Apache 2.0

## Author

OpenTransformers Ltd (UK Company #16940923)
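The hot-config mechanism described above amounts to periodically re-reading `hot_config.json` inside the training loop. The trainer's actual implementation lives in `n.py` and is not shown here; the following is a minimal sketch under that assumption (the `load_hot_config` helper and `defaults` dict are hypothetical, while the `save_every_sec` and `pause_training` keys come from the README example):

```python
import json
import os
import tempfile

def load_hot_config(path, defaults):
    """Re-read the hot config file, falling back to defaults on any error.

    Returning the defaults when the file is missing or mid-edit means a
    half-written JSON file can never crash the training loop.
    """
    cfg = dict(defaults)
    try:
        with open(path) as f:
            cfg.update(json.load(f))
    except (OSError, json.JSONDecodeError):
        pass  # keep current values until the file is readable again
    return cfg

# Simulate an operator editing the file while training is running
defaults = {"save_every_sec": 43200, "pause_training": False}
path = os.path.join(tempfile.mkdtemp(), "hot_config.json")

cfg = load_hot_config(path, defaults)   # file absent -> pure defaults
with open(path, "w") as f:
    json.dump({"pause_training": True}, f)
cfg2 = load_hot_config(path, defaults)  # next poll picks up the edit
```

Calling the loader once per step (or once per N seconds) keeps the overhead negligible while letting any key take effect without a restart.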
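The `--max_ckpts 3` default prunes old checkpoints automatically. How `n.py` actually does this is not documented; one simple scheme, sketched below under the assumption that checkpoint filenames carry zero-padded step numbers (as in `pretrain_step00176907.pt`, so lexicographic order equals step order), is to delete all but the newest N files after each save (`prune_checkpoints` is a hypothetical name):

```python
import os
import tempfile

def prune_checkpoints(ckpt_dir, max_ckpts=3):
    """Delete all but the newest max_ckpts checkpoint files.

    Relies on zero-padded step numbers in the filenames so that plain
    string sorting yields chronological order.
    """
    ckpts = sorted(f for f in os.listdir(ckpt_dir) if f.endswith(".pt"))
    for old in ckpts[:-max_ckpts]:  # empty slice if we have <= max_ckpts
        os.remove(os.path.join(ckpt_dir, old))
    return ckpts[-max_ckpts:]

# Simulate five checkpoint saves, then prune down to three
ckpt_dir = tempfile.mkdtemp()
for step in (100, 200, 300, 400, 500):
    open(os.path.join(ckpt_dir, f"pretrain_step{step:08d}.pt"), "w").close()
kept = prune_checkpoints(ckpt_dir, max_ckpts=3)
```

Running the pruner immediately after each save bounds disk usage at roughly three checkpoints' worth of space regardless of how long training runs.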
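The 35.76B-token training target is consistent with reading the "51.2x Chinchilla" figure as 51.2 tokens per parameter: with the commonly cited Chinchilla-optimal ratio of roughly 20 tokens per parameter, 51.2 is about 2.56x optimal. This interpretation is an assumption, not something the README states; a quick arithmetic check:

```python
params = 698e6           # model size from the Model Details table
tokens_per_param = 51.2  # "51.2x Chinchilla" read as tokens per parameter

budget = params * tokens_per_param        # ~3.57e10, matching the 35.76B target
chinchilla_ratio = 20.0                   # rough Chinchilla-optimal tokens/param
over_optimal = tokens_per_param / chinchilla_ratio  # 2.56x the optimal ratio
```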