OpenTransformer committed on
Commit eb981c3 · verified · 1 Parent(s): 9bdce90

Add README with model details

Files changed (1)
  1. README.md +40 -32
README.md CHANGED
@@ -1,51 +1,59 @@
- ---
- license: mit
- tags:
- - pytorch
- - transformer
- - language-model
- - agillm
- - ar-sat
- - joint-training
- ---
-
  # AGILLM-3 Large (698M)

- A 698 million parameter language model trained using novel **AR+SAT joint training** - combining autoregressive and semi-autoregressive objectives in a single forward pass.

- ## Architecture

  | Parameter | Value |
  |-----------|-------|
  | d_model | 1024 |
- | layers | 24 |
- | heads | 16 |
- | rank | 128 |
- | total params | 698,389,088 |

  ## Training

- - **Dataset:** OpenWebText + WikiText
- - **Target:** 2.7M pretrain steps + 300k SFT steps
- - **Hardware:** RTX 3090 24GB
- - **Framework:** Custom PyTorch trainer

- ## Checkpoints

- Milestone checkpoints saved every 100k steps:
- - `checkpoints/step_100000.pt`
- - `checkpoints/step_200000.pt`
- - ... etc

- ## Research Hypothesis

- Joint AR+SAT training provides ~2x learning efficiency compared to isolated training. The SAT decoder's parallel prediction forces holistic understanding while AR maintains autoregressive generation capability.
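
As a concrete (but hypothetical) reading of the joint objective above, the sketch below runs a shared trunk once and computes both losses from the same hidden states: an AR head predicts the next token at each position, and a SAT head predicts a block of future tokens in parallel. All names (`trunk`, `ar_head`, `sat_head`, `BLOCK`) and the toy sizes are assumptions for illustration, not the actual `n.py` implementation.

```python
import torch
import torch.nn.functional as F

# Toy sizes; the real model uses d_model=1024 and a ~128k vocabulary.
BLOCK, VOCAB, D = 4, 1000, 64

trunk = torch.nn.Embedding(VOCAB, D)          # stand-in for the shared transformer trunk
ar_head = torch.nn.Linear(D, VOCAB)           # autoregressive head: next token only
sat_head = torch.nn.Linear(D, VOCAB * BLOCK)  # semi-autoregressive head: next BLOCK tokens

tokens = torch.randint(0, VOCAB, (2, 32))     # toy batch of token ids (batch, seq)
h = trunk(tokens[:, :-BLOCK])                 # one forward pass over the prefix positions

# AR loss: position i predicts token i+1.
ar_logits = ar_head(h)                                        # (batch, seq-BLOCK, VOCAB)
ar_loss = F.cross_entropy(ar_logits.flatten(0, 1),
                          tokens[:, 1:1 - BLOCK].flatten())

# SAT loss: position i predicts tokens i+1 .. i+BLOCK in parallel.
sat_logits = sat_head(h).view(*h.shape[:2], BLOCK, VOCAB)     # (batch, seq-BLOCK, BLOCK, VOCAB)
sat_targets = tokens[:, 1:].unfold(1, BLOCK, 1)               # sliding windows of BLOCK targets
sat_loss = F.cross_entropy(sat_logits.reshape(-1, VOCAB),
                           sat_targets.reshape(-1))

# Both objectives share the same forward pass; one backward updates everything.
(ar_loss + sat_loss).backward()
```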
 
- ## Author

- **OpenTransformers Ltd** (Company #16940923)
- Scott Edwards - Founder/Director

  ## License

- MIT

  # AGILLM-3 Large (698M)

+ **AR+SAT Joint Training**: a novel architecture that trains both autoregressive and semi-autoregressive heads simultaneously, enabling faster parallel inference.
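
"Faster parallel inference" here presumably means the SAT head emits several tokens per forward pass instead of one. The sketch below shows that decoding pattern with a dummy stand-in model; `DummySATModel`, its `block` size, and the greedy loop are illustrative assumptions rather than the interface exposed by this repo.

```python
import torch

class DummySATModel(torch.nn.Module):
    """Stand-in network whose SAT head scores the next `block` tokens at once."""
    def __init__(self, vocab=1000, d_model=64, block=4):
        super().__init__()
        self.block, self.vocab = block, vocab
        self.embed = torch.nn.Embedding(vocab, d_model)
        self.sat_head = torch.nn.Linear(d_model, vocab * block)

    def forward(self, ids):                       # ids: (batch, seq)
        h = self.embed(ids).mean(dim=1)           # toy pooling over the context
        return self.sat_head(h).view(-1, self.block, self.vocab)

@torch.no_grad()
def sat_generate(model, ids, max_new=16):
    """Greedy semi-autoregressive decoding: append `block` tokens per step."""
    start = ids.shape[1]
    while ids.shape[1] - start < max_new:
        block_logits = model(ids)                 # (batch, block, vocab) in ONE pass
        ids = torch.cat([ids, block_logits.argmax(dim=-1)], dim=1)
    return ids

out = sat_generate(DummySATModel(), torch.tensor([[1, 2, 3]]))
print(out.shape)  # (1, 3 + 16): four decoding steps instead of sixteen
```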
 
+ ## Model Details

  | Parameter | Value |
  |-----------|-------|
+ | Parameters | 698M |
+ | Architecture | Transformer with Expansion Rank |
  | d_model | 1024 |
+ | Layers | 24 |
+ | Heads | 16 |
+ | Expansion Rank | 128 (2x ratio) |
+ | Tokenizer | DeepSeek-V3.2 (128,815 vocab) |
+ | Training Target | 35.76B tokens (51.2x Chinchilla) |
+ | Context Length | 1122 tokens |
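
A quick check of the training-target row, assuming "51.2x Chinchilla" means 51.2 training tokens per parameter (i.e. double a ~25.6 tokens-per-parameter baseline, matching the `--chilla_max_double` default listed below); the 698,389,088 parameter count is taken from the original card:

```python
# Sanity-check the 35.76B-token target, assuming "51.2x Chinchilla" means
# 51.2 training tokens per parameter (this reading is an assumption).
params = 698_389_088                          # total parameter count from the original card
tokens_per_param = 51.2
target_tokens = params * tokens_per_param
print(f"{target_tokens / 1e9:.2f}B tokens")   # -> 35.76B tokens
```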
 
  ## Training

+ ```bash
+ # Minimal run (uses sane defaults)
+ python n.py train --preset large
+
+ # Resume from checkpoint
+ python n.py train --preset large --resume ckpts/latest.pt
+
+ # Inference
+ python n.py infer --mode ar --ckpt ckpts/pretrain_step00176907.pt --prompt "Hello" --max_new 100
+ ```

+ ## Defaults Baked In

+ - `--max_ckpts 3` — Auto-prune old checkpoints
+ - `--chilla_max_double True` — Double Chinchilla (51.2x tokens)
+ - `--after_sft_steps 80000` — 80K SFT steps with chat format
+ - Auto HF upload on each checkpoint save

+ ## Hot Config
+
+ Edit `hot_config.json` mid-training without restart:
+ ```json
+ {"save_every_sec": 43200, "pause_training": false}
+ ```
+
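
The card does not spell out how the hot reload works; one minimal way to honor such a file from inside a training loop is sketched below (re-reading it each step and merging over defaults). The function and fallback behaviour are assumptions based on the example keys above, not necessarily what `n.py` does.

```python
import json
import time
from pathlib import Path

DEFAULTS = {"save_every_sec": 43200, "pause_training": False}
HOT_CONFIG = Path("hot_config.json")

def read_hot_config():
    """Re-read hot_config.json, falling back to defaults if absent or mid-edit."""
    try:
        return {**DEFAULTS, **json.loads(HOT_CONFIG.read_text())}
    except (FileNotFoundError, json.JSONDecodeError):
        return dict(DEFAULTS)

for step in range(1000):            # stand-in for the real training loop
    cfg = read_hot_config()         # picks up edits without restarting the process
    while cfg["pause_training"]:    # idle until the flag is flipped back to false
        time.sleep(10)
        cfg = read_hot_config()
    # ... one optimizer step here; checkpoint every cfg["save_every_sec"] seconds ...
```
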
+ ## Files

+ - `n.py` — Main trainer with AR+SAT joint training
+ - `rotating_log.py` — Dual rotating log
+ - `hf_upload.py` — Checkpoint uploader
+ - `tokenizer/` — DeepSeek-V3.2 tokenizer

  ## License

+ Apache 2.0
+
+ ## Author
+
+ OpenTransformers Ltd (UK Company #16940923)