Document inference-slim checkpoint
README.md
@@ -1,34 +1,104 @@
---
library_name: pytorch
tags:
- agillm
- transformer
- diffusion-block
- single-file
license: other
---

# AGILLM3.5 Single File

AGILLM3.5 is the AGILLM3 checkpoint/tokenizer contract running on the AGILLM4 runtime and DiffusionBlock training path.

The runnable artifact is `agillm35.py`. The helper modules are folded into that one file so the runtime can be cloned, inspected, and launched without restoring the whole AGILLM4 source tree.

## Public Join Scripts

`public_join/agillm35_network_host.py` starts a signed-lease HTTPS coordinator for people who want to run their own network.

`public_join/agillm35_join_worker.py` is an outbound-only worker for untrusted joiners. It requests short-lived leases, verifies package hashes, runs a local worker command, and submits results to quarantine rather than exposing SSH or writing directly into the master merge path.

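To illustrate the package-hash check a join worker performs before running anything, here is a minimal sketch. It assumes SHA-256 digests delivered as hex strings; the function names `sha256_file` and `verify_package` are hypothetical and are not taken from `agillm35_join_worker.py`.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large packages are never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_package(path: Path, expected_sha256: str) -> bool:
    """Return True only if the file's digest matches the expected hex digest.

    Assumption: the coordinator supplies the expected digest alongside the lease.
    """
    actual = sha256_file(path)
    # compare_digest gives a constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual, expected_sha256.strip().lower())
```

A worker following this pattern would refuse to launch the local command whenever `verify_package` returns `False`.
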
## Distributed Inference

`distributed_infer/agillm35_distributed_infer.py` is a single-file distributed AR inference harness for the real AGILLM3.5 transformer. It splits contiguous transformer/DiffusionBlock layer ranges across local or HTTP worker stages, using the actual `Block` implementation and MoE FFNs from the checkpoint config.

Plan layer ranges:

```bash
python distributed_infer/agillm35_distributed_infer.py plan \
  --agillm35-path ./agillm35.py \
  --ckpt /path/to/master.pt \
  --dblock-blocks 8
```

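The idea of splitting a model into contiguous layer ranges can be sketched in a few lines. This is an illustration only, not the algorithm used by the `plan` subcommand: the function name `plan_layer_ranges` is hypothetical, and the half-open `(start, end)` convention is an assumption inferred from the `0,12` and `12:24` stage examples in this README.

```python
def plan_layer_ranges(num_layers: int, num_stages: int) -> list[tuple[int, int]]:
    """Split layers [0, num_layers) into contiguous half-open ranges.

    Earlier stages receive one extra layer when the split is uneven.
    """
    if num_stages < 1 or num_stages > num_layers:
        raise ValueError("num_stages must be between 1 and num_layers")
    base, extra = divmod(num_layers, num_stages)
    ranges = []
    start = 0
    for stage in range(num_stages):
        size = base + (1 if stage < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges
```

For the `large` preset's 24 layers and two stages, this yields `(0, 12)` and `(12, 24)`.
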
Start a worker for one layer range:

```bash
AGILLM35_INFER_TOKEN='change-me' python distributed_infer/agillm35_distributed_infer.py worker \
  --agillm35-path ./agillm35.py \
  --ckpt /path/to/master.pt \
  --start-layer 0 \
  --end-layer 12 \
  --host 0.0.0.0 \
  --port 9100
```

Run the coordinator:

```bash
AGILLM35_INFER_TOKEN='change-me' python distributed_infer/agillm35_distributed_infer.py infer \
  --agillm35-path ./agillm35.py \
  --ckpt /path/to/master.pt \
  --prompt "Hello" \
  --max-new 32 \
  --cache-mode kv \
  --stage https://worker-a.example:9100,0,12 \
  --stage local:12:24
```

Network tensor payloads use a small raw tensor wire format rather than unpickling remote worker responses. Use TLS plus a bearer token for workers exposed beyond localhost. `--cache-mode kv` is the default: each worker keeps per-session KV state after the prompt prefill, so decode steps send only the new token's hidden state through the pipeline. `--cache-mode full` is kept for comparison and debugging. SAT/NAT distributed decoding is a later phase.

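To show why a raw tensor format is safer than unpickling, here is a minimal sketch of one possible layout: a 4-byte little-endian header length, a JSON header carrying dtype and shape, then the raw little-endian payload. The layout and the names `encode_tensor` and `decode_tensor` are assumptions for illustration; the harness's real wire format may differ.

```python
import json
import math
import struct

# Assumed dtype table: only fixed-width numeric types are accepted.
_FORMATS = {"float32": "f", "int64": "q"}


def encode_tensor(values, shape, dtype="float32") -> bytes:
    """Serialize a flat list of numbers plus its shape into header + raw bytes."""
    if dtype not in _FORMATS:
        raise ValueError(f"unsupported dtype: {dtype}")
    if math.prod(shape) != len(values):
        raise ValueError("shape does not match number of values")
    header = json.dumps({"dtype": dtype, "shape": list(shape)}).encode("utf-8")
    payload = struct.pack(f"<{len(values)}{_FORMATS[dtype]}", *values)
    return struct.pack("<I", len(header)) + header + payload


def decode_tensor(blob: bytes):
    """Parse the header, validate sizes, and unpack numbers. No code is executed."""
    (header_len,) = struct.unpack_from("<I", blob, 0)
    header = json.loads(blob[4:4 + header_len].decode("utf-8"))
    dtype, shape = header["dtype"], tuple(header["shape"])
    if dtype not in _FORMATS:
        raise ValueError(f"unsupported dtype: {dtype}")
    if not all(isinstance(dim, int) and dim >= 0 for dim in shape):
        raise ValueError("shape must contain non-negative integers")
    fmt = f"<{math.prod(shape)}{_FORMATS[dtype]}"
    payload = blob[4 + header_len:]
    if len(payload) != struct.calcsize(fmt):
        raise ValueError("payload size does not match header")
    return list(struct.unpack(fmt, payload)), shape, dtype
```

Unlike `pickle.loads`, a decoder of this kind can only ever produce numbers and a shape, so a malicious worker response cannot trigger arbitrary code on the coordinator.
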
For inference against the live round-299 checkpoint, prefer the Hugging Face inference-slim artifact `distributed/inference/master_r299_20260602-205914_ar_infer_slim.pt`; it drops optimizer, SAT, and disaggregated-training state while preserving AR transformer inference.

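The slimming idea can be sketched as a filter over a checkpoint dictionary. The key prefixes below are hypothetical placeholders, not the actual keys in AGILLM checkpoints, and the function name `slim_checkpoint` is an assumption for illustration.

```python
# Hypothetical prefixes for training-only entries; real checkpoint keys may differ.
TRAINING_ONLY_PREFIXES = ("optimizer", "scheduler", "sat_", "disagg_")


def slim_checkpoint(ckpt: dict, drop_prefixes: tuple = TRAINING_ONLY_PREFIXES) -> dict:
    """Return a shallow copy of the checkpoint without training-only entries."""
    return {
        key: value
        for key, value in ckpt.items()
        if not key.startswith(drop_prefixes)
    }
```

With PyTorch, a filter like this would typically sit between `torch.load(path, map_location="cpu")` and `torch.save(slimmed, out_path)`.
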
## Defaults

- tokenizer: `deepseek-ai/DeepSeek-V3.2`
- preset: `large` (`d=1024`, `layers=24`, `heads=16`, `rank=128`)
- compatibility mode: `--agillm3_compat`
- NAT head/objective: disabled for AGILLM3 checkpoint compatibility
- DiffusionBlocks: available with `--dblock`

## Commands

```bash
python agillm35.py --help
python agillm35.py status --ckpt /path/to/pretrain_step00051081.pt
python agillm35.py infer --ckpt /path/to/pretrain_step00051081.pt --prompt "Hello"
```

## Example

```bash
python agillm35.py train \
  --agillm3_compat \
  --preset large \
  --resume /path/to/pretrain_step00051081.pt \
  --block 512 \
  --batch_size 1 \
  --source HuggingFaceFW/fineweb-edu \
  --save_dir ckpts \
  --dblock \
  --dblock_blocks 8 \
  --nat_every 0 \
  --dblock_nat_weight 0
```

## Notes

This repository contains code only, not AGILLM3 checkpoint weights.

DiffusionBlock logs report raw CE-style `loss` plus the actual EDM-weighted training objective as `weighted`. The weighted value is the optimization target; the raw value is the sanity-check number to compare with ordinary AR/SAT loss.

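The relationship between the two logged numbers can be sketched with the loss weighting from the EDM formulation, λ(σ) = (σ² + σ_data²) / (σ · σ_data)². Whether `agillm35.py` uses exactly this weighting, and the value `sigma_data = 0.5`, are assumptions; the function names are hypothetical.

```python
def edm_loss_weight(sigma: float, sigma_data: float = 0.5) -> float:
    """EDM-style loss weight for a given noise level sigma."""
    if sigma <= 0 or sigma_data <= 0:
        raise ValueError("sigma and sigma_data must be positive")
    return (sigma ** 2 + sigma_data ** 2) / (sigma * sigma_data) ** 2


def weighted_objective(raw_loss: float, sigma: float, sigma_data: float = 0.5) -> float:
    """Scale a raw per-sample loss by the noise-dependent weight."""
    return edm_loss_weight(sigma, sigma_data) * raw_loss
```

Because the weight grows sharply at low noise levels, the `weighted` value can sit far from the raw `loss`, which is why only the raw value is comparable with ordinary AR/SAT loss.
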
The Linux smoke test compiles the single file and completes a one-step synthetic training save. The full AGILLM3.5 continuation run is managed separately by the disaggregated Hetzner worker setup.