OpenTransformer committed
Commit 7814939 · verified · 1 Parent(s): 66ed57c

Document inference-slim checkpoint

Files changed (1)
  1. README.md +103 -33
README.md CHANGED
@@ -1,34 +1,104 @@
- # AGILLM-3.5
-
- 698M-param decoder LM (d=1024, 24 layers, 16 heads, rank=128, expansion 2.0×), DeepSeek-V3.2
- tokenizer, AR + SAT (speculative) heads, trained with **DiffusionBlocks** (block-wise EDM
- denoising, 8 blocks). Forked from the AGILLM-3 ~step-51081 base.
-
- ## Checkpoints
-
- ### `distributed/` — the live distributed model
- `master_round244.pt` — trained **block-disaggregated across 4 Hetzner nodes** (GETH +
- MCP + PRIME + COMMUNIST-WEB). GETH coordinates and trains blocks 0,1,3,5,7 locally; MCP/PRIME/
- COMMUNIST-WEB train blocks 2/4/6 over the private network. Each round exports block slices,
- trains them independently, and merges them back. **739 merged block-updates** at snapshot.
- Single full file per snapshot (each round is a block merge, not a delta).
-
- ### `single_node/` — single-node dblock lineage (full + delta)
- - `pretrain_step00053908.pt` full checkpoint (7.3 GB)
- - `pretrain_delta_step00053702.pt` (+ `.sha256`) — delta checkpoint (2.8 GB)
-
- Checkpoint dict keys: `core` (backbone), `ar`, `sat` (heads), `cfg`, embedded
- `tokenizer_json`, plus `disagg_updates` (merge provenance) on the distributed master.
-
-
- ## Code
-
- - `agillm35.py` - single-file AGILLM3.5 runtime for training/status/inference.
- - `distributed/public_join/` - public signed-lease host and outbound worker scripts for untrusted joiners.
- - `distributed/inference/agillm35_distributed_infer.py` - phase-1 distributed AR inference harness for transformer/MoE/DiffusionBlock layer stages.
-
- ## Inference
- Load with the AGILLM nB300 code (`infer --mode ar|sat`); the tokenizer round-trips from the
- embedded `tokenizer_json`.
- Distributed AR inference can split contiguous transformer/DiffusionBlock layer ranges across local and HTTP worker stages. `--cache-mode kv` is the default and keeps per-session KV state on workers after prompt prefill, so decode steps send only the new hidden token through the pipeline. The network payload path uses a raw tensor wire format rather than unpickling remote worker responses; use TLS and a bearer token outside localhost.
+ ---
+ library_name: pytorch
+ tags:
+ - agillm
+ - transformer
+ - diffusion-block
+ - single-file
+ license: other
+ ---
+
+ # AGILLM3.5 Single File
+
+ AGILLM3.5 is the AGILLM3 checkpoint/tokenizer contract running on the AGILLM4 runtime and DiffusionBlock training path.
+
+ The runnable artifact is `agillm35.py`. The helper modules are folded into that one file so the runtime can be cloned, inspected, and launched without restoring the whole AGILLM4 source tree.
+
+ ## Public Join Scripts
+
+ `public_join/agillm35_network_host.py` starts a signed-lease HTTPS coordinator for people who want to run their own network.
+
+ `public_join/agillm35_join_worker.py` is an outbound-only worker for untrusted joiners. It requests short-lived leases, verifies package hashes, runs a local worker command, and submits results to quarantine rather than exposing SSH or writing directly into the master merge path.
+
+ ## Distributed Inference
+
+ `distributed_infer/agillm35_distributed_infer.py` is a single-file distributed AR inference harness for the real AGILLM3.5 transformer. It splits contiguous transformer/DiffusionBlock layer ranges across local or HTTP worker stages, using the actual `Block` implementation and MoE FFNs from the checkpoint config.
+
+ Plan layer ranges:
+
+ ```bash
+ python distributed_infer/agillm35_distributed_infer.py plan \
+ --agillm35-path ./agillm35.py \
+ --ckpt /path/to/master.pt \
+ --dblock-blocks 8
+ ```
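
As a rough illustration of what contiguous layer planning involves, the sketch below splits a layer count into near-equal ranges, one per stage. It is not the harness's actual `plan` logic: the function name `split_layers` is hypothetical, and the half-open `[start, end)` convention is an assumption inferred from the `--start-layer 0 --end-layer 12` and `local:12:24` values used in this README.

```python
def split_layers(n_layers: int, n_stages: int) -> list[tuple[int, int]]:
    """Split n_layers into n_stages contiguous half-open ranges [start, end).

    Hypothetical helper for illustration; earlier stages absorb any remainder.
    """
    if n_stages < 1 or n_stages > n_layers:
        raise ValueError("need 1 <= n_stages <= n_layers")
    base, extra = divmod(n_layers, n_stages)
    ranges = []
    start = 0
    for i in range(n_stages):
        # The first `extra` stages each take one additional layer.
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


if __name__ == "__main__":
    # 24 layers over two stages, matching the worker/coordinator examples.
    print(split_layers(24, 2))
```

With 24 layers and two stages this yields `(0, 12)` and `(12, 24)`, the same split the worker and coordinator commands use.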
+
+ Start a worker for one layer range:
+
+ ```bash
+ AGILLM35_INFER_TOKEN='change-me' python distributed_infer/agillm35_distributed_infer.py worker \
+ --agillm35-path ./agillm35.py \
+ --ckpt /path/to/master.pt \
+ --start-layer 0 \
+ --end-layer 12 \
+ --host 0.0.0.0 \
+ --port 9100
+ ```
+
+ Run the coordinator:
+
+ ```bash
+ AGILLM35_INFER_TOKEN='change-me' python distributed_infer/agillm35_distributed_infer.py infer \
+ --agillm35-path ./agillm35.py \
+ --ckpt /path/to/master.pt \
+ --prompt "Hello" \
+ --max-new 32 \
+ --cache-mode kv \
+ --stage https://worker-a.example:9100,0,12 \
+ --stage local:12:24
+ ```
+
+ Network tensor payloads use a small raw tensor wire format rather than unpickling remote worker responses. Use TLS plus a bearer token for workers exposed beyond localhost. `--cache-mode kv` is the default and keeps per-session KV state on each worker after the prompt prefill, so decode steps send only the new hidden token through the pipeline. `--cache-mode full` is kept for comparison/debugging. SAT/NAT distributed decoding is a later phase.
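
To show why a raw tensor format is safer than unpickling, here is a minimal sketch of one possible layout: a fixed header followed by little-endian float32 data. The magic bytes, header fields, and function names are all assumptions made for illustration; the harness's real wire format is not documented here and may differ.

```python
import struct
from math import prod

MAGIC = b"RTW1"  # hypothetical magic bytes, not the harness's real header


def encode_tensor(shape, values):
    """Pack float32 data as: magic | ndim (u8) | dims (u32 each) | raw values."""
    if prod(shape) != len(values):
        raise ValueError("shape does not match number of values")
    header = MAGIC + struct.pack("<B", len(shape))
    header += struct.pack(f"<{len(shape)}I", *shape)
    return header + struct.pack(f"<{len(values)}f", *values)


def decode_tensor(buf):
    """Parse the layout above with explicit size checks.

    Only fixed-width integers and floats are read, so a malicious peer
    cannot trigger code execution the way a pickle payload could.
    """
    if len(buf) < 5 or buf[:4] != MAGIC:
        raise ValueError("bad magic or truncated header")
    ndim = buf[4]
    dims_end = 5 + 4 * ndim
    if len(buf) < dims_end:
        raise ValueError("truncated shape")
    shape = struct.unpack_from(f"<{ndim}I", buf, 5)
    count = prod(shape)
    if len(buf) != dims_end + 4 * count:
        raise ValueError("payload length does not match declared shape")
    values = list(struct.unpack_from(f"<{count}f", buf, dims_end))
    return tuple(shape), values


if __name__ == "__main__":
    # One decode step: batch 1, one new token, hidden size 4 (toy numbers).
    wire = encode_tensor((1, 1, 4), [0.5, -1.0, 2.0, 3.25])
    print(decode_tensor(wire))
```

The decoder rejects any buffer whose length disagrees with the declared shape, which is the kind of check a receiving stage would want before handing bytes to the model.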
 
+ For inference against the live round-299 checkpoint, prefer the HF inference-slim artifact `distributed/inference/master_r299_20260602-205914_ar_infer_slim.pt`; it drops optimizer/SAT/disaggregated training state while preserving AR transformer inference.
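
The slimming step can be pictured as keeping only the entries AR inference reads. The sketch below uses plain dictionaries and assumes the key names listed in the removed README text (`core`, `ar`, `sat`, `cfg`, `tokenizer_json`, `disagg_updates`); the `opt` key and the helper name `slim_for_ar_inference` are hypothetical, and the actual tool used to produce the artifact is not shown in this commit.

```python
# Keys assumed necessary for AR-only inference. The names follow the
# checkpoint layout in the earlier README revision; treat them as assumptions.
AR_INFERENCE_KEYS = ("core", "ar", "cfg", "tokenizer_json")


def slim_for_ar_inference(ckpt: dict) -> dict:
    """Return a new dict holding only the entries AR inference needs."""
    missing = [k for k in AR_INFERENCE_KEYS if k not in ckpt]
    if missing:
        raise KeyError(f"checkpoint is missing required keys: {missing}")
    # Anything else (optimizer state, SAT head, merge provenance) is dropped.
    return {k: ckpt[k] for k in AR_INFERENCE_KEYS}


if __name__ == "__main__":
    # Placeholder values stand in for real tensors and state dicts.
    full = {
        "core": {"w": [0.0]},
        "ar": {"w": [0.0]},
        "sat": {"w": [0.0]},
        "cfg": {"d": 1024, "layers": 24},
        "tokenizer_json": "{}",
        "opt": {"step": 1},          # hypothetical optimizer-state key
        "disagg_updates": [],
    }
    print(sorted(slim_for_ar_inference(full)))
```

With a real checkpoint the dictionary would typically come from `torch.load(path, map_location="cpu")` and the result would be written back with `torch.save`.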
+
+ ## Defaults
+
+ - tokenizer: `deepseek-ai/DeepSeek-V3.2`
+ - preset: `large` (`d=1024`, `layers=24`, `heads=16`, `rank=128`)
+ - compatibility mode: `--agillm3_compat`
+ - NAT head/objective: disabled for AGILLM3 checkpoint compatibility
+ - DiffusionBlocks: available with `--dblock`
+
+ ## Commands
+
+ ```bash
+ python agillm35.py --help
+ python agillm35.py status --ckpt /path/to/pretrain_step00051081.pt
+ python agillm35.py infer --ckpt /path/to/pretrain_step00051081.pt --prompt "Hello"
+ ```
+
+ ## Example
+
+ ```bash
+ python agillm35.py train \
+ --agillm3_compat \
+ --preset large \
+ --resume /path/to/pretrain_step00051081.pt \
+ --block 512 \
+ --batch_size 1 \
+ --source HuggingFaceFW/fineweb-edu \
+ --save_dir ckpts \
+ --dblock \
+ --dblock_blocks 8 \
+ --nat_every 0 \
+ --dblock_nat_weight 0
+ ```
+
+ ## Notes
+
+ This repository contains code only, not AGILLM3 checkpoint weights.
+
+ DiffusionBlock logs report raw CE-style `loss` plus the actual EDM-weighted training objective as `weighted`. The weighted value is the optimization target; the raw value is the sanity-check number to compare with ordinary AR/SAT loss.
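
To make the two logged numbers concrete, the sketch below assumes the standard EDM loss weight, `(sigma^2 + sigma_data^2) / (sigma * sigma_data)^2`, applied as a scalar multiplier to the raw loss. Whether `agillm35.py` uses exactly this form, and what value of `sigma_data` it uses, is not stated in this README; the function names and the `sigma_data=0.5` default are assumptions.

```python
def edm_weight(sigma: float, sigma_data: float = 0.5) -> float:
    """Standard EDM loss weight for noise level sigma (assumed form)."""
    return (sigma**2 + sigma_data**2) / (sigma * sigma_data) ** 2


def log_entry(raw_loss: float, sigma: float, sigma_data: float = 0.5) -> dict:
    """Build a log record with both numbers described above.

    `loss` is the raw cross-entropy-style value; `weighted` is the value
    that would actually be optimized under the assumed weighting.
    """
    return {
        "loss": raw_loss,
        "weighted": edm_weight(sigma, sigma_data) * raw_loss,
    }


if __name__ == "__main__":
    print(log_entry(raw_loss=2.0, sigma=1.0))
```

Under this assumption the weight grows sharply as `sigma` approaches zero, which is why `weighted` can differ a lot from `loss` and should not be compared directly with ordinary AR/SAT loss values.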
+
+ The Linux smoke test compiles the single file and completes a one-step synthetic training save. The full AGILLM3.5 continuation run is managed separately by the disaggregated Hetzner worker setup.