OpenTransformer/AGILLM-3.5

@@ -1,34 +1,104 @@
-# AGILLM-3.5
-698M-param decoder LM (d=1024, 24 layers, 16 heads, rank=128, expansion 2.0×), DeepSeek-V3.2
-tokenizer, AR + SAT (speculative) heads, trained with **DiffusionBlocks** (block-wise EDM
-denoising, 8 blocks). Forked from the AGILLM-3 ~step-51081 base.
-## Checkpoints
-### `distributed/` — the live distributed model
-`master_round244.pt` — trained **block-disaggregated across 4 Hetzner nodes** (GETH +
-MCP + PRIME + COMMUNIST-WEB). GETH coordinates and trains blocks 0, 1, 3, 5, and 7 locally;
-MCP, PRIME, and COMMUNIST-WEB train blocks 2, 4, and 6 over the private network. Each round exports block slices,
-trains them independently, and merges them back. **739 merged block-updates** at snapshot time.
-Single full file per snapshot (each round is a block merge, not a delta).
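The round structure above can be sketched as follows. This is a minimal illustration under loud assumptions, not the actual export/merge code:

- Parameters live in a flat dict whose keys carry a hypothetical `blocks.<i>.` prefix.
- MCP, PRIME, and COMMUNIST-WEB are paired one-to-one with blocks 2, 4, and 6.
- "Training" is a caller-supplied stub.
- Parameters outside the DiffusionBlocks are left untouched.

```python
from typing import Callable, Dict, List

Params = Dict[str, float]

# Hypothetical block-to-node assignment mirroring the text above.
ASSIGNMENT: Dict[str, List[int]] = {
    "GETH": [0, 1, 3, 5, 7],
    "MCP": [2],
    "PRIME": [4],
    "COMMUNIST-WEB": [6],
}


def block_of(key: str) -> int:
    """Return the block index encoded in a hypothetical 'blocks.<i>.' parameter key."""
    return int(key.split(".")[1])


def export_slice(master: Params, blocks: List[int]) -> Params:
    """Copy only the parameters that belong to the given blocks."""
    wanted = set(blocks)
    return {k: v for k, v in master.items() if k.startswith("blocks.") and block_of(k) in wanted}


def run_round(master: Params, train: Callable[[str, Params], Params]) -> Params:
    """Export per-node slices, train each independently, then merge them back.

    Each block is owned by exactly one node, so merging is a plain overwrite and
    the result is a full parameter set rather than a delta.
    """
    merged = dict(master)
    for node, blocks in ASSIGNMENT.items():
        merged.update(train(node, export_slice(master, blocks)))
    return merged
```

Because block ownership is disjoint, no averaging is needed at merge time. That is consistent with each snapshot being a single full file.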
-### `single_node/` — single-node dblock lineage (full + delta)
-- `pretrain_step00053908.pt` — full checkpoint (7.3 GB)
-- `pretrain_delta_step00053702.pt` (+ `.sha256`) — delta checkpoint (2.8 GB)
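The `.sha256` sidecar can be checked before loading a multi-GB file. This sketch assumes the sidecar follows the common `sha256sum` layout: a hex digest, whitespace, then the filename. If the sidecar holds only the bare digest, the first token is still the value compared.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so a multi-GB checkpoint never sits fully in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_sidecar(path: Path) -> bool:
    """Compare a file against '<file>.sha256' (assumed sha256sum-style contents)."""
    sidecar = path.with_name(path.name + ".sha256")
    expected = sidecar.read_text().split()[0].lower()
    return sha256_file(path) == expected
```

Usage: `verify_sidecar(Path("single_node/pretrain_delta_step00053702.pt"))` returns `True` only when the digest matches.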
-Checkpoint dict keys: `core` (backbone), `ar`, `sat` (heads), `cfg`, embedded
-`tokenizer_json`, plus `disagg_updates` (merge provenance) on the distributed master.
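A quick structural check of a loaded checkpoint dict can catch a wrong or truncated file before any model is built. This is a sketch that only looks at the keys listed above. The assumption is that `tokenizer_json` holds serialized `tokenizer.json` text as `str` or `bytes`.

```python
import json
from typing import Any, Dict, List

EXPECTED_KEYS = ("core", "ar", "sat", "cfg", "tokenizer_json")


def _is_json_object(text: Any) -> bool:
    """True if `text` parses as a JSON object (the shape of a tokenizer.json file)."""
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    if not isinstance(text, str):
        return False
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


def describe_checkpoint(ckpt: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize a loaded AGILLM3.5 checkpoint dict without touching any tensors."""
    missing: List[str] = [k for k in EXPECTED_KEYS if k not in ckpt]
    return {
        "missing": missing,
        "tokenizer_json_valid": _is_json_object(ckpt.get("tokenizer_json")),
        # Only the distributed master is expected to carry merge provenance.
        "has_disagg_updates": "disagg_updates" in ckpt,
    }
```

With PyTorch, the dict would come from `torch.load(path, map_location="cpu", weights_only=False)`. Pass `weights_only=False` only for files you trust, because it allows arbitrary unpickling. If the field really is a `tokenizer.json` string, `tokenizers.Tokenizer.from_str(ckpt["tokenizer_json"])` should rebuild the tokenizer.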
-## Code
-- `agillm35.py` - single-file AGILLM3.5 runtime for training/status/inference.
-- `distributed/public_join/` - public signed-lease host and outbound worker scripts for untrusted joiners.
-- `distributed/inference/agillm35_distributed_infer.py` - phase-1 distributed AR inference harness for transformer/MoE/DiffusionBlock layer stages.
-## Inference
-Load with the AGILLM nB300 code (`infer --mode ar|sat`); the tokenizer round-trips from the
-embedded `tokenizer_json`.
-Distributed AR inference can split contiguous transformer/DiffusionBlock layer ranges across local and HTTP worker stages. `--cache-mode kv` is the default and keeps per-session KV state on workers after prompt prefill, so decode steps send only the new hidden token through the pipeline. The network payload path uses a raw tensor wire format rather than unpickling remote worker responses; use TLS and a bearer token outside localhost.
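The actual wire layout is not documented here. The following hypothetical format illustrates the idea: a fixed header (a made-up `AGT1` magic tag, a dtype code, the rank, and the shape) followed by raw little-endian float32 values. Decoding only parses numbers, so a hostile worker response cannot trigger code execution the way an unpickled one could.

```python
import struct
from typing import List, Tuple

MAGIC = b"AGT1"  # hypothetical 4-byte tag
DTYPE_F32 = 0    # hypothetical dtype code; this sketch supports float32 only


def _numel(shape: Tuple[int, ...]) -> int:
    n = 1
    for dim in shape:
        n *= dim
    return n


def encode(shape: Tuple[int, ...], values: List[float]) -> bytes:
    """Serialize a tensor as header + raw little-endian float32 bytes."""
    n = _numel(shape)
    if n != len(values):
        raise ValueError("shape does not match number of values")
    header = MAGIC + struct.pack("<BB", DTYPE_F32, len(shape))
    header += struct.pack(f"<{len(shape)}I", *shape)
    return header + struct.pack(f"<{n}f", *values)


def decode(buf: bytes) -> Tuple[Tuple[int, ...], List[float]]:
    """Parse and validate a payload; never evaluates or unpickles anything."""
    if buf[:4] != MAGIC:
        raise ValueError("bad magic")
    dtype, ndim = struct.unpack_from("<BB", buf, 4)
    if dtype != DTYPE_F32:
        raise ValueError("unsupported dtype")
    shape = struct.unpack_from(f"<{ndim}I", buf, 6)
    offset = 6 + 4 * ndim
    n = _numel(shape)
    if len(buf) != offset + 4 * n:
        raise ValueError("length mismatch")
    return tuple(shape), list(struct.unpack_from(f"<{n}f", buf, offset))
```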

+---
+library_name: pytorch
+tags:
+  - agillm
+  - transformer
+  - diffusion-block
+  - single-file
+license: other
+---
+# AGILLM3.5 Single File
+AGILLM3.5 is the AGILLM3 checkpoint/tokenizer contract running on the AGILLM4 runtime and DiffusionBlock training path.
+The runnable artifact is `agillm35.py`. The helper modules are folded into that one file so the runtime can be cloned, inspected, and launched without restoring the whole AGILLM4 source tree.
+## Public Join Scripts
+`public_join/agillm35_network_host.py` starts a signed-lease HTTPS coordinator for people who want to run their own network.
+`public_join/agillm35_join_worker.py` is an outbound-only worker for untrusted joiners. It requests short-lived leases, verifies package hashes, runs a local worker command, and submits results to quarantine rather than exposing SSH or writing directly into the master merge path.
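The lease flow above can be sketched with HMAC-SHA256. That scheme is an assumption; the real scripts may sign differently. In this sketch the key stays on the host. The host signs each lease, and it checks that signature and the expiry again when a result comes back. The worker only checks that the package it downloaded matches the hash in the lease. All field and function names are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def _sign(secret: bytes, fields: Dict[str, Any]) -> str:
    """HMAC-SHA256 over a canonical (sorted, compact) JSON encoding of the lease fields."""
    body = json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def issue_lease(secret: bytes, worker_id: str, package_sha256: str,
                ttl_s: float, now: float) -> Dict[str, Any]:
    """Host side: a short-lived lease bound to one worker and one package hash."""
    lease: Dict[str, Any] = {
        "worker_id": worker_id,
        "package_sha256": package_sha256,
        "expires_at": now + ttl_s,
    }
    lease["sig"] = _sign(secret, lease)
    return lease


def package_matches(lease: Dict[str, Any], package: bytes) -> bool:
    """Worker side: refuse to run a package whose hash differs from the lease."""
    return hashlib.sha256(package).hexdigest() == lease["package_sha256"]


def accept_submission(secret: bytes, lease: Dict[str, Any], now: float) -> bool:
    """Host side: queue a result for quarantine only if its lease is genuine and unexpired."""
    fields = {k: v for k, v in lease.items() if k != "sig"}
    genuine = hmac.compare_digest(_sign(secret, fields), str(lease.get("sig", "")))
    return genuine and now < lease["expires_at"]
```

Passing `accept_submission` only admits a result to quarantine. It says nothing about whether the update is safe to merge into the master.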
+## Distributed Inference
+`distributed_infer/agillm35_distributed_infer.py` is a single-file distributed AR inference harness for the real AGILLM3.5 transformer. It splits contiguous transformer/DiffusionBlock layer ranges across local or HTTP worker stages, using the actual `Block` implementation and MoE FFNs from the checkpoint config.
+Plan layer ranges:
+```bash
+python distributed_infer/agillm35_distributed_infer.py plan \
+  --agillm35-path ./agillm35.py \
+  --ckpt /path/to/master.pt \
+  --dblock-blocks 8
+```
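One plausible planning rule is to give each stage a contiguous run of whole DiffusionBlocks. Whether the real `plan` subcommand aligns stages to block boundaries is an assumption. If the 24 layers are split evenly over `--dblock-blocks 8`, each block spans 3 layers, and the 0-12 / 12-24 split used in the examples falls on such a boundary. Ranges are `[start, end)`.

```python
from typing import List, Tuple


def plan_stages(n_layers: int, n_blocks: int, n_stages: int) -> List[Tuple[int, int]]:
    """Split layers into contiguous [start, end) ranges that never cut a DiffusionBlock."""
    if n_layers % n_blocks:
        raise ValueError("layers must divide evenly into blocks")
    if not 1 <= n_stages <= n_blocks:
        raise ValueError("need between 1 and n_blocks stages")
    per_block = n_layers // n_blocks
    base, extra = divmod(n_blocks, n_stages)  # spread leftover blocks over the first stages
    ranges, start = [], 0
    for i in range(n_stages):
        end = start + (base + (1 if i < extra else 0)) * per_block
        ranges.append((start, end))
        start = end
    return ranges
```

For example, `plan_stages(24, 8, 2)` gives `[(0, 12), (12, 24)]`, and three stages give `[(0, 9), (9, 18), (18, 24)]`.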
+Start a worker for one layer range:
+```bash
+AGILLM35_INFER_TOKEN='change-me' python distributed_infer/agillm35_distributed_infer.py worker \
+  --agillm35-path ./agillm35.py \
+  --ckpt /path/to/master.pt \
+  --start-layer 0 \
+  --end-layer 12 \
+  --host 0.0.0.0 \
+  --port 9100
+```
+Run the coordinator:
+```bash
+AGILLM35_INFER_TOKEN='change-me' python distributed_infer/agillm35_distributed_infer.py infer \
+  --agillm35-path ./agillm35.py \
+  --ckpt /path/to/master.pt \
+  --prompt "Hello" \
+  --max-new 32 \
+  --cache-mode kv \
+  --stage https://worker-a.example:9100,0,12 \
+  --stage local:12:24
+```
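The two `--stage` forms in the example differ: `URL,START,END` uses commas, while `local:START:END` uses colons. A likely reason is that URLs already contain colons. The following hypothetical parser is inferred only from those two examples. It also shows the sanity check a coordinator needs: the stages must tile the layers contiguously from 0 to the model depth.

```python
from typing import List, NamedTuple


class Stage(NamedTuple):
    target: str  # "local" or a worker URL
    start: int   # first layer (inclusive)
    end: int     # last layer (exclusive)


def parse_stage(spec: str) -> Stage:
    """Parse 'local:START:END' or 'URL,START,END' (formats inferred from the examples)."""
    if spec.startswith("local:"):
        _, start, end = spec.split(":")
        return Stage("local", int(start), int(end))
    url, start, end = spec.rsplit(",", 2)
    return Stage(url, int(start), int(end))


def check_pipeline(stages: List[Stage], n_layers: int) -> None:
    """Require contiguous, non-overlapping ranges covering every layer exactly once."""
    expected = 0
    for s in stages:
        if s.start != expected or s.end <= s.start:
            raise ValueError(f"gap or overlap at {s}")
        expected = s.end
    if expected != n_layers:
        raise ValueError(f"pipeline stops at layer {expected}, model has {n_layers}")
```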
+Network tensor payloads use a small raw tensor wire format instead of unpickling remote worker responses. Use TLS plus a bearer token for any worker exposed beyond localhost. `--cache-mode kv` is the default and keeps per-session KV state on each worker after the prompt prefill, so decode steps send only the new hidden token through the pipeline. `--cache-mode full` remains available for comparison and debugging. Distributed SAT/NAT decoding is planned for a later phase.
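The benefit of `kv` mode is easy to estimate. Each stage hop carries a hidden-state tensor of shape `[tokens, d]` with `d = 1024`. This back-of-envelope sketch assumes 2-byte (bf16/fp16) values and ignores headers and HTTP overhead. Both assumptions may differ from the real harness.

```python
def hidden_bytes(tokens: int, d_model: int = 1024, bytes_per_value: int = 2) -> int:
    """Payload size of one [tokens, d_model] hidden-state tensor."""
    return tokens * d_model * bytes_per_value


def decode_step_bytes(seq_len: int, cache_mode: str,
                      d_model: int = 1024, bytes_per_value: int = 2) -> int:
    """Bytes one stage hop carries for a decode step when the sequence has seq_len tokens."""
    if cache_mode == "kv":
        # Workers hold the session's K/V cache, so only the newest token moves.
        return hidden_bytes(1, d_model, bytes_per_value)
    if cache_mode == "full":
        # No cache: the whole sequence is sent and recomputed every step.
        return hidden_bytes(seq_len, d_model, bytes_per_value)
    raise ValueError(f"unknown cache mode: {cache_mode}")
```

Take a 100-token prompt after 32 generated tokens, so the sequence holds 132 tokens. In `full` mode each hop moves 132 × 1024 × 2 = 270,336 bytes per step. In `kv` mode it moves 2,048 bytes.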
+For inference against the live round-299 checkpoint, prefer the HF inference-slim artifact `distributed/inference/master_r299_20260602-205914_ar_infer_slim.pt`; it drops optimizer/SAT/disaggregated training state while preserving AR transformer inference.
+## Defaults
+- tokenizer: `deepseek-ai/DeepSeek-V3.2`
+- preset: `large` (`d=1024`, `layers=24`, `heads=16`, `rank=128`)
+- compatibility mode: `--agillm3_compat`
+- NAT head/objective: disabled for AGILLM3 checkpoint compatibility
+- DiffusionBlocks: available with `--dblock`
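Scripts that need the large preset's shape can mirror the defaults in a small frozen dataclass. The field names here are hypothetical; only the values come from the list above. `head_dim` assumes `d` is split evenly across the attention heads. `rank` is carried through without interpretation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agillm35LargePreset:
    # Values copied from the Defaults list; field names are illustrative, not the runtime's.
    tokenizer: str = "deepseek-ai/DeepSeek-V3.2"
    d: int = 1024
    layers: int = 24
    heads: int = 16
    rank: int = 128

    @property
    def head_dim(self) -> int:
        # Per-head width under standard multi-head splitting of d.
        return self.d // self.heads
```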
+## Commands
+```bash
+python agillm35.py --help
+python agillm35.py status --ckpt /path/to/pretrain_step00051081.pt
+python agillm35.py infer --ckpt /path/to/pretrain_step00051081.pt --prompt "Hello"
+```
+## Example
+```bash
+python agillm35.py train \
+  --agillm3_compat \
+  --preset large \
+  --resume /path/to/pretrain_step00051081.pt \
+  --block 512 \
+  --batch_size 1 \
+  --source HuggingFaceFW/fineweb-edu \
+  --save_dir ckpts \
+  --dblock \
+  --dblock_blocks 8 \
+  --nat_every 0 \
+  --dblock_nat_weight 0
+```
+## Notes
+This repository contains code only, not AGILLM3 checkpoint weights.
+DiffusionBlock logs report raw CE-style `loss` plus the actual EDM-weighted training objective as `weighted`. The weighted value is the optimization target; the raw value is the sanity-check number to compare with ordinary AR/SAT loss.
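The README does not state the weighting formula. This sketch assumes the runtime uses the standard EDM weight from Karras et al. (2022), λ(σ) = (σ² + σ_data²) / (σ · σ_data)² with σ_data = 0.5. It also assumes that weight is applied per sample to the raw CE-style loss. Under those assumptions, low-noise samples get large weights, so `weighted` can sit well above `loss` even when the model is behaving.

```python
from typing import Dict, Sequence


def edm_weight(sigma: float, sigma_data: float = 0.5) -> float:
    """EDM loss weight lambda(sigma) = (sigma^2 + sigma_data^2) / (sigma * sigma_data)^2."""
    return (sigma ** 2 + sigma_data ** 2) / (sigma * sigma_data) ** 2


def log_losses(raw: Sequence[float], sigmas: Sequence[float],
               sigma_data: float = 0.5) -> Dict[str, float]:
    """Mean raw loss (sanity check) and mean EDM-weighted loss (optimization target)."""
    weighted = [edm_weight(s, sigma_data) * r for r, s in zip(raw, sigmas)]
    return {"loss": sum(raw) / len(raw), "weighted": sum(weighted) / len(weighted)}
```

For example, raw losses `[2.0, 3.0]` at σ = `[1.0, 0.5]` get weights 5 and 8. That logs `loss` = 2.5 but `weighted` = 17.0.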
+The Linux smoke test compiles the single file and runs one synthetic training step through to a checkpoint save. The full AGILLM3.5 continuation run is managed separately by the disaggregated Hetzner worker setup.