---

library_name: pytorch
tags:
  - agillm
  - transformer
  - diffusion-block
  - single-file
license: other
---


# AGILLM3.5 Single File

AGILLM3.5 keeps the AGILLM3 checkpoint/tokenizer contract and runs it on the AGILLM4 runtime and DiffusionBlock training path.

The runnable artifact is `agillm35.py`. The helper modules are folded into that one file so the runtime can be cloned, inspected, and launched without restoring the whole AGILLM4 source tree.

## Public Join Scripts

`public_join/agillm35_network_host.py` starts a signed-lease HTTPS coordinator for people who want to run their own network.

`public_join/agillm35_join_worker.py` is an outbound-only worker for untrusted joiners. It requests short-lived leases, verifies package hashes, runs a local worker command, and submits results to quarantine rather than exposing SSH or writing directly into the master merge path.

## Distributed Inference

`distributed_infer/agillm35_distributed_infer.py` is a single-file distributed AR inference harness for the real AGILLM3.5 transformer. It splits contiguous transformer/DiffusionBlock layer ranges across local or HTTP worker stages, using the actual `Block` implementation and MoE FFNs from the checkpoint config.

Plan layer ranges:

```bash
python distributed_infer/agillm35_distributed_infer.py plan \
  --agillm35-path ./agillm35.py \
  --ckpt /path/to/master.pt \
  --dblock-blocks 8
```
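
As an illustration of what a contiguous layer split looks like, the sketch below divides a layer count evenly across stages. It assumes half-open `(start, end)` ranges, which matches the `0`/`12` and `12`/`24` values used in the worker and coordinator examples, but `split_layers` is a hypothetical helper and not the output or logic of the `plan` subcommand.

```python
def split_layers(num_layers: int, num_stages: int) -> list[tuple[int, int]]:
    """Split layers into contiguous half-open (start, end) ranges, one per stage.

    Earlier stages receive one extra layer when the split is uneven.
    """
    if num_stages < 1 or num_stages > num_layers:
        raise ValueError("need 1 <= num_stages <= num_layers")
    base, extra = divmod(num_layers, num_stages)
    ranges = []
    start = 0
    for stage in range(num_stages):
        size = base + (1 if stage < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # The `large` preset has 24 layers; two stages give two 12-layer ranges.
    print(split_layers(24, 2))
```

Each range's `end` equals the next range's `start`, so every layer is assigned to exactly one stage.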

Start a worker for one layer range:

```bash
AGILLM35_INFER_TOKEN='change-me' python distributed_infer/agillm35_distributed_infer.py worker \
  --agillm35-path ./agillm35.py \
  --ckpt /path/to/master.pt \
  --start-layer 0 \
  --end-layer 12 \
  --host 0.0.0.0 \
  --port 9100
```

Run the coordinator:

```bash
AGILLM35_INFER_TOKEN='change-me' python distributed_infer/agillm35_distributed_infer.py infer \
  --agillm35-path ./agillm35.py \
  --ckpt /path/to/master.pt \
  --prompt "Hello" \
  --max-new 32 \
  --cache-mode kv \
  --stage https://worker-a.example:9100,0,12 \
  --stage local:12:24
```

Network tensor payloads use a small raw tensor wire format, so the coordinator never unpickles remote worker responses. Use TLS plus a bearer token for any worker exposed beyond localhost.

`--cache-mode kv` is the default. It keeps per-session KV state on each worker after the prompt prefill, so each decode step sends only the new token's hidden state through the pipeline. `--cache-mode full` is kept for comparison and debugging. Distributed SAT/NAT decoding is planned for a later phase.
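
To show why a raw tensor format avoids the risks of unpickling, here is a minimal sketch of one possible layout: a magic tag, a dtype code, the shape, then little-endian values. The layout, the `MAGIC` value, and the names `encode_tensor` and `decode_tensor` are assumptions for illustration and are not the format used by `agillm35_distributed_infer.py`.

```python
import math
import struct

MAGIC = b"RTW0"  # hypothetical 4-byte tag
_FORMATS = {0: "f", 1: "d", 2: "q"}  # float32, float64, int64


def encode_tensor(dtype_code: int, shape: tuple[int, ...], flat_values: list) -> bytes:
    """Serialize a flattened tensor as header + dims + little-endian values."""
    fmt = _FORMATS[dtype_code]
    count = math.prod(shape)
    if count != len(flat_values):
        raise ValueError("shape does not match number of values")
    header = MAGIC + struct.pack("<BB", dtype_code, len(shape))
    dims = struct.pack(f"<{len(shape)}Q", *shape)
    body = struct.pack(f"<{count}{fmt}", *flat_values)
    return header + dims + body


def decode_tensor(payload: bytes) -> tuple[int, tuple[int, ...], list]:
    """Parse a payload, rejecting anything that is not exactly well-formed."""
    if len(payload) < 6 or payload[:4] != MAGIC:
        raise ValueError("bad header")
    dtype_code, ndim = struct.unpack_from("<BB", payload, 4)
    if dtype_code not in _FORMATS:
        raise ValueError("unknown dtype code")
    offset = 6
    if len(payload) < offset + 8 * ndim:
        raise ValueError("truncated shape")
    shape = struct.unpack_from(f"<{ndim}Q", payload, offset)
    offset += 8 * ndim
    fmt = _FORMATS[dtype_code]
    count = math.prod(shape)
    # Check the declared size against the actual size before reading values.
    if len(payload) != offset + count * struct.calcsize("<" + fmt):
        raise ValueError("length mismatch")
    values = list(struct.unpack_from(f"<{count}{fmt}", payload, offset))
    return dtype_code, tuple(shape), values
```

Decoding only ever reads numbers at fixed offsets, so a malicious worker can at worst send wrong values; it cannot trigger code execution the way a crafted pickle can.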

For inference against the live round-299 checkpoint, prefer the HF inference-slim artifact `distributed/inference/master_r299_20260602-205914_ar_infer_slim.pt`; it drops optimizer/SAT/disaggregated training state while preserving AR transformer inference.

## Defaults

- tokenizer: `deepseek-ai/DeepSeek-V3.2`
- preset: `large` (`d=1024`, `layers=24`, `heads=16`, `rank=128`)
- compatibility mode: `--agillm3_compat`
- NAT head/objective: disabled for AGILLM3 checkpoint compatibility
- DiffusionBlocks: available with `--dblock`

## Commands

```bash
python agillm35.py --help
python agillm35.py status --ckpt /path/to/pretrain_step00051081.pt
python agillm35.py infer --ckpt /path/to/pretrain_step00051081.pt --prompt "Hello"
```

## Example

```bash
python agillm35.py train \
  --agillm3_compat \
  --preset large \
  --resume /path/to/pretrain_step00051081.pt \
  --block 512 \
  --batch_size 1 \
  --source HuggingFaceFW/fineweb-edu \
  --save_dir ckpts \
  --dblock \
  --dblock_blocks 8 \
  --nat_every 0 \
  --dblock_nat_weight 0
```

## Notes

This repository contains code only, not AGILLM3 checkpoint weights.

DiffusionBlock logs report raw CE-style `loss` plus the actual EDM-weighted training objective as `weighted`. The weighted value is the optimization target; the raw value is the sanity-check number to compare with ordinary AR/SAT loss.
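
The relationship between the two logged numbers can be sketched as a per-sample weight applied to the raw loss. The exact weighting used by `agillm35.py` is not stated here; the sketch assumes the loss weight from the EDM formulation, $\lambda(\sigma) = (\sigma^2 + \sigma_\text{data}^2) / (\sigma\,\sigma_\text{data})^2$, and the default `sigma_data=0.5` is likewise an assumption. `edm_weight` and `weighted_objective` are hypothetical names.

```python
def edm_weight(sigma: float, sigma_data: float = 0.5) -> float:
    """EDM-style loss weight lambda(sigma); sigma_data=0.5 is an assumed default."""
    if sigma <= 0 or sigma_data <= 0:
        raise ValueError("sigma and sigma_data must be positive")
    return (sigma**2 + sigma_data**2) / (sigma * sigma_data) ** 2


def weighted_objective(raw_loss: float, sigma: float, sigma_data: float = 0.5) -> float:
    """Scale a raw per-sample loss by the noise-level weight."""
    return edm_weight(sigma, sigma_data) * raw_loss
```

Under this assumption the weight grows sharply at low noise levels, which is why the `weighted` value can differ widely from the raw `loss` and should not be compared against ordinary AR/SAT numbers.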

The Linux smoke test compiles the single file, runs one synthetic training step, and saves a checkpoint. The full AGILLM3.5 continuation run is managed separately by the disaggregated Hetzner worker setup.