# AGILLM4 Distributed Training Benchmark
This folder contains a non-destructive benchmark harness for the real AGILLM4
DiffusionBlock training path.

The harness exports copied block-slice leases from a full AGILLM4 checkpoint,
runs each lease on a worker using the live AGILLM4 `Block`, MoE FFN, sublinear
attention mask path, and V4-Pro tokenizer vocabulary, then collects timing
summaries. It does not modify the source checkpoint.
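
One way to confirm the non-destructive behaviour on your own machine is to hash the checkpoint before and after an export. The sketch below is not part of the harness: `sha256_of` is a hypothetical helper, and it assumes the checkpoint is a single regular file.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks (1 MiB by default)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops at the empty bytes object returned at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical usage around an export run:
#   before = sha256_of("/workspace/agillm4_4090_ckpts/pretrain_step01317993.pt")
#   ... run agillm4_export_bench_packages.py ...
#   after = sha256_of("/workspace/agillm4_4090_ckpts/pretrain_step01317993.pt")
#   assert before == after, "source checkpoint changed during export"
```

Chunked reading keeps memory use flat even for a multi-gigabyte checkpoint.
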
## Files
- `agillm4_export_bench_packages.py` - export shared frozen tensors and one
DBlock lease per worker from a full AGILLM4 checkpoint.
- `agillm4_slice_bench_worker.py` - run one worker lease with the live
`nB300_agillm4.py` runtime plus its companion `dblocks_train.py`,
`fused_ce.py`, and `anchor_memory.py` sidecars.
- `agillm4_collect_bench_results.py` - collect update `.pt` files into a
combined throughput summary.
## Example
```bash
python agillm4_export_bench_packages.py \
  --ckpt /workspace/agillm4_4090_ckpts/pretrain_step01317993.pt \
  --out-dir /workspace/agillm4_dist_bench/export_test \
  --workers geth:0,mcp:1,prime:2,communist-web:3 \
  --dblock-blocks 4 \
  --steps 1 \
  --batch-size 1 \
  --block-size 128 \
  --attn-backend sublinear \
  --sublinear-window 128 \
  --sublinear-stride 128 \
  --sublinear-max-anchors 128 \
  --sublinear-chunk 128
```
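
The `--workers` value above reads as a comma-separated list of `name:index` pairs. A minimal sketch of parsing that shape follows. `parse_workers` is a hypothetical helper, not the exporter's own code, and treating the number as a block index is an assumption based on the lease file name `lease_geth_block0_agillm4bench.pt` used in the worker command.

```python
def parse_workers(spec: str) -> dict[str, int]:
    """Parse 'name:index,name:index' into {name: index} (assumed format)."""
    mapping: dict[str, int] = {}
    for item in spec.split(","):
        # rpartition splits on the last colon, so a name may itself contain colons.
        name, sep, index = item.strip().rpartition(":")
        if not sep or not name:
            raise ValueError(f"expected name:index, got {item!r}")
        if name in mapping:
            raise ValueError(f"duplicate worker name {name!r}")
        mapping[name] = int(index)  # int() raises ValueError on a non-numeric index
    return mapping


workers = parse_workers("geth:0,mcp:1,prime:2,communist-web:3")
for name, block_index in workers.items():
    print(f"{name} -> block {block_index}")
```

Rejecting duplicate names and malformed entries early avoids exporting two leases that would overwrite each other.
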
Each worker then runs its assigned lease:
```bash
python agillm4_slice_bench_worker.py \
  --package lease_geth_block0_agillm4bench.pt \
  --shared shared_frozen.pt \
  --runtime /root/agillm4_worker/runtime/nB300_agillm4.py \
  --out agillm4_bench_update_geth.pt \
  --device cpu \
  --threads 2
```
The first live all-node run on 2026-06-02 used four 7-layer block leases from
`pretrain_step01317993.pt`. It completed 512 unique block tokens in 106.724 s,
or 4.797 combined tok/s, against a monolithic 4090 baseline of 3357.167 tok/s,
which is roughly 700 times faster. This is a correctness and architecture
benchmark for CPU worker slices, not a claim that the CPU mesh is competitive
with the 4090 trainer.
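
The reported figures can be rechecked with a few lines of arithmetic. The sketch below assumes the 512 tokens come from four workers each running one step at batch size 1 and block size 128, as in the export example. That product matches the reported count, but the exact accounting used by `agillm4_collect_bench_results.py` is not shown here.

```python
# Settings taken from the export example (their role in the token count is assumed).
workers = 4
steps = 1
batch_size = 1
block_size = 128

tokens = workers * steps * batch_size * block_size

# Figures reported for the first all-node run.
elapsed_s = 106.724
baseline_tok_s = 3357.167

combined_tok_s = tokens / elapsed_s
slowdown = baseline_tok_s / combined_tok_s

print(f"tokens: {tokens}")
print(f"combined throughput: {combined_tok_s:.3f} tok/s")
print(f"4090 baseline is about {slowdown:.0f}x faster")
```
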
