AGILLM-4.3 Independent DiffusionBlocks
AGILLM-4.3 has two different DiffusionBlocks-related paths:
- Live trainer path: `agillm41.py train --dblock ...` runs one selected block objective per step inside the normal monolithic trainer. It already combines with `--grad_checkpoint`/`--dblock_checkpoint_stride`, so it mainly reduces activation pressure for the selected block, but the process still owns the full model, optimizer, and checkpoint loop.
- Independent path: `agillm43_diffusionblocks_independent.py` trains one block as a standalone resident model with a frozen shared stem. This is the paper-style memory trade-off: each worker needs only its own block's params, grads, optimizer state, and activations, while all blocks can run independently and later compose back into a normal AGILLM checkpoint.
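Training "one block" presumes the layer stack is split into B blocks. Below is a minimal sketch of one plausible assignment. It assumes contiguous, near-equal blocks; the helper name `block_layer_ranges` and the layer counts used with it are hypothetical, and the script's actual partitioning may differ:

```python
# Hypothetical sketch: split an n_layers-deep stack into B contiguous blocks.
# Assumption: blocks are contiguous and as equal in size as possible; the real
# agillm43_diffusionblocks_independent.py partitioning may differ.
from typing import List, Tuple


def block_layer_ranges(n_layers: int, B: int) -> List[Tuple[int, int]]:
    """Return half-open [start, end) layer ranges, one per block."""
    if B <= 0 or n_layers < B:
        raise ValueError("need 1 <= B <= n_layers")
    base, extra = divmod(n_layers, B)
    ranges, start = [], 0
    for i in range(B):
        # The first `extra` blocks each take one extra layer.
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges
```

With 26 layers and B=4, this gives blocks of 7, 7, 6, and 6 layers. Contiguous slices keep each worker's block a single run of adjacent layers, which is the simplest layout to compose back.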
Do not replace a healthy live trainer with the independent path without a quality gate. Because the shared stem stays frozen during independent block training, the independent path is mainline-ready first as an offline/side-training path and as a low-VRAM worker path.
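One way to make the quality gate explicit is a loss comparison on a shared held-out set. This is an illustrative sketch only: the metric, the 2% relative tolerance, and `passes_quality_gate` are assumptions, not part of the AGILLM tooling:

```python
# Hypothetical quality gate: accept the composed checkpoint only if its eval
# loss is finite and within a relative tolerance of the live checkpoint's
# eval loss on the same held-out data. Names and tolerance are assumptions.
import math


def passes_quality_gate(live_eval_loss: float, composed_eval_loss: float,
                        rel_tol: float = 0.02) -> bool:
    """True if composed loss <= (1 + rel_tol) * live loss and both are finite."""
    if not (math.isfinite(live_eval_loss) and math.isfinite(composed_eval_loss)):
        return False
    return composed_eval_loss <= live_eval_loss * (1.0 + rel_tol)
```

A relative tolerance keeps the gate meaningful as the live loss drops over training; a fixed absolute margin would become looser in relative terms.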
Real-checkpoint workflow
Resolve the current full checkpoint path from `latest.json`:
```bash
CKPT=$(python3 - <<'PY'
import json
print(json.load(open('/workspace/agillm4_4090_ckpts_active/latest.json'))['path'])
PY
)
```
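The same lookup can be written as a reusable Python helper that also fails loudly when `latest.json` points at a checkpoint that no longer exists. Only the `path` key comes from the shell snippet; `resolve_latest_ckpt` and the existence check are assumptions:

```python
# Sketch: resolve the checkpoint path recorded in latest.json and refuse to
# return a stale pointer. Only the 'path' key is taken from the shell snippet;
# the helper name and the existence check are illustrative assumptions.
import json
from pathlib import Path


def resolve_latest_ckpt(latest_json: str) -> str:
    with open(latest_json) as f:
        ckpt = json.load(f)["path"]
    if not Path(ckpt).is_file():
        raise FileNotFoundError(f"latest.json points at missing checkpoint: {ckpt}")
    return ckpt
```

For example, `resolve_latest_ckpt('/workspace/agillm4_4090_ckpts_active/latest.json')` would return the same path that `$CKPT` holds, provided that file exists.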
Check the memory model against the live config:
```bash
python3 /workspace/agillm41-mainline/agillm43_diffusionblocks_independent.py \
mem-report --ckpt "$CKPT"
```
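To sanity-check `mem-report` output by hand, a back-of-envelope estimator fits in a few lines. This is not the script's logic. It assumes fp32 storage, four copies per trainable parameter (weights, grads, and two Adam moments, which is one reading of "4P"), a frozen stem that costs only its weights, equal-size blocks, and no activations:

```python
# Hypothetical back-of-envelope estimator, NOT the tool's mem-report logic.
# Assumptions: fp32 everywhere; each trainable param costs 4 copies
# (weights, grads, Adam exp_avg, Adam exp_avg_sq); the frozen stem costs only
# its weights; blocks are equal-size; activations are excluded.
from typing import Tuple

BYTES_FP32 = 4
TRAINABLE_COPIES = 4


def train_state_gb(trainable_params: int, frozen_params: int = 0) -> float:
    total_bytes = (TRAINABLE_COPIES * trainable_params + frozen_params) * BYTES_FP32
    return total_bytes / 1e9


def compare(total_params: int, stem_params: int, B: int) -> Tuple[float, float]:
    """Return (monolithic_gb, one_block_gb) under the assumptions above."""
    block_params = (total_params - stem_params) // B
    monolithic = train_state_gb(total_params)
    one_block = train_state_gb(block_params, frozen_params=stem_params)
    return monolithic, one_block
```

Under these assumptions, B=4 gives exactly a 4x reduction whatever the stem size (the stem's single frozen copy costs the same as a quarter of its four trainable copies), which is consistent with the roughly 4x gap in the reported figures.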
Create a frozen shared stem from a full checkpoint:
```bash
python3 /workspace/agillm41-mainline/agillm43_diffusionblocks_independent.py \
stem-from-ckpt --ckpt "$CKPT" --out /workspace/dbi_stem.pt
```
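Conceptually, a shared stem is the subset of the full state dict that every block reuses. Below is a hypothetical illustration of that filtering step on a plain dict. The key prefixes and `extract_stem` are invented for illustration, and the real AGILLM key names and the script's selection logic may differ:

```python
# Hypothetical illustration of stem extraction: keep only shared-stem entries
# from a full state dict. STEM_PREFIXES are invented placeholders, not the
# real AGILLM key names.
from typing import Dict, Iterable, TypeVar

T = TypeVar("T")
STEM_PREFIXES = ("embed.", "final_norm.", "lm_head.")  # hypothetical


def extract_stem(state: Dict[str, T],
                 prefixes: Iterable[str] = STEM_PREFIXES) -> Dict[str, T]:
    prefixes = tuple(prefixes)  # str.startswith accepts a tuple of prefixes
    stem = {k: v for k, v in state.items() if k.startswith(prefixes)}
    if not stem:
        raise KeyError("no stem keys matched; check the prefixes")
    return stem
```

In a PyTorch setting, the extracted tensors would then be loaded with `requires_grad=False` so that block training cannot update them.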
Train one block, initialized from a full checkpoint:
```bash
python3 /workspace/agillm41-mainline/agillm43_diffusionblocks_independent.py \
train-block --init-ckpt "$CKPT" --B 4 --block 0 \
--steps 200 --batch 4 --seqlen 64 --device cpu \
--out /workspace/dbi_block0.pt
```
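Because blocks train independently, the `train-block` command generalizes to one invocation per block. The sketch below only builds the command strings, reusing the flags from that command. The helper name, its defaults, and the `dbi_block{i}.pt` naming (following the `dbi_block0.pt` pattern) are assumptions:

```python
# Sketch: build one train-block command per block so blocks 0..B-1 can be
# dispatched to separate machines. Flags mirror the documented command; the
# helper name, defaults, and output naming are illustrative assumptions.
import shlex
from typing import List

SCRIPT = "/workspace/agillm41-mainline/agillm43_diffusionblocks_independent.py"


def train_block_cmds(ckpt: str, B: int, steps: int = 200, batch: int = 4,
                     seqlen: int = 64, device: str = "cpu",
                     out_dir: str = "/workspace") -> List[str]:
    cmds = []
    for i in range(B):
        argv = ["python3", SCRIPT, "train-block", "--init-ckpt", ckpt,
                "--B", str(B), "--block", str(i),
                "--steps", str(steps), "--batch", str(batch),
                "--seqlen", str(seqlen), "--device", device,
                "--out", f"{out_dir}/dbi_block{i}.pt"]
        cmds.append(shlex.join(argv))  # quotes arguments only where needed
    return cmds
```

Each string can then be sent to a separate machine by whatever launcher is in use; dispatch itself is outside the scope of this sketch.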
Run blocks 0..B-1 independently on separate machines, then compose them back into a checkpoint with the normal AGILLM schema:
```bash
python3 /workspace/agillm41-mainline/agillm43_diffusionblocks_independent.py \
compose-into-ckpt --base-ckpt "$CKPT" \
--blocks /workspace/dbi_block0.pt /workspace/dbi_block1.pt /workspace/dbi_block2.pt /workspace/dbi_block3.pt \
--out /workspace/dbi_composed_full.pt
```
The composed checkpoint contains the standard `core`/`ar` fields plus a `diffusionblocks_independent` metadata record.
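The compose step can be pictured as overwriting the base checkpoint's core entries with each block's trained entries, after checking that every block index appears exactly once. A hypothetical sketch on plain dicts, assuming each block file holds `{"block": i, "B": B, "state": {...}}` with keys already in the base checkpoint's naming; the real file layout and metadata contents may differ:

```python
# Hypothetical compose sketch on plain dicts. Assumed block layout:
# {"block": i, "B": B, "state": {key: value}} with keys matching base["core"].
# The metadata contents are illustrative; only the record name is documented.
import copy
from typing import Any, Dict, List


def compose(base: Dict[str, Any], blocks: List[Dict[str, Any]]) -> Dict[str, Any]:
    B = blocks[0]["B"]
    seen = sorted(b["block"] for b in blocks)
    # Reject missing, duplicated, or mismatched-B block files.
    if seen != list(range(B)) or any(b["B"] != B for b in blocks):
        raise ValueError(f"expected exactly one file per block 0..{B - 1}, got {seen}")
    out = copy.deepcopy(base)  # never mutate the base checkpoint
    for b in blocks:
        for k, v in b["state"].items():
            if k not in out["core"]:
                raise KeyError(f"block {b['block']} key {k!r} not in base core")
            out["core"][k] = v
    out["diffusionblocks_independent"] = {"B": B, "blocks": seen}
    return out
```

Validating block coverage before writing anything means a missing worker output fails fast instead of silently leaving base weights in place.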
Verified on 2026-06-17
- `python3 -m py_compile agillm43_diffusionblocks_independent.py`
- The CLI exposes `stem-from-ckpt`, `train-block --init-ckpt`, and `compose-into-ckpt`.
- `mem-report --ckpt /workspace/agillm4_4090_ckpts_active/pretrain_step02294806.pt` read the live 4.3 cfg/vocab and reported a B=4 one-block resident memory of about 4.89GB, versus 19.55GB for monolithic 4P memory, before activations.
- Tiny synthetic checkpoint smoke test: exported four initialized blocks, and `compose-into-ckpt` restored the checkpoint with `max_core_diff == 0.0`.
- The existing selftest still passes, with ~4x optimizer-state reduction at B=4 and an exact compose round-trip.
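The `max_core_diff == 0.0` check amounts to the largest absolute element-wise difference across all core entries. A stdlib-only sketch on dicts of float lists; the real check presumably compares tensors, so this function is illustrative only:

```python
# Sketch of a max_core_diff-style round-trip check on flat dicts of float
# lists. The name mirrors the verification note; operating on lists rather
# than tensors is a simplifying assumption.
from typing import Dict, Sequence


def max_core_diff(a: Dict[str, Sequence[float]],
                  b: Dict[str, Sequence[float]]) -> float:
    if a.keys() != b.keys():
        raise KeyError(f"key mismatch: {sorted(a.keys() ^ b.keys())}")
    worst = 0.0
    for k in a:
        if len(a[k]) != len(b[k]):
            raise ValueError(f"shape mismatch at {k!r}")
        for x, y in zip(a[k], b[k]):
            worst = max(worst, abs(x - y))
    return worst
```

An exact round-trip (export every block, compose, compare against the original) should yield 0.0, since no arithmetic is applied to the weights along the way.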