# AGILLM-4.3 Independent DiffusionBlocks

AGILLM-4.3 has two different DiffusionBlocks-related paths:

- Live trainer path: `agillm41.py train --dblock ...` runs one selected block objective per step inside the normal monolithic trainer. It already combines with `--grad_checkpoint` / `--dblock_checkpoint_stride`, so it mainly reduces activation pressure for the selected block, while the process still owns the full model, optimizer, and checkpoint loop.
- Independent path: `agillm43_diffusionblocks_independent.py` trains one block as a standalone resident model with a frozen shared stem. This is the paper-style memory trade-off: each worker only needs its own block params, grads, optimizer state, and activations, while all blocks can run independently and later compose back into a normal AGILLM checkpoint.

Do not replace a healthy live trainer with the independent path without a quality gate. Because the shared stem is frozen during independent block training, the independent path is mainline-ready first as an offline/side-training and low-VRAM worker path.

## Real-checkpoint workflow

Use the current full checkpoint from `latest.json`:

```bash
CKPT=$(python3 - <<'PY'
import json
print(json.load(open('/workspace/agillm4_4090_ckpts_active/latest.json'))['path'])
PY
)
```

Check the memory model against the live config:

```bash
python3 /workspace/agillm41-mainline/agillm43_diffusionblocks_independent.py \
  mem-report --ckpt "$CKPT"
```

Create a frozen shared stem from a full checkpoint:

```bash
python3 /workspace/agillm41-mainline/agillm43_diffusionblocks_independent.py \
  stem-from-ckpt --ckpt "$CKPT" --out /workspace/dbi_stem.pt
```

Train one initialized block from a full checkpoint:

```bash
python3 /workspace/agillm41-mainline/agillm43_diffusionblocks_independent.py \
  train-block --init-ckpt "$CKPT" --B 4 --block 0 \
  --steps 200 --batch 4 --seqlen 64 --device cpu \
  --out /workspace/dbi_block0.pt
```

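The command above trains a single block index. To cover every block, the same invocation can be generated once per index and handed to whichever machine or scheduler runs it. The following is a minimal sketch that only builds the argument lists and executes nothing; it reuses the flags shown above, while the helper name `train_block_commands`, the `out_dir` parameter, and the placeholder checkpoint path are illustrative assumptions rather than part of the tool.

```python
import shlex

SCRIPT = "/workspace/agillm41-mainline/agillm43_diffusionblocks_independent.py"


def train_block_commands(ckpt, num_blocks, steps=200, batch=4, seqlen=64,
                         device="cpu", out_dir="/workspace"):
    """Return one argv list per block index (hypothetical helper; runs nothing)."""
    commands = []
    for block in range(num_blocks):
        commands.append([
            "python3", SCRIPT, "train-block",
            "--init-ckpt", ckpt,
            "--B", str(num_blocks),
            "--block", str(block),
            "--steps", str(steps),
            "--batch", str(batch),
            "--seqlen", str(seqlen),
            "--device", device,
            # Output naming mirrors the dbi_block<N>.pt files used in this README.
            "--out", f"{out_dir}/dbi_block{block}.pt",
        ])
    return commands


if __name__ == "__main__":
    # Placeholder path: substitute the value of $CKPT from latest.json.
    for argv in train_block_commands("/path/to/full_checkpoint.pt", num_blocks=4):
        print(shlex.join(argv))
```

Each printed line is a shell-quoted command that can be dispatched to a separate worker; the resulting `--out` paths line up with the `--blocks` arguments that the compose step expects.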
Run blocks `0..B-1` independently on separate machines, then compose them back into a checkpoint with the normal AGILLM schema:

```bash
python3 /workspace/agillm41-mainline/agillm43_diffusionblocks_independent.py \
  compose-into-ckpt --base-ckpt "$CKPT" \
  --blocks /workspace/dbi_block0.pt /workspace/dbi_block1.pt /workspace/dbi_block2.pt /workspace/dbi_block3.pt \
  --out /workspace/dbi_composed_full.pt
```

The composed checkpoint contains standard `core`/`ar` fields plus a `diffusionblocks_independent` metadata record.

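A quick structural check on the composed file can catch a bad compose before it reaches a trainer. The sketch below assumes the three fields named above are top-level keys of a dict-like checkpoint; that layout, and the helper name `missing_composed_keys`, are assumptions and not confirmed by the tool's documentation.

```python
# Assumed top-level keys of a composed checkpoint (layout is an assumption).
REQUIRED_KEYS = ("core", "ar", "diffusionblocks_independent")


def missing_composed_keys(checkpoint):
    """Return the expected keys that are absent from a loaded checkpoint mapping."""
    return [key for key in REQUIRED_KEYS if key not in checkpoint]


if __name__ == "__main__":
    # Stand-in mapping; a real check would load /workspace/dbi_composed_full.pt.
    example = {"core": {}, "ar": {}, "diffusionblocks_independent": {}}
    print(missing_composed_keys(example))
```

In practice the mapping would probably come from something like `torch.load(path, map_location="cpu")`, assuming the `.pt` file is a regular PyTorch checkpoint. An empty result means all three fields are present.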
## Verified on 2026-06-17

- `python3 -m py_compile agillm43_diffusionblocks_independent.py`
- CLI exposes `stem-from-ckpt`, `train-block --init-ckpt`, and `compose-into-ckpt`
- `mem-report --ckpt /workspace/agillm4_4090_ckpts_active/pretrain_step02294806.pt` read the live 4.3 cfg/vocab and reported B=4 one-block resident memory of about 4.89 GB versus 19.55 GB monolithic 4P memory before activations
- Tiny synthetic checkpoint smoke test: exported four initialized blocks, and `compose-into-ckpt` restored the checkpoint with `max_core_diff == 0.0`
- Existing selftest still passes with ~4x optimizer-state reduction at B=4 and exact compose round-trip
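
The reported memory figures are consistent with an even split of the monolithic state across blocks. The sketch below reproduces that arithmetic from the two numbers above. It assumes the resident state divides evenly by `B`, and it reads "4P" as roughly four parameter-sized buffers (weights, gradients, and two optimizer moments), which is an interpretation rather than something the report states. The helper name is hypothetical, and the real `mem-report` may account for more than this.

```python
def one_block_resident_gb(monolithic_gb, num_blocks):
    """Estimate per-worker resident memory under an assumed even split across blocks."""
    return monolithic_gb / num_blocks


monolithic_gb = 19.55  # reported monolithic 4P memory, before activations
num_blocks = 4         # B=4 from the mem-report run

estimate = one_block_resident_gb(monolithic_gb, num_blocks)
reduction = monolithic_gb / estimate
print(f"{estimate:.2f} GB per block, {reduction:.1f}x reduction")
```

The result rounds to the 4.89 GB reported for one block, which matches the ~4x reduction noted for the selftest.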