OpenTransformer/agillm35-single-file


agillm35-single-file / README.md


Clarify DiffusionBlock weighted loss reporting

1b8f727 verified 8 days ago


	---
	library_name: pytorch
	tags:
	- agillm
	- transformer
	- diffusion-block
	- single-file
	license: other
	---

	# AGILLM3.5 Single File

	AGILLM3.5 pairs the AGILLM3 checkpoint and tokenizer contract with the AGILLM4 runtime and the DiffusionBlock training path.

	The runnable artifact is `agillm35.py`. The helper modules are folded into that one file so the runtime can be cloned, inspected, and launched without restoring the whole AGILLM4 source tree.

	## Defaults

	- tokenizer: `deepseek-ai/DeepSeek-V3.2`
	- preset: `large` (`d=1024`, `layers=24`, `heads=16`, `rank=128`)
	- compatibility mode: `--agillm3_compat`
	- NAT head/objective: disabled for AGILLM3 checkpoint compatibility
	- DiffusionBlocks: available with `--dblock`
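
As a quick sanity check on the `large` preset listed above, the sketch below records the four values in a small dataclass and derives the per-head width. The `Preset` class and the `head_dim` property are hypothetical names used only for illustration, and `head_dim = d / heads` assumes conventional multi-head attention; the meaning of `rank` is not stated in this card, so it is stored without interpretation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    """Hypothetical container for the preset values listed in this card."""

    d: int       # model width
    layers: int  # number of transformer layers
    heads: int   # number of attention heads
    rank: int    # meaning not specified in the card; kept as-is

    @property
    def head_dim(self) -> int:
        # Assumption: standard multi-head attention splits d evenly across heads.
        if self.d % self.heads != 0:
            raise ValueError("d must be divisible by heads")
        return self.d // self.heads


# Values copied from the `large` preset above.
LARGE = Preset(d=1024, layers=24, heads=16, rank=128)

print(LARGE.head_dim)  # 1024 // 16 = 64
```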

	## Commands

	```bash
	python agillm35.py --help
	python agillm35.py status --ckpt /path/to/pretrain_step00051081.pt
	python agillm35.py infer --ckpt /path/to/pretrain_step00051081.pt --prompt "Hello"
	```

	## Example

	```bash
	python agillm35.py train \
	--agillm3_compat \
	--preset large \
	--resume /path/to/pretrain_step00051081.pt \
	--block 512 \
	--batch_size 1 \
	--source HuggingFaceFW/fineweb-edu \
	--save_dir ckpts \
	--dblock \
	--dblock_blocks 8 \
	--nat_every 0 \
	--dblock_nat_weight 0
	```

	## Notes

	This repository contains code only, not AGILLM3 checkpoint weights.

	DiffusionBlock logs report two values: `loss`, the raw cross-entropy-style loss, and `weighted`, the EDM-weighted training objective. `weighted` is the value being optimized; `loss` is the sanity-check number to compare against ordinary AR/SAT loss.
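
To illustrate how the two logged numbers can differ, here is a minimal sketch of a raw mean loss next to an EDM-weighted mean. This card says only that the objective is EDM-weighted, so everything specific here is an assumption: the weight is the standard EDM form, (sigma^2 + sigma_data^2) / (sigma * sigma_data)^2, `sigma_data=0.5` is the usual EDM default, and the function names and per-sample averaging are illustrative rather than taken from `agillm35.py`.

```python
def edm_weight(sigma, sigma_data=0.5):
    """Standard EDM loss weight (assumed; not confirmed for agillm35.py)."""
    return (sigma ** 2 + sigma_data ** 2) / (sigma * sigma_data) ** 2


def report(per_sample_loss, sigmas, sigma_data=0.5):
    """Return the raw mean loss and the EDM-weighted mean loss."""
    if len(per_sample_loss) != len(sigmas):
        raise ValueError("need one sigma per loss value")
    n = len(per_sample_loss)
    raw = sum(per_sample_loss) / n
    weighted = sum(
        edm_weight(s, sigma_data) * loss
        for s, loss in zip(sigmas, per_sample_loss)
    ) / n
    return {"loss": raw, "weighted": weighted}


# Two samples with the same raw loss but different noise levels.
stats = report(per_sample_loss=[2.0, 2.0], sigmas=[1.0, 0.5])
print(stats)  # raw mean is 2.0; weighted mean is (5*2 + 8*2) / 2 = 13.0
```

Under these assumptions the weighted value depends on which noise levels were sampled, which is why the raw value is the more stable number to compare across runs.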

	The Linux smoke test compiles the single file and completes a one-step synthetic training run that ends with a checkpoint save. The full AGILLM3.5 continuation run is managed separately by the disaggregated Hetzner worker setup.
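
The compile half of such a smoke test can be approximated with the built-in `compile`, as sketched below. The actual smoke-test script is not shown in this card, so `compiles_cleanly` is a hypothetical stand-in, and the one-step training save is left out because its exact invocation is not documented here. The demo uses throwaway temporary files rather than `agillm35.py` itself.

```python
import tempfile
from pathlib import Path


def compiles_cleanly(path):
    """Return True if the file parses and compiles to bytecode without error."""
    source = Path(path).read_text(encoding="utf-8")
    try:
        compile(source, str(path), "exec")  # syntax check only; nothing is executed
    except SyntaxError:
        return False
    return True


# Demonstrate on two temporary files: one valid, one with a syntax error.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.py"
    good.write_text("x = 1\n", encoding="utf-8")
    bad = Path(tmp) / "bad.py"
    bad.write_text("def broken(:\n", encoding="utf-8")
    good_ok = compiles_cleanly(good)
    bad_ok = compiles_cleanly(bad)

print(good_ok, bad_ok)
```

In a checkout of this repository, the same check would be `compiles_cleanly("agillm35.py")`.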