---
library_name: pytorch
tags:
- agillm
- transformer
- diffusion-block
- single-file
license: other
---

# AGILLM3.5 Single File

AGILLM3.5 runs the AGILLM3 checkpoint and tokenizer contract on the AGILLM4 runtime and its DiffusionBlock training path.

The runnable artifact is `agillm35.py`. The helper modules are folded into that one file, so the runtime can be cloned, inspected, and launched without restoring the whole AGILLM4 source tree.

## Defaults

- tokenizer: `deepseek-ai/DeepSeek-V3.2`
- preset: `large` (`d=1024`, `layers=24`, `heads=16`, `rank=128`)
- compatibility mode: `--agillm3_compat`
- NAT head/objective: disabled for AGILLM3 checkpoint compatibility
- DiffusionBlocks: available with `--dblock`

## Commands

```bash
python agillm35.py --help
python agillm35.py status --ckpt /path/to/pretrain_step00051081.pt
python agillm35.py infer --ckpt /path/to/pretrain_step00051081.pt --prompt "Hello"
```

## Example

```bash
python agillm35.py train \
  --agillm3_compat \
  --preset large \
  --resume /path/to/pretrain_step00051081.pt \
  --block 512 \
  --batch_size 1 \
  --source HuggingFaceFW/fineweb-edu \
  --save_dir ckpts \
  --dblock \
  --dblock_blocks 8 \
  --nat_every 0 \
  --dblock_nat_weight 0
```

## Notes

This repository contains code only, not AGILLM3 checkpoint weights.

DiffusionBlock logs report two numbers: `loss`, the raw CE-style loss, and `weighted`, the EDM-weighted training objective. The weighted value is the optimization target; the raw value is the sanity-check number to compare with ordinary AR/SAT loss.

The Linux smoke test compiles the single file and runs one synthetic training step through to a checkpoint save. The full AGILLM3.5 continuation run is managed separately by the disaggregated Hetzner worker setup.
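To illustrate why the `loss` and `weighted` log fields can differ so much, here is a minimal sketch, assuming the DiffusionBlock objective uses the standard EDM loss weight λ(σ) = (σ² + σ_data²) / (σ·σ_data)² applied to per-block cross-entropy. The function names `edm_weight` and `log_losses`, the per-block σ values, and `SIGMA_DATA = 0.5` are all hypothetical; `agillm35.py` may compute its weights differently.

```python
SIGMA_DATA = 0.5  # hypothetical; 0.5 is the EDM paper's default, not confirmed for AGILLM3.5


def edm_weight(sigma: float, sigma_data: float = SIGMA_DATA) -> float:
    """EDM loss weight: lambda(sigma) = (sigma^2 + sigma_data^2) / (sigma * sigma_data)^2."""
    return (sigma ** 2 + sigma_data ** 2) / (sigma * sigma_data) ** 2


def log_losses(ce_per_block, sigmas):
    """Return (raw, weighted), mirroring the two DiffusionBlock log fields.

    raw:      plain mean CE, comparable to ordinary AR/SAT loss.
    weighted: mean of EDM-weighted CE, the value the optimizer minimizes.
    """
    if not ce_per_block or len(ce_per_block) != len(sigmas):
        raise ValueError("need one sigma per block and at least one block")
    raw = sum(ce_per_block) / len(ce_per_block)
    weighted = sum(edm_weight(s) * ce for ce, s in zip(ce_per_block, sigmas)) / len(ce_per_block)
    return raw, weighted


if __name__ == "__main__":
    ce = [2.1, 2.4, 2.9, 3.3]      # hypothetical per-block CE values
    sigmas = [0.2, 0.5, 1.0, 2.0]  # hypothetical noise levels, one per block
    raw, weighted = log_losses(ce, sigmas)
    print(f"loss={raw:.3f} weighted={weighted:.3f}")
```

Under this assumed weighting, λ(σ) grows sharply as σ shrinks, so `weighted` can sit an order of magnitude above `raw` even when per-token CE is healthy. That is why the raw value, not the weighted one, is the number to line up against AR/SAT loss.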