---
library_name: pytorch
tags:
  - agillm
  - transformer
  - diffusion-block
  - single-file
license: other
---

# AGILLM3.5 Single File

AGILLM3.5 keeps the AGILLM3 checkpoint and tokenizer contract while running on the AGILLM4 runtime and the DiffusionBlock training path.

The runnable artifact is `agillm35.py`. The helper modules are folded into that one file so the runtime can be cloned, inspected, and launched without restoring the whole AGILLM4 source tree.

## Defaults

- tokenizer: `deepseek-ai/DeepSeek-V3.2`
- preset: `large` (`d=1024`, `layers=24`, `heads=16`, `rank=128`)
- compatibility mode: `--agillm3_compat`
- NAT head/objective: disabled for AGILLM3 checkpoint compatibility
- DiffusionBlocks: available with `--dblock`
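
To make the `large` preset numbers concrete, here is a minimal sketch that stores them and derives the per-head width. `PresetSketch` and `head_dim` are hypothetical names, not taken from `agillm35.py`, and the even split of `d` across heads assumes standard multi-head attention. The meaning of `rank` is not stated here, so it is stored as-is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PresetSketch:
    """Hypothetical container; agillm35.py may store presets differently."""

    d: int
    layers: int
    heads: int
    rank: int

    @property
    def head_dim(self) -> int:
        # Assumes the usual multi-head split: width divided evenly across heads.
        if self.d % self.heads != 0:
            raise ValueError("d must be divisible by heads")
        return self.d // self.heads


# Values copied from the `large` preset in the defaults list.
LARGE = PresetSketch(d=1024, layers=24, heads=16, rank=128)

print(LARGE.head_dim)  # 64
```

Under that assumption, each of the 16 heads works on a 64-dimensional slice of the 1024-wide model.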

## Commands

```bash
python agillm35.py --help
python agillm35.py status --ckpt /path/to/pretrain_step00051081.pt
python agillm35.py infer --ckpt /path/to/pretrain_step00051081.pt --prompt "Hello"
```

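## Example

This command resumes training from an AGILLM3 checkpoint in compatibility mode with DiffusionBlocks enabled and the NAT-related settings zeroed (`--nat_every 0`, `--dblock_nat_weight 0`), in line with the NAT-disabled default above.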

```bash
python agillm35.py train \
  --agillm3_compat \
  --preset large \
  --resume /path/to/pretrain_step00051081.pt \
  --block 512 \
  --batch_size 1 \
  --source HuggingFaceFW/fineweb-edu \
  --save_dir ckpts \
  --dblock \
  --dblock_blocks 8 \
  --nat_every 0 \
  --dblock_nat_weight 0
```

## Notes

This repository contains code only, not AGILLM3 checkpoint weights.

DiffusionBlock logs report two values: `loss`, the raw cross-entropy-style (CE) loss, and `weighted`, the EDM-weighted objective that training actually uses. `weighted` is the optimization target; `loss` is the sanity-check number to compare against ordinary AR/SAT loss.
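
The relationship between the two logged numbers can be sketched as follows. This is an illustration under stated assumptions, not the code in `agillm35.py`. It assumes the weight is the standard EDM loss weight from Karras et al. (2022), `(sigma^2 + sigma_data^2) / (sigma * sigma_data)^2` with `sigma_data = 0.5`, applied as a plain multiplier on the raw cross-entropy. The function names and the sample logits are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Raw cross-entropy in nats for one token, via a stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def edm_weight(sigma, sigma_data=0.5):
    """EDM loss weight (Karras et al., 2022); assumed here, not confirmed."""
    return (sigma**2 + sigma_data**2) / (sigma * sigma_data) ** 2


def log_entry(logits, target, sigma):
    """Build a log record with both the raw and the weighted value."""
    raw = cross_entropy(logits, target)
    return {"loss": raw, "weighted": edm_weight(sigma) * raw}


# Hypothetical logits for a three-token vocabulary; sigma chosen equal to sigma_data.
entry = log_entry([2.0, 0.5, -1.0], target=0, sigma=0.5)
print(entry)
```

With `sigma = sigma_data = 0.5` the weight is 8, so `weighted` can sit far from `loss` even when the model predicts well. This is why only the raw value is comparable to an AR/SAT loss curve.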

The Linux smoke test compiles the single file and completes one synthetic training step followed by a checkpoint save. The full AGILLM3.5 continuation run is managed separately by the disaggregated Hetzner worker setup.
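
The compile half of that smoke test can be approximated with the standard library alone. The sketch below is not the project's actual test script; `compiles_cleanly` is a hypothetical helper that byte-compiles a file with `py_compile` and reports success or failure. The one-step training save is left out because its exact invocation is not documented here.

```python
import py_compile
import sys


def compiles_cleanly(path):
    """Return True if `path` byte-compiles, else print the error and return False."""
    try:
        # doraise=True turns syntax errors into PyCompileError instead of a log line.
        py_compile.compile(str(path), doraise=True)
        return True
    except (py_compile.PyCompileError, OSError) as err:
        # OSError covers a missing or unreadable file.
        print(err, file=sys.stderr)
        return False


if __name__ == "__main__":
    ok = compiles_cleanly("agillm35.py")
    print("compile ok" if ok else "compile failed")
```

Run it from the directory that holds `agillm35.py`. A failure here points to a syntax or file problem before any GPU time is spent.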