AGILLM3.5 Single File

AGILLM3.5 keeps the AGILLM3 checkpoint and tokenizer contract but runs it on the AGILLM4 runtime and its DiffusionBlock training path.

The runnable artifact is agillm35.py. The helper modules are folded into that one file so the runtime can be cloned, inspected, and launched without restoring the whole AGILLM4 source tree.

Defaults

  • tokenizer: deepseek-ai/DeepSeek-V3.2
  • preset: large (d=1024, layers=24, heads=16, rank=128)
  • compatibility mode: --agillm3_compat
  • NAT head/objective: disabled for AGILLM3 checkpoint compatibility
  • DiffusionBlocks: available with --dblock
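As a quick way to see what the large preset implies, here is a minimal sketch that mirrors its four numbers in a small Python dataclass and derives the per-head width (1024 / 16 = 64). The class and field names are hypothetical and are not taken from agillm35.py. The meaning of rank=128 is not described in this card, so it is only recorded as given.

```python
from dataclasses import dataclass


# Hypothetical mirror of the documented "large" preset; the names are
# illustrative only and do not come from agillm35.py.
@dataclass(frozen=True)
class Preset:
    d_model: int  # d: hidden width
    layers: int   # number of layers
    heads: int    # attention heads
    rank: int     # "rank" as listed; its role in agillm35.py is not documented here

    @property
    def head_dim(self) -> int:
        # Width of each attention head; d_model must split evenly across heads.
        if self.d_model % self.heads:
            raise ValueError("d_model must be divisible by heads")
        return self.d_model // self.heads


LARGE = Preset(d_model=1024, layers=24, heads=16, rank=128)

if __name__ == "__main__":
    print(LARGE, "head_dim =", LARGE.head_dim)
```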

Commands

python agillm35.py --help
python agillm35.py status --ckpt /path/to/pretrain_step00051081.pt
python agillm35.py infer --ckpt /path/to/pretrain_step00051081.pt --prompt "Hello"
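Both status and infer take a checkpoint path. When a save directory holds several checkpoints, a small helper can pick the newest one. This is a sketch that assumes every save follows the pretrain_stepNNNNNNNN.pt pattern seen in the example path. That pattern is inferred from a single filename, and agillm35.py may name its saves differently.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed filename pattern, inferred from "pretrain_step00051081.pt".
CKPT_RE = re.compile(r"^pretrain_step(\d+)\.pt$")


def checkpoint_step(path: Path) -> Optional[int]:
    """Return the step number encoded in a checkpoint filename, or None."""
    m = CKPT_RE.match(path.name)
    return int(m.group(1)) if m else None


def latest_checkpoint(save_dir: str) -> Optional[Path]:
    """Return the checkpoint in save_dir with the highest step, or None if there is none."""
    candidates = []
    for p in Path(save_dir).glob("pretrain_step*.pt"):
        step = checkpoint_step(p)
        if step is not None:
            candidates.append((step, p))
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]
```

The returned path can then be passed to --ckpt for status or infer, or to --resume for training.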

Example

python agillm35.py train \
  --agillm3_compat \
  --preset large \
  --resume /path/to/pretrain_step00051081.pt \
  --block 512 \
  --batch_size 1 \
  --source HuggingFaceFW/fineweb-edu \
  --save_dir ckpts \
  --dblock \
  --dblock_blocks 8 \
  --nat_every 0 \
  --dblock_nat_weight 0
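When launching the run from a script, building the command as an argument list avoids shell line-continuation and quoting mistakes. The flags and values below are copied from the example above. The build_train_argv wrapper itself is a hypothetical convenience and is not part of agillm35.py. The two NAT flags stay at 0, which matches the default of keeping the NAT objective off for AGILLM3 checkpoint compatibility.

```python
import shlex
import sys
from typing import List


def build_train_argv(resume: str, save_dir: str = "ckpts") -> List[str]:
    """Hypothetical helper: the documented AGILLM3-compatible DiffusionBlock training command."""
    return [
        sys.executable, "agillm35.py", "train",
        "--agillm3_compat",
        "--preset", "large",
        "--resume", resume,
        "--block", "512",
        "--batch_size", "1",
        "--source", "HuggingFaceFW/fineweb-edu",
        "--save_dir", save_dir,
        "--dblock",
        "--dblock_blocks", "8",
        # NAT stays disabled so the run remains compatible with AGILLM3 checkpoints.
        "--nat_every", "0",
        "--dblock_nat_weight", "0",
    ]


if __name__ == "__main__":
    argv = build_train_argv("/path/to/pretrain_step00051081.pt")
    # Print the exact command for inspection; pass argv to subprocess.run(argv, check=True) to launch it.
    print(shlex.join(argv))
```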

Notes

This repository contains code only, not AGILLM3 checkpoint weights.

DiffusionBlock logs report two numbers: the raw CE-style loss, and the EDM-weighted training objective that is actually optimized, which appears as weighted. The weighted value is the optimization target. The raw value is the sanity check to compare against ordinary AR/SAT loss.
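The following sketch shows why the two logged numbers can differ so much. It uses the standard EDM loss weight from Karras et al., lambda(sigma) = (sigma^2 + sigma_data^2) / (sigma * sigma_data)^2, with the EDM default sigma_data = 0.5. The card does not state how agillm35.py obtains per-example losses or noise levels, or which sigma_data it uses, so those details are assumptions here. Under this weighting, lambda(sigma) = 1/sigma_data^2 + 1/sigma^2, so it never falls below 1/sigma_data^2. That is why the weighted value is not comparable to AR/SAT loss.

```python
from typing import Sequence, Tuple


def edm_weight(sigma: float, sigma_data: float = 0.5) -> float:
    """EDM loss weight: (sigma^2 + sigma_data^2) / (sigma * sigma_data)^2."""
    return (sigma ** 2 + sigma_data ** 2) / (sigma * sigma_data) ** 2


def raw_and_weighted(ce_losses: Sequence[float], sigmas: Sequence[float],
                     sigma_data: float = 0.5) -> Tuple[float, float]:
    """Assumed reduction: batch means of the raw losses and of the EDM-weighted losses.

    raw      -- plain mean of the per-example CE-style losses (comparable to AR/SAT loss)
    weighted -- mean of edm_weight(sigma) * loss (the quantity being optimized)
    """
    if not ce_losses or len(ce_losses) != len(sigmas):
        raise ValueError("need non-empty, equal-length losses and sigmas")
    n = len(ce_losses)
    raw = sum(ce_losses) / n
    weighted = sum(edm_weight(s, sigma_data) * loss
                   for loss, s in zip(ce_losses, sigmas)) / n
    return raw, weighted
```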

The Linux smoke test compiles the single file, runs one training step on synthetic data, and saves a checkpoint. The full AGILLM3.5 continuation run is managed separately by the disaggregated Hetzner worker setup.
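The compile half of that smoke test can be reproduced locally with the standard library's py_compile module. This sketch is not the project's actual smoke-test script, which this card does not show, and it does not cover the one-step synthetic training save.

```python
import py_compile
import sys


def compiles(path: str) -> bool:
    """Byte-compile one Python file; return True if it compiles cleanly."""
    try:
        py_compile.compile(path, doraise=True)
        return True
    except (py_compile.PyCompileError, OSError) as err:
        # PyCompileError covers syntax errors; OSError covers a missing or unreadable file.
        print(err, file=sys.stderr)
        return False
```

For example, compiles("agillm35.py") returns True when the single file has no syntax errors.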
