---
library_name: pytorch
tags:
- pytorch
- transformer
- language-model
- long-context
- agillm
- dblock
- single-file
- experimental
---

# AGILLM-4 dblock single-file

This repo packages the live AGILLM-4 dblock trainer as one runnable Python file:

- `agillm4_dblock_single_file.py`

It was regenerated on `2026-05-31T16:07:54Z` by mechanically inlining the live VastAI training sources:

- `fused_ce.py`
- `anchor_memory.py`
- `dblocks_train.py`
- `nB300_agillm4.py`

The original live command uses `nB300_agillm4.py train`. This single-file build keeps that CLI surface, registers in-memory shims for the former helper modules, and disables helper-module smoke tests that would otherwise fire because the packed file is `__main__`.

See `single_file_manifest.json` for source hashes from the generated build.

Example training shape:

```bash
python agillm4_dblock_single_file.py train --preset agillm4_floor --dblock ...
```

This is experimental training code, not a polished inference package.

## Inference Smoke Test

Validated on the live VastAI training box against `/workspace/agillm4_4090_ckpts/pretrain_step01176781.pt` using CPU-only AR inference:

```bash
CUDA_VISIBLE_DEVICES= python agillm4_dblock_single_file.py infer \
  --mode ar \
  --ckpt /workspace/agillm4_4090_ckpts/pretrain_step01176781.pt \
  --prompt "User: Say hello in one short sentence. Assistant:" \
  --max_new 8 --greedy --plain-output --attn_backend manual
```

The trainer zero-fills missing SAT/NAT bias keys during inference compatibility loading, which lets older full checkpoints run without leaving newly introduced bias tensors random.

## NAT Decode Notes

The packed trainer includes the same NAT inference anti-collapse changes as the live trainer. NAT now applies repetition/frequency/presence penalties and sampler controls while committing masked positions, rather than filling every blank with an unconstrained argmax.

Smoke result, CPU-only: about 67 tok/s and no all-token collapse.
Output quality is still early-training rough; this is a decoding stability improvement, not a solved NAT head.
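
To illustrate why penalties applied while committing masked positions help against collapse, here is a minimal pure-Python sketch. It is not the trainer's code: the function names (`penalize`, `commit_masked`), the penalty formulas, the default values, and the left-to-right commit order are all assumptions chosen for illustration, and it uses a greedy pick instead of the trainer's sampler controls.

```python
from collections import Counter


def penalize(logits, counts, repetition=1.3, frequency=0.5, presence=0.5):
    """Return a penalized copy of one position's logits (illustrative formulas).

    counts maps token id -> how many times that token is already committed.
    """
    out = list(logits)
    for tok, n in counts.items():
        if n <= 0:
            continue
        # Repetition penalty: shrink positive logits, push negative ones lower.
        out[tok] = out[tok] / repetition if out[tok] > 0 else out[tok] * repetition
        # Frequency penalty scales with the count; presence penalty is flat.
        out[tok] -= frequency * n + presence
    return out


def commit_masked(tokens, logits_per_pos, mask_id, **penalties):
    """Fill masked positions left to right with a penalized greedy choice."""
    tokens = list(tokens)
    counts = Counter(t for t in tokens if t != mask_id)
    for pos, tok in enumerate(tokens):
        if tok != mask_id:
            continue
        scores = penalize(logits_per_pos[pos], counts, **penalties)
        choice = max(range(len(scores)), key=scores.__getitem__)
        tokens[pos] = choice
        counts[choice] += 1  # later blanks see this commitment
    return tokens


MASK = -1  # hypothetical mask id for this toy example
seq = [2, MASK, MASK, MASK]
# Every blank prefers token 0, with token 1 a close second.
logits = [[0.0, 0.0, 0.0]] + [[2.0, 1.8, 0.1] for _ in range(3)]

# Penalties switched off: equivalent to an unconstrained argmax per blank.
plain = commit_masked(seq, logits, MASK, repetition=1.0, frequency=0.0, presence=0.0)
penalized = commit_masked(seq, logits, MASK)
print(plain)      # [2, 0, 0, 0]
print(penalized)  # [2, 0, 1, 0]
```

With penalties off, every blank collapses onto the same top token; with them on, each commitment lowers that token's score for the blanks that follow.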
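
The in-memory shims mentioned in the packaging notes at the top of this README can be pictured with the standard-library pattern below. This is a sketch of the general technique only, under the assumption that the build does something similar: `register_shim`, the module name `fused_ce_demo`, and the stub function are hypothetical and do not reflect the real helper modules' APIs.

```python
import sys
import types


def register_shim(name, namespace):
    """Expose already-defined objects under a former module name."""
    module = types.ModuleType(name)
    module.__dict__.update(namespace)
    # The import system checks sys.modules first, so no file on disk is needed.
    sys.modules[name] = module
    return module


# Stand-in for a helper that has been inlined into the single file.
def fused_cross_entropy_stub(logits, target):
    return f"ce({logits}, {target})"


register_shim("fused_ce_demo", {"fused_cross_entropy": fused_cross_entropy_stub})

# A legacy-style import now resolves against the in-memory module.
from fused_ce_demo import fused_cross_entropy

print(fused_cross_entropy(1, 2))  # ce(1, 2)
```

Registering the module before any `import` statement runs is what allows inlined code to keep its original import lines unchanged.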