Code archive
Snapshot of /workspace on the Vast 4090 training box (instance 36391051), with git history.
Snapshot 20260521T230246Z
- agillm_workspace_git_20260521T230246Z.tar.gz - full .git directory (extract with: tar -xzf ...; mv .git /target/)
- agillm_code_snapshot_20260521T230246Z.tar.gz - flat snapshot of all tracked files at HEAD (no .git)
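The restore recipe for the .git archive can also be scripted. This is a minimal sketch using only the Python standard library. It assumes the archive holds a .git directory at its root, as the "tar -xzf ...; mv .git /target/" recipe implies. The function name restore_git_dir is hypothetical.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def restore_git_dir(archive: Path, target: Path) -> Path:
    """Extract `archive` to a scratch dir and move its top-level .git into `target`.

    Assumes .git sits at the archive root (hypothetical layout inferred from the
    tar/mv recipe). Refuses to overwrite an existing target/.git.
    """
    target.mkdir(parents=True, exist_ok=True)
    dest = target / ".git"
    if dest.exists():
        raise FileExistsError(f"{dest} already exists")
    with tempfile.TemporaryDirectory() as tmp:
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(tmp)  # only extract archives you trust
        src = Path(tmp) / ".git"
        if not src.is_dir():
            raise FileNotFoundError(f"no .git directory at the root of {archive}")
        shutil.move(str(src), str(dest))  # works across filesystems
    return dest
```

Checking for an existing target/.git matters because shutil.move would otherwise nest the new directory inside the old one instead of replacing it.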
Git history at time of snapshot
3 commits on main:
- 66604cc Baseline: vast workspace state as of 2026-05-21T23:01Z - pre-gate-fix
- 9da9902 Fix SAT gate collapse: per-position forward + correctness-based training target
- e18cc73 Add AR-only ablation script (5k steps from 220k, no SAT loss)
SAT gate fix summary (commit 9da9902)
Root cause: the training loss hardcoded the gate target to torch.ones(...), so the SAT gate collapsed to a deterministic one-hot. As a result, sat-var and sat-fixed inference produced byte-identical output (the gate always picked stride=2).
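The collapse mechanism can be reproduced in miniature. This toy is not the nB300.py code. It assumes the gate reduces to a hypothetical 2-way linear classifier over a single scalar feature, trained with softmax cross-entropy and plain SGD. With a constant target of 1, every update raises the class-1 logit margin for every input in [-1, 1], so the gate ends up choosing the same class everywhere. An input-dependent target keeps the choice input-dependent. The rule x > 0 is a made-up stand-in for "the SAT prediction was correct".

```python
import math
import random
from typing import List, Tuple


def softmax2(z0: float, z1: float) -> Tuple[float, float]:
    m = max(z0, z1)  # subtract the max for numerical stability
    e0, e1 = math.exp(z0 - m), math.exp(z1 - m)
    return e0 / (e0 + e1), e1 / (e0 + e1)


def train_gate(xs: List[float], targets: List[int], epochs: int = 200, lr: float = 0.5):
    """Toy 2-way gate, logit_k = w[k] * x + b[k], fit by per-sample SGD on cross-entropy."""
    w, b = [0.0, 0.0], [0.0, 0.0]
    for _ in range(epochs):
        for x, t in zip(xs, targets):
            p = softmax2(w[0] * x + b[0], w[1] * x + b[1])
            for k in range(2):
                g = p[k] - (1.0 if k == t else 0.0)  # d(loss)/d(logit_k)
                w[k] -= lr * g * x
                b[k] -= lr * g
    return w, b


def pick(w, b, x: float) -> int:
    """Gate decision (argmax over the two classes) for one input."""
    p = softmax2(w[0] * x + b[0], w[1] * x + b[1])
    return 0 if p[0] >= p[1] else 1


rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(32)]  # stand-in per-position gate features

# Pre-fix analogue: constant target of 1, like torch.ones(...).
w_old, b_old = train_gate(xs, [1] * len(xs))
# Post-fix analogue: target depends on the input (hypothetical correctness rule).
w_new, b_new = train_gate(xs, [int(x > 0) for x in xs])

print(sorted({pick(w_old, b_old, x) for x in xs}))  # → [1]
print(sorted({pick(w_new, b_new, x) for x in xs}))  # → [0, 1]
```

The constant-target gate is not broken numerically. It fits its target perfectly, and that target simply carries no information about which option was better.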
Three sub-fixes in nB300.py:
- SATHead.forward: gate is applied per position (previously only to h_last[:, 0])
- Training loss: gate target = (logits_sat.argmax(-1) == tgt_sat).long()
- Inference: gate is now [B, SAT_BLOCK, 2]; the stride is chosen from gate[:, 0]
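The three sub-fixes can be sketched with plain Python lists standing in for tensors. This sketch assumes a single linear gate layer and a hypothetical index-to-stride mapping STRIDE_CHOICES = (1, 2). The helper names (gate_forward, gate_targets, choose_strides) are illustrative, not the functions in nB300.py, and the real layer and mapping may differ.

```python
from typing import List, Sequence

Matrix = List[List[float]]

# Hypothetical mapping from gate class index to stride; nB300.py may differ.
STRIDE_CHOICES = (1, 2)


def argmax(v: Sequence[float]) -> int:
    return max(range(len(v)), key=lambda k: v[k])


def linear(x: Sequence[float], weight: Matrix, bias: Sequence[float]) -> List[float]:
    """y = W x + b for one position; weight has shape [out, in]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def gate_forward(h, weight: Matrix, bias: Sequence[float]):
    """Fix 1: score every position, h [B, SAT_BLOCK, D] -> gate logits [B, SAT_BLOCK, 2]."""
    return [[linear(pos, weight, bias) for pos in seq] for seq in h]


def gate_targets(logits_sat, tgt_sat):
    """Fix 2: target is 1 where the SAT head's argmax token matches tgt_sat, else 0."""
    return [[int(argmax(row) == t) for row, t in zip(seq_logits, seq_tgt)]
            for seq_logits, seq_tgt in zip(logits_sat, tgt_sat)]


def choose_strides(gate):
    """Fix 3: one stride per sequence, taken from the gate at position 0 (gate[:, 0])."""
    return [STRIDE_CHOICES[argmax(seq[0])] for seq in gate]


h = [[[1.0, 0.0], [0.0, 1.0]]]  # B=1, SAT_BLOCK=2, D=2
gate = gate_forward(h, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
print(choose_strides(gate))  # → [1]
print(gate_targets([[[0.1, 0.9, 0.0], [0.8, 0.1, 0.1]]], [[1, 2]]))  # → [[1, 0]]
```

Reading the stride from position 0 keeps a single stride decision per sequence at inference, while the training target still gives every position its own correctness label.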
Backward-compatible with existing checkpoints. The existing 220k checkpoint's gate weights are still collapsed from prior training (a fresh SFT run is needed for the full benefit), but the inference math is now correct.