# Code archive

Snapshot of `/workspace` on the Vast 4090 training box (instance 36391051), with git history.

## Snapshot 20260521T230246Z

- `agillm_workspace_git_20260521T230246Z.tar.gz` - the full `.git` directory (extract with `tar -xzf ...; mv .git /target/`)
- `agillm_code_snapshot_20260521T230246Z.tar.gz` - a flat snapshot of all tracked files at HEAD (no `.git`)

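The extract-and-move step for the git archive can also be scripted. Below is a minimal Python sketch. It assumes that the archive holds a single top-level `.git` directory, which is what the `mv .git` step implies. The function name `restore_git_dir` is hypothetical, and the sketch refuses to overwrite an existing `.git` in the target.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def restore_git_dir(archive: str, target: str) -> Path:
    """Unpack a .tar.gz holding a top-level `.git` directory and move it into `target`.

    Hypothetical helper; assumes the archive layout implied by `tar -xzf ...; mv .git /target/`.
    """
    target_path = Path(target)
    dest = target_path / ".git"
    if dest.exists():
        raise FileExistsError(f"{dest} already exists; refusing to overwrite it")
    with tempfile.TemporaryDirectory() as tmp:
        with tarfile.open(archive, "r:gz") as tar:
            # Use the safe "data" extraction filter where this Python version provides it
            if hasattr(tarfile, "data_filter"):
                tar.extractall(tmp, filter="data")
            else:
                tar.extractall(tmp)
        src = Path(tmp) / ".git"
        if not src.is_dir():
            raise FileNotFoundError("archive has no top-level .git directory")
        target_path.mkdir(parents=True, exist_ok=True)
        # Move out of the temp dir before it is cleaned up
        shutil.move(str(src), str(dest))
    return dest
```

Example call: `restore_git_dir("agillm_workspace_git_20260521T230246Z.tar.gz", "/target")`. Here `/target` is the same placeholder as in the shell command.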
## Git history at time of snapshot

3 commits on `main`:
1. `66604cc` Baseline: vast workspace state as of 2026-05-21T23:01Z - pre-gate-fix
2. `9da9902` Fix SAT gate collapse: per-position forward + correctness-based training target
3. `e18cc73` Add AR-only ablation script (5k steps from 220k, no SAT loss)

## SAT gate fix summary (commit 9da9902)

Root cause: the training loss hardcoded the gate target to `torch.ones(...)`. The SAT gate therefore collapsed to a deterministic one-hot that made the same choice regardless of input. As a result, sat-var and sat-fixed inference produced byte-identical output (the gate always picked stride=2).

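The following toy stand-in shows why a constant target collapses the gate. It is not the `nB300.py` code: it is a one-feature logistic gate trained by full-batch gradient descent on binary cross-entropy. The feature values, learning rate and step count are invented for illustration. When every target is 1, the loss never rewards using the input, so the gate picks class 1 everywhere. When the target varies with the input, the gate learns to discriminate.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_gate(xs, ys, steps=200, lr=1.0):
    """Fit p(class 1 | x) = sigmoid(w*x + b) by full-batch gradient descent on binary cross-entropy."""
    w = b = 0.0
    n = len(xs)
    for _ in range(steps):
        # d(BCE)/d(logit) for each example is p - y
        errs = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errs, xs)) / n
        b -= lr * sum(errs) / n
    return w, b


def choices(xs, w, b):
    """Hard gate decision per example: 1 if p(class 1) > 0.5, else 0."""
    return [int(sigmoid(w * x + b) > 0.5) for x in xs]


# Toy per-position feature (invented) that separates wrong from right predictions
xs = [-0.9, -0.3, 0.3, 0.9]

# Hardcoded target, like the torch.ones(...) bug: class 1 everywhere, whatever the input
w_old, b_old = train_gate(xs, [1, 1, 1, 1])
# Input-dependent target, like a correctness-based one
w_new, b_new = train_gate(xs, [0, 0, 1, 1])

print(choices(xs, w_old, b_old))  # [1, 1, 1, 1]: constant, w stays ~0 so the input is ignored
print(choices(xs, w_new, b_new))  # [0, 0, 1, 1]: the decision now depends on the input
```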
Three sub-fixes in `nB300.py`:
- `SATHead.forward`: the gate is applied per position (previously only to `h_last[:, 0]`)
- Training loss: gate target = `(logits_sat.argmax(-1) == tgt_sat).long()`, i.e. 1 where the SAT head's top prediction matches the target, else 0
- Inference: the gate now has shape `[B, SAT_BLOCK, 2]`; the stride decision uses `gate[:, 0]`

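The new training target can be illustrated without PyTorch. The sketch below mirrors `(logits_sat.argmax(-1) == tgt_sat).long()` on plain nested lists. It assumes `logits_sat` has shape `[B, SAT_BLOCK, V]` (V = vocabulary size) and `tgt_sat` has shape `[B, SAT_BLOCK]`. The helper names and the toy values are hypothetical.

```python
from typing import List


def argmax(row: List[float]) -> int:
    """Index of the largest value; the first index wins on ties."""
    return max(range(len(row)), key=row.__getitem__)


def gate_targets(logits_sat: List[List[List[float]]], tgt_sat: List[List[int]]) -> List[List[int]]:
    """Per-position gate target: 1 where the top-scoring token equals the label, else 0.

    Plain-list stand-in for (logits_sat.argmax(-1) == tgt_sat).long().
    """
    return [
        [int(argmax(pos_logits) == label) for pos_logits, label in zip(seq_logits, seq_labels)]
        for seq_logits, seq_labels in zip(logits_sat, tgt_sat)
    ]


# Toy batch: B=2 sequences, SAT_BLOCK=2 positions, V=3 vocabulary entries
logits_sat = [
    [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3]],  # top tokens: 1, 0
    [[0.0, 0.0, 3.0], [0.4, 0.9, 0.1]],   # top tokens: 2, 1
]
tgt_sat = [[1, 2], [2, 0]]

new_target = gate_targets(logits_sat, tgt_sat)
old_target = [[1] * len(labels) for labels in tgt_sat]  # what the hardcoded torch.ones(...) target amounted to

print(new_target)  # [[1, 0], [1, 0]]
print(old_target)  # [[1, 1], [1, 1]]
```

Unlike the all-ones target, the new target varies per position. This gives the per-position gate a signal tied to whether the SAT prediction was actually right.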
The fix is backward-compatible with existing checkpoints. The gate weights in the existing 220k checkpoint are still collapsed from prior training, so a fresh SFT run is needed for the full benefit. The inference math, however, is now correct.