Submission-ready: README, blog, training pipeline, baseline evidence, OpenEnv compliance 7a90355 Uddiii committed 27 days ago
fix(grpo): Unsloth inference-mode swap + smaller LoRA + KV cleanup (T4 OOM #2) a3804d9 Uddiii committed 27 days ago
fix(grpo): per-step backward to bound VRAM during update (T4 OOM fix) 8f20926 Uddiii committed 27 days ago
feat(kaggle): add clean_launch.py + shrink budget to 20/25/30 = 75 eps cd923aa Uddiii committed 27 days ago
feat(kaggle): default to fixed-budget curriculum 20/30/50 episodes 69f89ec Uddiii committed 27 days ago
fix(grpo): skip reference model when kl_beta=0 to save 5GB VRAM on T4 0566783 Uddiii committed 27 days ago
fix(kaggle): align pip-managed numpy with kernel's loaded numpy 27cf9cd Uddiii committed 27 days ago
fix(kaggle): escape backslash-n in REPAIR cell separator print 04688c1 Uddiii committed 27 days ago
fix(kaggle): unpin torch and loosen trl floor to prevent bnb/unsloth break 71a0a91 Uddiii committed 28 days ago
kaggle: refresh cell-8 promotion timing for per-phase early-stop c64ec55 Uddiii committed 28 days ago
kaggle: lower convergence bar to +1.2 reward (3.1x baseline P3) 13ae8dd Uddiii committed 28 days ago