fix(grpo): Unsloth inference-mode swap + smaller LoRA + KV cleanup (T4 OOM #2) a3804d9 Uddiii committed on Apr 26
fix(grpo): per-step backward to bound VRAM during update (T4 OOM fix) 8f20926 Uddiii committed on Apr 26
feat(kaggle): add clean_launch.py + shrink budget to 20/25/30 = 75 eps cd923aa Uddiii committed on Apr 26