fix: OOM - NUM_GENERATIONS 32→16, max_completion_length 300→200, expandable_segments c0d3d54 verified Pathikreet committed on Apr 26
fix: add 3 missing hard tasks to _TASK_DIFFICULTY (322 prompts) a6c22c8 verified Pathikreet committed on Apr 26
Fix: kl_coeff -> beta (correct TRL GRPOConfig param name) a47b370 verified Pathikreet committed on Apr 26
Auto-detect username from token for adapter + run folder upload 0f17c96 verified Pathikreet committed on Apr 26
Graceful stop: save weights on /app/stop_requested flag 65ac9f8 verified Pathikreet committed on Apr 26
Fix hard_currency_conversion task ID in TRAIN_TASKS and EVAL_TASKS e27253e verified Pathikreet committed on Apr 26
Bump seeds: medium×8, hard/long×20 (322 prompts total) abf8676 verified Pathikreet committed on Apr 26
Run 3: temp=0.7, kl=0.1, format±0.15, no curriculum, 20 tasks, G=32 0bfa536 verified Pathikreet committed on Apr 26
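
Note on 65ac9f8 (graceful stop): the commit message only says that weights are saved when the /app/stop_requested flag appears. The sketch below shows one way such a check might look. The flag path comes from the commit message; `run_steps`, `train_step`, and `save_weights` are hypothetical names used for illustration, not the repository's actual code.

```python
from pathlib import Path
from typing import Callable, Iterable

# Flag path taken from the commit message; all other names are hypothetical.
STOP_FLAG = Path("/app/stop_requested")


def stop_requested(flag: Path = STOP_FLAG) -> bool:
    """Return True once the stop flag file exists on disk."""
    return flag.exists()


def run_steps(
    steps: Iterable[int],
    train_step: Callable[[int], None],
    save_weights: Callable[[], None],
    flag: Path = STOP_FLAG,
) -> int:
    """Run train_step for each step until the flag appears.

    The flag is checked before every step. When it is found, the weights
    are saved once and the loop exits. Returns the number of completed steps.
    """
    completed = 0
    for step in steps:
        if stop_requested(flag):
            save_weights()  # persist progress before exiting
            break
        train_step(step)
        completed += 1
    return completed
```

Checking the flag between steps, rather than handling a signal mid-step, means the save always happens at a step boundary, at the cost of waiting for the current step to finish.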