SaiManish123 committed on
Commit 10fa2de · verified · 1 Parent(s): 51ebc24

Replace SFT reward curve with baseline-anchored learning curve (tool-aware baseline → checkpoint-40 … final)

sft_worldsplit_1_5b/reward_curve.png CHANGED