Replace SFT reward curve with baseline-anchored learning curve (tool-aware baseline → checkpoint-40 … final) 10fa2de verified SaiManish123 committed on Apr 26
Upload sft_worldsplit_1_5b/adaptshield_sft_worldsplit.summary.json with huggingface_hub 2cddefe verified SaiManish123 committed on Apr 25