math_model / trainer_state.json

Commit History

Push exp8 GRPO best (step 750), gen temp=0.7 for pass@8
ab2047d
verified

jdecim committed on