math_model / split_info.json

Commit History

exp6: SFT(NuminaMath) -> DPO v2, T=0.3
5184a78
verified

jdecim committed on

sft_mixlong_full: best Hard100 p@8, temperature=0.3
a421196
verified

jdecim committed on

Push DPO checkpoint with T=0.3 (optimal for pass@8 on CI gate)
ac3517e
verified

jdecim committed on

Upload exp4b: SFT→DPO v2 (L3-5, ≤5/8, beta=0.1, lr=5e-7)
32ac537
verified

jdecim committed on