PRO-STEP-Policy-7B / tokenizer.json

Commit History

Add PRO-STEP main policy: Qwen2.5-7B-Instruct + DPO + outcome filter + α=0.3
0ee46b8

DORAEMONG committed on