SFT fine-tuned on the GSM8K dataset.
Training outputs used the format:
<think>{thinking tokens}</think> <answer>{final answer}</answer>
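
A minimal sketch of how a completion could be checked against this format. The regex and the `parse_completion` helper are illustrative assumptions, not the script that produced the stats in this card; in particular, the exact whitespace tolerated between the two blocks is a guess.

```python
import re

# Assumed pattern: one <think> block, optional whitespace, then one <answer> block,
# with nothing else before or after.
FORMAT_RE = re.compile(
    r"^\s*<think>(.*?)</think>\s*<answer>(.*?)</answer>\s*$",
    re.DOTALL,
)

def parse_completion(text):
    """Return (thinking, answer) if text follows the format, else None."""
    match = FORMAT_RE.match(text)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()
```

A completion for which `parse_completion` returns `None` would count as a format failure under this sketch.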
Results on the GSM8K test set after 2 epochs:
correct format: 1260/1319 (95.5%)
correct reward: 515/1319 (39.0%)
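
A rough sketch of how the answer count could be computed, assuming "correct reward" means the text inside `<answer>` exactly matches the GSM8K reference answer. That reading is an assumption, and the helper names (`gold_answer`, `predicted_answer`, `score`) are hypothetical. GSM8K reference solutions end with `#### <number>`, which is what `gold_answer` relies on.

```python
import re

ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

def gold_answer(reference):
    # GSM8K reference solutions end with "#### <number>"; drop thousands separators.
    return reference.split("####")[-1].strip().replace(",", "")

def predicted_answer(completion):
    # Take the text inside the first <answer> block, or None if there is none.
    match = ANSWER_RE.search(completion)
    if match is None:
        return None
    return match.group(1).strip().replace(",", "")

def score(completions, references):
    # Count exact string matches between predicted and gold answers (assumed metric).
    correct = sum(
        predicted_answer(c) == gold_answer(r)
        for c, r in zip(completions, references)
    )
    return correct, len(references)
```

Exact string matching is strict: under this sketch `18.0` would not match a gold answer of `18`, so a numeric comparison might give a slightly different count.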