Qwen2.5-0.5B fine-tuned on GSM8K using Group Relative Policy Optimization (GRPO), the method proposed in the DeepSeekMath paper.
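
For readers unfamiliar with the method, the core idea of Group Relative Policy Optimization is that several completions are sampled for the same question, and each completion's reward is normalized against the mean and standard deviation of its own group rather than against a learned value function. The sketch below illustrates only that normalization step. The function name, the `eps` term, and the use of the population standard deviation are assumptions made for illustration; this is not the training code used for this model.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one group of sampled completions.

    Hypothetical helper: `eps` and the population standard deviation are
    illustrative choices, not details taken from this model's training run.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps keeps the division finite when every reward in the group is equal.
    return [(r - mean) / (std + eps) for r in rewards]


# Four completions sampled for one question; two were scored as correct.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that score above their group's average receive a positive advantage and those below receive a negative one. When all completions in a group get the same reward, every advantage is zero and the group contributes no learning signal.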

A random sample:

Question:
Jenna is adding black dots to a bunch of white blouses. Each blouse gets 20 dots, and each dot takes 10 ml of black dye. How many 400-ml bottles of dye does Jenna need to buy to dye 100 blouses?

Answer:
50

Response:
<reasoning>
Jenna needs to dye 100 blouses, and each blouse gets 20 dots, so she needs 100 * 20 = 2000 dots in total. Each dot takes 10 ml of black dye, so she needs 2000 * 10 = 20000 ml of black dye. Since each bottle holds 400 ml of dye, Jenna needs to buy 20000 / 400 = 50 bottles of dye.
</reasoning>
<answer>
50
</answer>
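
Because the response wraps its final result in `<answer>` tags, it can be scored automatically against the reference answer. The following is a minimal sketch of such a check, assuming exact string matching after whitespace is trimmed. The names `extract_answer` and `correctness_reward` are hypothetical, and the reward functions actually used to train this model are not stated here.

```python
import re

# Capture whatever sits between the answer tags, ignoring surrounding whitespace.
ANSWER_RE = re.compile(r"<answer>\s*(.*?)\s*</answer>", re.DOTALL)


def extract_answer(response):
    """Return the text inside the first <answer> block, or None if absent."""
    match = ANSWER_RE.search(response)
    return match.group(1) if match else None


def correctness_reward(response, reference):
    """Hypothetical reward: 1.0 on an exact match with the reference, else 0.0."""
    predicted = extract_answer(response)
    if predicted is None:
        return 0.0
    return 1.0 if predicted == reference.strip() else 0.0


sample = "<reasoning>\n...\n</reasoning>\n<answer>\n50\n</answer>"
reward = correctness_reward(sample, "50")

# The arithmetic in the sample can be checked independently of the model.
bottles = (100 * 20 * 10) // 400
```

Exact matching is deliberately strict: an answer such as `50 bottles` would score zero under this sketch, so a real pipeline may want to normalize numbers before comparing.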