---
title: RLM Arithmetic Training
emoji: π’
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
hardware: t4-small
---
# GRPO + RLVR Arithmetic Training
This Space trains Qwen3-0.6B-Base on simple arithmetic (2-digit addition and subtraction) using GRPO + RLVR.
## Task
Solve arithmetic problems like:
- 47 + 35 = 82
- 92 - 17 = 75
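Problems of this shape could be generated along the following lines. This is a minimal sketch, not the Space's actual data code: the function name `make_problem`, the prompt format, and the choice to allow negative differences are all assumptions made for illustration.

```python
import random

def make_problem(rng):
    """Return (prompt, answer) for one 2-digit addition or subtraction problem.

    Hypothetical helper: the prompt format "a op b =" is an assumption.
    """
    a = rng.randint(10, 99)  # both operands are 2-digit numbers
    b = rng.randint(10, 99)
    op = rng.choice(["+", "-"])
    # Assumption: negative results (e.g. 17 - 92) are allowed.
    answer = a + b if op == "+" else a - b
    return f"{a} {op} {b} =", str(answer)

rng = random.Random(0)  # fixed seed so the sample is reproducible
for _ in range(3):
    prompt, answer = make_problem(rng)
    print(prompt, answer)
```

Keeping the answer as a string makes it easy to compare against decoded model output later.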
## Approach
- **Model:** Qwen/Qwen3-0.6B-Base
- **Method:** GRPO (Group Relative Policy Optimization) with RLVR (Reinforcement Learning with Verifiable Rewards)
- **Reward:** Exact match on answer
- **Steps:** 50
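The reward and the group-relative step can be illustrated with a short sketch. It assumes the completion contains only the model's continuation (so the first integer in it is the answer), and it uses the common GRPO formulation in which each reward is normalized by the mean and standard deviation of its sampling group. The function names and the `eps` value are hypothetical, not taken from this Space's training code.

```python
import re
import statistics

def exact_match_reward(completion, expected):
    """Return 1.0 if the first integer in the completion equals the expected answer, else 0.0."""
    match = re.search(r"-?\d+", completion)
    if match is None:
        return 0.0
    return 1.0 if int(match.group(0)) == int(expected) else 0.0

def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions sampled for the same prompt.

    eps is an assumed small constant that avoids division by zero
    when every completion in the group gets the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled completions for the prompt "47 + 35 ="
completions = [" 82", " 81", " 72", " 82"]
rewards = [exact_match_reward(c, "82") for c in completions]
print(rewards)
print(group_relative_advantages(rewards))
```

Correct completions receive positive advantages and incorrect ones negative. When every completion in a group gets the same reward, all advantages are zero and that group contributes no learning signal.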
## Expected Results
The base model (no math training) is expected to perform poorly (<10% exact-match accuracy). The trained model should improve significantly.
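One way to measure this is exact-match accuracy over a held-out set of problems, computed once for the base model and once for the trained model. The sketch below is illustrative only: `generate` stands in for whatever function maps a prompt to the model's decoded completion, and the stub used here is a placeholder rather than a real model.

```python
import re

def accuracy(generate, problems):
    """Fraction of (prompt, answer) pairs for which generate(prompt) yields the right integer."""
    correct = 0
    for prompt, answer in problems:
        match = re.search(r"-?\d+", generate(prompt))
        if match and int(match.group(0)) == int(answer):
            correct += 1
    return correct / len(problems)

problems = [("47 + 35 =", "82"), ("92 - 17 =", "75")]

def always_82(prompt):
    """Placeholder 'model' that answers 82 to everything (hypothetical stub)."""
    return " 82"

print(accuracy(always_82, problems))  # 0.5: one of the two answers matches
```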