title: RLM Arithmetic Training
emoji: 🔢
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
hardware: t4-small

GRPO + RLVR Arithmetic Training

This Space trains Qwen3-0.6B-Base on simple arithmetic (2-digit addition and subtraction) using GRPO + RLVR.

Solve arithmetic problems like:

Model: Qwen/Qwen3-0.6B-Base
Method: GRPO (Group Relative Policy Optimization) with RLVR (Reinforcement Learning with Verifiable Rewards)
Reward: Exact match on answer
Steps: 50
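
To make the Method and Reward entries concrete, here is a minimal sketch of an exact-match reward and the group-relative advantage that GRPO derives from it. It is not taken from this Space's training code: the function names, the "last integer in the completion" extraction rule, and the use of the population standard deviation are all assumptions made for illustration.

```python
import re
import statistics


def extract_answer(completion: str):
    """Return the last integer found in the completion, or None.

    Assumption: the model's final answer is the last integer it writes.
    """
    matches = re.findall(r"-?\d+", completion)
    return int(matches[-1]) if matches else None


def exact_match_reward(completion: str, target: int) -> float:
    """Verifiable reward: 1.0 if the extracted answer equals the target, else 0.0."""
    return 1.0 if extract_answer(completion) == target else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so
    completions are scored relative to their siblings rather than by a
    learned value function.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# One prompt ("17 + 25 =") with a group of four sampled completions.
completions = ["17 + 25 = 42", "The answer is 41", "42", "I don't know"]
rewards = [exact_match_reward(c, 42) for c in completions]
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

Correct completions end up with positive advantages and incorrect ones with negative advantages. When every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.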

The base model (no math training) is expected to perform poorly (<10% accuracy); the trained model should improve significantly.
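
One way to check this expectation is to measure exact-match accuracy on freshly generated problems before and after training. The sketch below is hypothetical: the prompt format, the operand range (10 to 99), and the `generate` callable standing in for the model are assumptions, not details from this Space.

```python
import random


def make_problem(rng):
    """Build one 2-digit addition or subtraction problem and its answer."""
    a, b = rng.randint(10, 99), rng.randint(10, 99)
    op = rng.choice(["+", "-"])
    answer = a + b if op == "+" else a - b
    return f"{a} {op} {b} =", answer


def exact_match_accuracy(generate, n=200, seed=0):
    """Fraction of problems where generate(prompt) exactly matches the answer.

    `generate` is any callable mapping a prompt string to an answer string;
    in practice it would wrap the base or trained model (assumption).
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(n):
        prompt, answer = make_problem(rng)
        if generate(prompt).strip() == str(answer):
            correct += 1
    return correct / n


def oracle(prompt):
    """Stand-in 'model' that always answers correctly, used to test the harness."""
    a, op, b, _ = prompt.split()
    return str(int(a) + int(b) if op == "+" else int(a) - int(b))


print(exact_match_accuracy(oracle))
```

Using a fixed seed means the base and trained models are scored on the same problems, which keeps the comparison fair.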