Spaces: mindchain/rlm-arithmetic-training

	---
	title: RLM Arithmetic Training
	emoji: 🔢
	colorFrom: blue
	colorTo: purple
	sdk: docker
	pinned: false
	hardware: t4-small
	---

	# GRPO + RLVR Arithmetic Training

	Training Qwen3-0.6B-Base on simple arithmetic (2-digit addition/subtraction) using GRPO + RLVR.

	## Task

	Solve arithmetic problems like:
	- 47 + 35 = 82
	- 92 - 17 = 75
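
Problems of this shape can be generated on the fly. The following is a minimal sketch, not the Space's actual data pipeline: the function name `make_problem`, the `"a + b ="` prompt format, and the choice to keep subtraction results non-negative are all assumptions made for illustration.

```python
import random


def make_problem(rng: random.Random) -> tuple[str, int]:
    """Return a prompt such as '47 + 35 =' together with its integer answer."""
    a = rng.randint(10, 99)  # both operands are 2-digit numbers
    b = rng.randint(10, 99)
    if rng.random() < 0.5:
        return f"{a} + {b} =", a + b
    # Assumption: put the larger operand first so the result is never negative.
    hi, lo = max(a, b), min(a, b)
    return f"{hi} - {lo} =", hi - lo


rng = random.Random(0)  # fixed seed so the sample is reproducible
problems = [make_problem(rng) for _ in range(5)]
for prompt, answer in problems:
    print(prompt, answer)
```

Seeding the generator keeps training and evaluation sets reproducible across runs.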

	## Approach

	- Model: Qwen/Qwen3-0.6B-Base
	- Method: GRPO (Group Relative Policy Optimization) with RLVR (Reinforcement Learning with Verifiable Rewards)
	- Reward: Exact match on answer
	- Steps: 50
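
To make the reward and the "group relative" part concrete, here is a hedged sketch in plain Python. It is not the training code used by this Space. The helper names, the rule of taking the first integer in the completion as the answer, and the use of the population standard deviation are assumptions; real GRPO implementations differ in these details.

```python
import re
import statistics


def exact_match_reward(completion: str, target: int) -> float:
    """Return 1.0 if the first integer in the completion equals target, else 0.0."""
    match = re.search(r"-?\d+", completion)
    if match is None:
        return 0.0  # no number produced at all
    return 1.0 if int(match.group()) == target else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against the mean and std of its own sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std
    return [(r - mean) / (std + eps) for r in rewards]


# Four hypothetical completions sampled for the prompt "47 + 35 ="
completions = [" 82", " 72", "82.", " I think 81"]
rewards = [exact_match_reward(c, 82) for c in completions]
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

Correct completions receive a positive advantage and incorrect ones a negative advantage. When every completion in a group gets the same reward, all advantages are zero and that group contributes no learning signal.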

	## Expected Results

	The base model (no math training) is expected to score poorly (<10% exact-match accuracy), while the trained model should improve significantly.
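
One way to measure the before/after difference is exact-match accuracy on a held-out set of problems. The sketch below assumes a hypothetical `generate` callable that maps a prompt to the model's text completion; a stand-in function is used here so the example runs without loading a model.

```python
from typing import Callable


def exact_match_accuracy(
    generate: Callable[[str], str],
    problems: list[tuple[str, int]],
) -> float:
    """Fraction of problems whose stripped completion equals the answer exactly."""
    correct = 0
    for prompt, answer in problems:
        if generate(prompt).strip() == str(answer):
            correct += 1
    return correct / len(problems)


def always_82(prompt: str) -> str:
    """Stand-in for a model: ignores the prompt and always answers 82."""
    return " 82"


problems = [("47 + 35 =", 82), ("92 - 17 =", 75)]
accuracy = exact_match_accuracy(always_82, problems)
print(f"{accuracy:.0%}")
```

Running the same function with the base and the trained model as `generate` gives two directly comparable numbers.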