Spaces: mindchain/rlm-arithmetic-training

mindchain committed on Feb 17

Commit bbf1e64 (verified) · 1 parent: 560390f

Upload README.md with huggingface_hub

Files changed (1)

README.md CHANGED

@@ -1,10 +1,30 @@
 ---
-title: Rlm Arithmetic Training
-emoji: 🚀
-colorFrom: red
-colorTo: green
+title: RLM Arithmetic Training
+emoji: 🔢
+colorFrom: blue
+colorTo: purple
 sdk: docker
 pinned: false
+hardware: t4-small
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# GRPO + RLVR Arithmetic Training
+Training Qwen3-0.6B-Base on simple arithmetic (2-digit addition/subtraction) using GRPO + RLVR.
+## Task
+Solve arithmetic problems like:
+- 47 + 35 = 82
+- 92 - 17 = 75
+## Approach
+- **Model:** Qwen/Qwen3-0.6B-Base
+- **Method:** GRPO (Group Relative Policy Optimization) with RLVR (Reinforcement Learning with Verifiable Rewards)
+- **Reward:** Exact match on answer
+- **Steps:** 50
+## Expected Results
+The base model (no math training) is expected to perform poorly (<10% exact-match accuracy); the trained model should improve significantly.
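
The README describes the task, the exact-match reward, and GRPO only in prose, and the commit does not include the training code. The following is a minimal Python sketch of those three pieces, not the Space's actual implementation. All function names (`make_problem`, `exact_match_reward`, `group_relative_advantages`) are hypothetical, as is the prompt format `"a op b ="` and the choice to read the first integer in the completion as the model's answer. The advantage computation follows the usual GRPO idea of normalizing each reward against the mean and standard deviation of its sampling group; the use of the population standard deviation and the `eps` term are assumptions.

```python
import random
import re
from statistics import mean, pstdev


def make_problem(rng):
    """Build one 2-digit addition/subtraction prompt and its integer answer.

    Assumption: operands are uniform in [10, 99], so subtraction results
    may be negative. The README's examples only show positive results.
    """
    a = rng.randint(10, 99)
    b = rng.randint(10, 99)
    op = rng.choice(["+", "-"])
    answer = a + b if op == "+" else a - b
    return f"{a} {op} {b} =", answer


def exact_match_reward(completion, answer):
    """Verifiable reward: 1.0 if the first integer in the completion equals
    the true answer, else 0.0 (assumed parsing rule, hypothetical)."""
    match = re.search(r"-?\d+", completion)
    if match is None:
        return 0.0
    return 1.0 if int(match.group()) == answer else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so
    completions are scored relative to their siblings rather than by a
    learned value function.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    rng = random.Random(0)
    prompt, answer = make_problem(rng)
    # Stand-in completions; a real run would sample these from the model.
    completions = [f" {answer}", f" {answer + 1}", " unsure", f" {answer}"]
    rewards = [exact_match_reward(c, answer) for c in completions]
    print(prompt, rewards, group_relative_advantages(rewards))
```

With a binary reward like this, a group in which every completion is wrong (or every completion is right) produces all-zero advantages and therefore no learning signal for that prompt, which is worth keeping in mind given the low expected accuracy of the base model and the short 50-step run.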