Commit bbf1e64 by mindchain (verified)
1 Parent(s): 560390f

Upload README.md with huggingface_hub

  1. README.md +25 -5
README.md CHANGED
@@ -1,10 +1,30 @@
  ---
- title: Rlm Arithmetic Training
- emoji: 🚀
- colorFrom: red
- colorTo: green
  sdk: docker
  pinned: false
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

  ---
+ title: RLM Arithmetic Training
+ emoji: 🔒
+ colorFrom: blue
+ colorTo: purple
  sdk: docker
  pinned: false
+ hardware: t4-small
  ---
 
+ # GRPO + RLVR Arithmetic Training
+
+ Training Qwen3-0.6B-Base on simple arithmetic (2-digit addition/subtraction) using GRPO + RLVR.
+
+ ## Task
+
+ Solve arithmetic problems like:
+ - 47 + 35 = 82
+ - 92 - 17 = 75
+
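For illustration only (this sketch is not in the committed README): one way the 2-digit addition/subtraction problems described above could be generated. The names `make_problem` and `make_dataset`, the prompt format, and the choice to keep subtraction results non-negative are all assumptions; the README does not specify any of them.

```python
import random


def make_problem(rng):
    """Return (prompt, answer) for one 2-digit addition or subtraction problem."""
    a = rng.randint(10, 99)
    b = rng.randint(10, 99)
    if rng.random() < 0.5:
        return f"{a} + {b} =", a + b
    # Assumption: order the operands so the difference is never negative.
    # The README does not say whether negative answers are allowed.
    hi, lo = max(a, b), min(a, b)
    return f"{hi} - {lo} =", hi - lo


def make_dataset(n, seed=0):
    """Build n problems reproducibly from a fixed seed."""
    rng = random.Random(seed)
    return [make_problem(rng) for _ in range(n)]


if __name__ == "__main__":
    for prompt, answer in make_dataset(3):
        print(prompt, answer)
```

Seeding a private `random.Random` instance keeps the dataset reproducible without touching the global random state.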
+ ## Approach
+
+ - **Model:** Qwen/Qwen3-0.6B-Base
+ - **Method:** GRPO (Group Relative Policy Optimization) with RLVR (Reinforcement Learning with Verifiable Rewards)
+ - **Reward:** Exact match on answer
+ - **Steps:** 50
+
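For illustration only (this sketch is not in the committed README): the exact-match reward and the group-relative step that gives GRPO its name can be written in a few lines. GRPO samples several completions per prompt and scores each one relative to its group by subtracting the group's mean reward and dividing by its standard deviation. The names `exact_match_reward` and `group_relative_advantages` are hypothetical, and parsing the first integer in the completion, using the population standard deviation, and the `eps` value are assumptions rather than details taken from this repository.

```python
import re
from statistics import mean, pstdev


def exact_match_reward(completion, target):
    """Return 1.0 if the first integer in the completion equals target, else 0.0.

    Assumption: the answer is the first integer the model emits after the prompt.
    """
    match = re.search(r"-?\d+", completion)
    return 1.0 if match and int(match.group()) == target else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; eps avoids division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four hypothetical completions sampled for the prompt "47 + 35 ="
    completions = [" 82", " 81", "82\n", "eighty-two"]
    rewards = [exact_match_reward(c, 82) for c in completions]
    print(rewards)
    print(group_relative_advantages(rewards))
```

When every completion in a group receives the same reward, all advantages are zero, so that group contributes no learning signal under this formulation.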
28
+ ## Expected Results
29
+
30
+ Base model (no math training) should perform poorly (<10%), trained model should improve significantly.