Vighnesh committed on
Commit e531507 · 1 Parent(s): d771897

Update: replace broken chart with winning GRPO results (Overall 0.29->0.57)

Files changed (1)
  1. README.md +13 -0
README.md CHANGED
@@ -118,6 +118,19 @@ python run_tests.py
 
 > 🎮 **Playground UI** available at `http://localhost:7860/playground` once the server is running.
 
+## Training Results (GRPO)
+
+Fine-tuned `Qwen2.5-0.5B-Instruct` with GRPO via HuggingFace TRL over 700+ steps:
+
+![GRPO Training Results](grpo_results.png)
+
+| Task | Before GRPO | After GRPO | Improvement |
+|------|-------------|------------|-------------|
+| Task 1 - Classification | 0.67 | **1.00** | +49% 🚀 |
+| Task 2 - Action Selection | 0.12 | **0.48** | +300% 🚀 |
+| Task 3 - Full Resolution | 0.08 | **0.23** | +187% 🚀 |
+| **Overall** | **0.29** | **0.57** | **+96% 🚀** |
+
 ## Baseline Scores
 
 Measured with `gpt-4o-mini`, seeds `[42, 7, 123]`:
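The Improvement column in the added table appears to be the relative change (after - before) / before, and the Overall row matches the unweighted mean of the three task scores. Neither definition is stated in the README; both are inferred from the numbers. A small sketch that recomputes the column under those assumptions (the function names are hypothetical):

```python
# Scores copied from the README table: (Before GRPO, After GRPO).
SCORES: dict[str, tuple[float, float]] = {
    "Task 1 - Classification": (0.67, 1.00),
    "Task 2 - Action Selection": (0.12, 0.48),
    "Task 3 - Full Resolution": (0.08, 0.23),
}


def relative_improvement(before: float, after: float) -> float:
    """Relative change in percent: (after - before) / before * 100."""
    return (after - before) / before * 100.0


def overall(scores: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Unweighted mean of per-task before/after scores (assumed meaning of 'Overall')."""
    n = len(scores)
    before = sum(b for b, _ in scores.values()) / n
    after = sum(a for _, a in scores.values()) / n
    return before, after


if __name__ == "__main__":
    for task, (before, after) in SCORES.items():
        pct = relative_improvement(before, after)
        print(f"{task}: {before:.2f} -> {after:.2f} ({pct:+.1f}%)")
    ob, oa = overall(SCORES)
    print(f"Overall: {ob:.2f} -> {oa:.2f} ({relative_improvement(ob, oa):+.1f}%)")
```

Run as a script, this gives +49.3%, +300.0%, +187.5% and +96.6%. The README's figures are consistent with these values truncated to whole percents.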
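The commit credits GRPO (Group Relative Policy Optimization), run through TRL, for these gains. GRPO's distinguishing step is that it uses no learned value model: it samples a group of completions per prompt and scores each one relative to the rest of its group. The sketch below illustrates only that normalization, A_i = (r_i - mean(r)) / std(r), in plain Python. It is not TRL's trainer code; the function name, the `eps` guard, the choice of population standard deviation (implementations differ on this detail), and the example rewards are all assumptions made for illustration.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize rewards within one group of completions sampled for the same prompt.

    Each advantage is the completion's reward minus the group mean, divided by the
    group's population standard deviation. `eps` (an assumed constant) avoids
    division by zero when every reward in the group is identical.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for 4 completions of one prompt (e.g. a grader's 0..1 score).
    rewards = [1.0, 0.0, 0.5, 0.5]
    print([round(a, 3) for a in group_relative_advantages(rewards)])
```

Completions that beat their group average get a positive advantage and the rest a negative one, so a group where every completion scores the same contributes no learning signal. In full GRPO these advantages weight each completion's token log-probability ratios in a clipped, PPO-style objective; that part is omitted here.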