Vighnesh committed on
Commit · e531507
1 Parent(s): d771897
Update: replace broken chart with winning GRPO results (Overall 0.29->0.57)
README.md
CHANGED
@@ -118,6 +118,19 @@ python run_tests.py
 
 > 🎮 **Playground UI** available at `http://localhost:7860/playground` once the server is running.
 
+## Training Results (GRPO)
+
+Fine-tuned `Qwen2.5-0.5B-Instruct` with GRPO via HuggingFace TRL over 700+ steps:
+
+
+
+| Task | Before GRPO | After GRPO | Improvement |
+|------|-------------|------------|-------------|
+| Task 1 - Classification | 0.67 | **1.00** | +49% |
+| Task 2 - Action Selection | 0.12 | **0.48** | +300% |
+| Task 3 - Full Resolution | 0.08 | **0.23** | +187% |
+| **Overall** | **0.29** | **0.57** | **+96%** |
+
 ## Baseline Scores
 
 Measured with `gpt-4o-mini`, seeds `[42, 7, 123]`: