havinashpatil committed
Commit b098526 · 1 Parent(s): 7f4c57d
Add all 7 results charts to README and BLOG
BLOG.md CHANGED
@@ -198,6 +198,12 @@ We trained `Qwen/Qwen2.5-Coder-1.5B` on the `m-a-p/Code-Feedback` dataset with C
 
 *Fig 5: Reward Distribution by Fixer Method, comparing the performance of the Ollama LLM to the built-in pattern-based fixer.*
 
+
+*Fig 6: Cumulative Reward over time, highlighting the total accumulated reward across multiple episodes.*
+
+
+*Fig 7: LLM Fixer Method Performance Comparison scatter plot showing the individual performance data points of Ollama vs Builtin methods.*
+
 ### Reproducing the Training
 
 The complete training pipeline is available as a Colab notebook:
README.md CHANGED
@@ -119,6 +119,12 @@ We trained `Qwen/Qwen2.5-Coder-1.5B` using **TRL GRPO** (Group Relative Policy O
 
 *Fig 5: Reward Distribution by Fixer Method, comparing the performance of the Ollama LLM to the built-in pattern-based fixer.*
 
+
+*Fig 6: Cumulative Reward over time, highlighting the total accumulated reward across multiple episodes.*
+
+
+*Fig 7: LLM Fixer Method Performance Comparison scatter plot showing the individual performance data points of Ollama vs Builtin methods.*
+
 ### Key Observations:
 - **Initial performance**: Agent produces syntactically broken fixes → reward ≈ 0.01
 - **After 20 steps**: Agent learns to fix syntax → reward ≈ 0.35