havinashpatil committed on
Commit
b098526
·
1 Parent(s): 7f4c57d

Add all 7 results charts to README and BLOG

Files changed (2)
  1. BLOG.md +6 -0
  2. README.md +6 -0
BLOG.md CHANGED
@@ -198,6 +198,12 @@ We trained `Qwen/Qwen2.5-Coder-1.5B` on the `m-a-p/Code-Feedback` dataset with C
  ![Fig 5: Fixer Method Boxplot](results/method_boxplot.png)
  *Fig 5: Reward Distribution by Fixer Method, comparing the performance of the Ollama LLM to the built-in pattern-based fixer.*
 
+ ![Fig 6: Cumulative Reward](results/cumulative_reward.png)
+ *Fig 6: Cumulative Reward over time, highlighting the total accumulated reward across multiple episodes.*
+
+ ![Fig 7: Method Performance Comparison](results/method_performance.png)
+ *Fig 7: LLM Fixer Method Performance Comparison scatter plot showing the individual performance data points of Ollama vs Builtin methods.*
+
  ### Reproducing the Training
 
  The complete training pipeline is available as a Colab notebook:
README.md CHANGED
@@ -119,6 +119,12 @@ We trained `Qwen/Qwen2.5-Coder-1.5B` using **TRL GRPO** (Group Relative Policy O
  ![Fig 5: Fixer Method Boxplot](results/method_boxplot.png)
  *Fig 5: Reward Distribution by Fixer Method, comparing the performance of the Ollama LLM to the built-in pattern-based fixer.*
 
+ ![Fig 6: Cumulative Reward](results/cumulative_reward.png)
+ *Fig 6: Cumulative Reward over time, highlighting the total accumulated reward across multiple episodes.*
+
+ ![Fig 7: Method Performance Comparison](results/method_performance.png)
+ *Fig 7: LLM Fixer Method Performance Comparison scatter plot showing the individual performance data points of Ollama vs Builtin methods.*
+
  ### Key Observations:
  - **Initial performance**: Agent produces syntactically broken fixes → reward ≈ 0.01
  - **After 20 steps**: Agent learns to fix syntax → reward ≈ 0.35