Yonghong committed on
Commit 3d6a5d7 · 1 Parent(s): 5f8e09c

Add GRPO training curve to blog

  1. README.md +14 -0
README.md CHANGED
@@ -259,6 +259,20 @@ We evaluated two LLM backends via the agentic loop described above: LLM decides
 
  ![Benchmark Results](benchmark_results.png)
 
+ ### GRPO Rollout Training Curve (8 iterations, Moonshot V1-8K)
+
+ We ran 8 iterations of GRPO-style rollouts with `group_size=2`, sampling 2 random tasks per iteration. Each rollout is a full agentic episode in which the LLM makes real tool-calling decisions.
+
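The rollout schedule described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the post: `run_episode` is a hypothetical stub standing in for a full agentic episode, the task names `T1`..`T8` are taken from the post, and the group-relative advantage formula is the one used in standard GRPO (the post does not say whether advantages were actually computed).

```python
import random
import statistics

TASKS = [f"T{i}" for i in range(1, 9)]  # T1..T8, as named in the post


def run_episode(task, rng):
    """Hypothetical stand-in for one full agentic episode.

    A real implementation would run the LLM tool-calling loop and score it;
    here we just draw a placeholder reward in [0.85, 1.0].
    """
    return round(rng.uniform(0.85, 1.0), 3)


def group_advantages(rewards, eps=1e-6):
    """Group-relative advantages: (r - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def run_rollouts(iterations=8, tasks_per_iter=2, group_size=2, seed=0):
    """Sample tasks each iteration and collect one reward group per task."""
    rng = random.Random(seed)
    history = []
    for it in range(iterations):
        for task in rng.sample(TASKS, tasks_per_iter):
            rewards = [run_episode(task, rng) for _ in range(group_size)]
            history.append({
                "iteration": it,
                "task": task,
                "rewards": rewards,
                "advantages": group_advantages(rewards),
            })
    return history
```

With the defaults, `run_rollouts()` yields 8 x 2 = 16 reward groups of 2 rollouts each. With `group_size=2` the normalized advantages within a group are roughly +1 and -1 (or both 0 when the rewards tie), so larger groups would give a finer-grained signal.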
+ ![Training Curve](training_curve.png)
+
+ The left chart shows the reward at each iteration, with its min-max range and a rolling average. The right chart shows each task's mean reward over all iterations in which that task was sampled. The orange dotted line marks the rule-based baseline (0.930).
+
+ Key observations:
+ - **Mean reward exceeds the baseline** (0.930) in 6 of 8 iterations
+ - **Iterations that include fault tasks (T4/T5) pull the mean down**: these tasks are genuinely harder and require the agent to handle 429/500 errors gracefully
+ - **T8 (mixed faults) achieves 0.973**, demonstrating that the LLM can handle combined rate-limit and dedup challenges
+ - **Per-task variance is low** (small error bars): the agent's behavior is consistent across rollouts
+
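The quantities behind these observations (per-iteration min/mean/max, a rolling average, per-task means, and the count of iterations above baseline) could be computed along the following lines. This is a sketch under assumptions: the `(iteration, task, reward)` record layout, the function names, and the rolling window of 3 are all hypothetical; only the 0.930 baseline comes from the post.

```python
from collections import defaultdict
from statistics import fmean

BASELINE = 0.930  # rule-based baseline reported in the post


def iteration_stats(records, window=3):
    """Per-iteration min/mean/max plus a rolling mean of the iteration means.

    records: iterable of (iteration, task, reward) tuples (assumed layout).
    window: rolling-average width; 3 is an arbitrary illustrative choice.
    """
    by_iter = defaultdict(list)
    for it, _task, reward in records:
        by_iter[it].append(reward)

    stats, means = [], []
    for it in sorted(by_iter):
        rewards = by_iter[it]
        means.append(fmean(rewards))
        stats.append({
            "iteration": it,
            "min": min(rewards),
            "mean": means[-1],
            "max": max(rewards),
            "rolling": fmean(means[-window:]),
        })
    return stats


def per_task_mean(records):
    """Mean reward per task over every iteration in which it appeared."""
    by_task = defaultdict(list)
    for _it, task, reward in records:
        by_task[task].append(reward)
    return {task: fmean(rewards) for task, rewards in by_task.items()}


def iterations_above(stats, baseline=BASELINE):
    """Number of iterations whose mean reward exceeds the baseline."""
    return sum(1 for s in stats if s["mean"] > baseline)
```

For example, with made-up placeholder records (not the post's data) such as `[(0, "T1", 0.98), (0, "T4", 0.90), (1, "T2", 0.96), (1, "T8", 0.98)]`, `iterations_above(iteration_stats(records))` counts both iterations as above baseline, even though the T4 rollout on its own falls below it.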
  Key findings:
  - **LLM agent outperforms rule-based baseline on 8/8 tasks**: the LLM generates better structured logs (Observability +2-3 pts) and makes smarter pagination decisions
  - **T1/T2/T3/T7 hit near-perfect 98.7**: the LLM correctly handles pagination, dedup, and totals filtering