Commit 665fae0 (verified) by Jayant2304 · 1 Parent(s): 2c07089

Update Blog.md

Files changed (1)
  1. Blog.md +4 -4
Blog.md CHANGED
@@ -246,19 +246,19 @@ The training loop connects directly to the live CommitmentOS API — not a stati
 
  **hard_011 — Investor Dinner Cascade**
 
- | | Before Training | After Training |
- |--|----------------|----------------|
+ | | No-Action Baseline | Task-Completing Agent |
+ |--|-------------------|----------------------|
  | Steps taken | 1 (immediate surrender) | 6 |
  | Constraints met | 0 / 6 | **6 / 6** |
  | Commitments honored | 0 | **1** (happy hour renegotiated) |
  | Emails sent | 0 | **2** (Team + VP_Chen) |
  | Final reward | 0.50 | **0.99** |
 
- **Reward by task — before vs after across all 15 scenarios:**
+ **Capability gap across all 15 tasks:**
 
  ![Baseline vs Improved Reward by Task — blue bars near 1.0, grey baseline bars ranging 0.4-0.76](reward_by_task.svg)
 
- *Every task improves. Every single one.*
+ *An agent that submits immediately (grey) vs one that uses the tools correctly (blue). This is the capability gap CommitmentOS trains a model to close.*
 
  **LLM checkpoint results (pre-RL vs post-RL Qwen2.5-1.5B):**
 