Update Blog.md
Blog.md CHANGED
@@ -246,19 +246,19 @@ The training loop connects directly to the live CommitmentOS API — not a stati
 
 **hard_011 — Investor Dinner Cascade**
 
-| |
-|--|----------------|----------------|
+| | No-Action Baseline | Task-Completing Agent |
+|--|-------------------|----------------------|
 | Steps taken | 1 (immediate surrender) | 6 |
 | Constraints met | 0 / 6 | **6 / 6** |
 | Commitments honored | 0 | **1** (happy hour renegotiated) |
 | Emails sent | 0 | **2** (Team + VP_Chen) |
 | Final reward | 0.50 | **0.99** |
 
-**
+**Capability gap across all 15 tasks:**
 
 
 
-*
+*An agent that submits immediately (grey) vs one that uses the tools correctly (blue). This is the capability gap CommitmentOS trains a model to close.*
 
 **LLM checkpoint results (pre-RL vs post-RL Qwen2.5-1.5B):**
 