Spaces: Jayant2304/commitment-os

Jayant2304 committed on Apr 26

Commit 2c07089 · verified · 1 Parent(s): ca276e9

Update Blog.md

Files changed (1)

Blog.md CHANGED

@@ -106,7 +106,7 @@ Rather than abstract descriptions, here's what the agent actually faces. These a
 ---
 ### Scenario 1: The Email That Breaks Everything
-*(easy_008 — medium difficulty)*
+*(med_008 — medium difficulty)*
 It's 2:45 PM. You're on a live client call with Client_Jones that ends at 3:15.
@@ -265,8 +265,9 @@ The training loop connects directly to the live CommitmentOS API — not a stati
 | | Pre-RL | Post-RL |
 |--|--------|---------|
 | Success rate (reward ≥ 0.6) | 46.7% | **60.0%** |
-Gains concentrated on hard tasks — exactly where long commitment chains matter most.
+| Hard task mean reward | 0.560 | **0.612** |
+With 30 GRPO steps on a 1.5B model, mean reward is essentially flat — expected at this compute scale. The success rate improvement is real: 2 additional tasks cross the threshold after training, with the clearest gains on hard scenarios where commitment tracking across 8–15 turns matters most. Longer training would amplify these results.
 Full weights + artifacts: [Google Drive bundle](https://drive.google.com/drive/folders/1yexZBSqyH7gWlTzYN5DlX3tXfPMmeVAK?usp=sharing)
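
Review note: the added paragraph says the success rate moved from 46.7% to 60.0% because 2 additional tasks crossed the threshold. The diff does not state the evaluation set size, so the sketch below only checks which sizes would make those three numbers agree. The function name `consistent_eval_sizes` and the 200-task search cap are illustrative assumptions, not part of the CommitmentOS code.

```python
def consistent_eval_sizes(pre_pct, post_pct, extra_successes, max_tasks=200):
    """Return eval-set sizes n where some pre-training success count k gives
    pre_pct, and k + extra_successes gives post_pct, after rounding both
    percentages to one decimal place."""
    sizes = []
    for n in range(1, max_tasks + 1):          # candidate eval-set size (cap is an assumption)
        for pre in range(0, n + 1):            # candidate pre-RL success count
            post = pre + extra_successes       # post-RL success count implied by the text
            if post > n:
                break
            if (round(100 * pre / n, 1) == pre_pct
                    and round(100 * post / n, 1) == post_pct):
                sizes.append(n)
    return sizes


if __name__ == "__main__":
    print(consistent_eval_sizes(46.7, 60.0, 2))
```

Under these assumptions the only matching size is 15 tasks (7 of 15 before, 9 of 15 after), so the reported percentages and the "2 additional tasks" claim appear internally consistent. The true evaluation set size should still be confirmed against the repository.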