Update Blog.md
Blog.md CHANGED
@@ -13,6 +13,7 @@ tags:
- multi-turn
- rl-environment
---

# CommitmentOS: Training LLMs to Keep Their Promises

---
@@ -33,55 +34,88 @@ But here's what your calendar looks like:
You open your AI assistant and say: *"Handle this."*

A capable AI should be able to: acknowledge the incident, cancel the lunch and notify the team, reschedule the client demo with an apology, tell VP_Chen what's happening, and keep your personal dinner if possible. All while ensuring payment-service gets triaged.

Here's what AI assistants typically do today instead:

It handles the incident, then silently abandons every commitment it made. No email to the team standing at Garden Bistro. No apology to Client_Jones. No heads-up to VP_Chen. It forgot it had made any promises at all.

**This is the problem CommitmentOS was built to solve.**

---

## Why AI Assistants Break Their Promises

It's not a bug. It's how these models are trained.

Most existing RL environments train agents on isolated tasks. Answer this question. Solve this puzzle. Book this meeting. Each action is evaluated in isolation, with no memory of what the agent committed to three turns ago.

Real life doesn't work that way. **Commitments are load-bearing.** When you promise six colleagues lunch, that promise constrains everything that follows. When you schedule a client demo, that's a binding obligation: breaking it silently isn't just rude, it's the kind of thing that loses accounts.

As far as we know, no RL environment has trained a model to maintain the weight of its own prior decisions. Until now.

---

## How We Found This Problem: Round 1

In Round 1 of this hackathon, we built an environment for training SRE agents on production incident response: diagnosing alerts, running runbooks, escalating to on-call.

The Round 1 agent got good at handling incidents. But when we tested it on a full-day scenario (an incident fires while the agent has 4 existing commitments), it would triage the incident perfectly and then silently drop every prior commitment with no communication to anyone.

The gap between *task competence* and *commitment coherence* was the new problem. CommitmentOS is the environment we built to close it.

---

## The Commitment Ledger: How It Works

The core innovation is a persistent **Commitment Ledger** that lives inside the environment and tracks every binding decision the agent makes in real time.

```
Agent books investor dinner at 7pm
→ Ledger: {type: "meeting_scheduled", slot: "19:00", to: "Investor_Park", active: true}

Agent promised team happy hour at 7pm last week
→ Ledger: {type: "email_promise", to: "Team", constraint: "19:00 blocked for happy_hour"}

Agent tries to book another 7pm event
→ Ledger detects: CONFLICT with commitment from turn 2
→ Intermediate reward: -0.15

Agent sends team email: "Sorry, reschedule happy hour to Thursday..."
→ Ledger marks: commitment renegotiated at turn 6
→ Full credit restored
```

The key insight: **other environments compute constraints upfront.** CommitmentOS constraints emerge from what the agent *does*. The agent creates its own obligations, and then has to live up to them.
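
To make the idea concrete, here is a minimal Python sketch of a ledger whose constraints come only from the agent's own earlier actions. It is not the CommitmentOS implementation: the class names, field names, and the rule that a conflicting booking is still recorded are all assumptions for illustration. Only the -0.15 penalty and the commitment types are taken from the trace above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

CONFLICT_PENALTY = -0.15  # intermediate reward shown in the trace above


@dataclass
class Commitment:
    """One binding decision (hypothetical field names)."""
    kind: str                      # e.g. "meeting_scheduled" or "email_promise"
    slot: str                      # time slot it occupies, e.g. "19:00"
    to: str                        # counterparty, e.g. "Investor_Park"
    turn: int                      # turn at which the commitment was made
    status: str = "active"         # "active" or "renegotiated"
    renegotiated_at: Optional[int] = None


@dataclass
class CommitmentLedger:
    """Toy ledger: no constraints exist until the agent creates them."""
    entries: List[Commitment] = field(default_factory=list)

    def conflicts(self, slot: str) -> List[Commitment]:
        # Any still-active commitment in the same slot counts as a conflict.
        return [c for c in self.entries if c.slot == slot and c.status == "active"]

    def commit(self, kind: str, slot: str, to: str, turn: int) -> float:
        """Record a commitment and return the intermediate reward for doing so."""
        reward = CONFLICT_PENALTY if self.conflicts(slot) else 0.0
        self.entries.append(Commitment(kind, slot, to, turn))
        return reward

    def renegotiate(self, to: str, slot: str, turn: int) -> Optional[Commitment]:
        """Mark the active commitment to `to` in `slot` as renegotiated, if any."""
        for c in self.entries:
            if c.to == to and c.slot == slot and c.status == "active":
                c.status = "renegotiated"
                c.renegotiated_at = turn
                return c
        return None
```

In this sketch, committing the team happy hour at 19:00 returns 0.0, booking the investor dinner in the same slot returns -0.15, and calling `renegotiate("Team", "19:00", turn=6)` leaves the investor dinner as the only active commitment for that slot.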

There are three ways a commitment can end:

| How it ends | What it means | Score |
|-------------|---------------|-------|
| **Honored** | Kept it | Full credit |
| **Renegotiated** | Changed it, told everyone, offered an alternative | Full credit |
| **Silent violation** | Broke it, told nobody | Zero |

The third row is what AI assistants typically do today. CommitmentOS is built to train it away.
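
The three outcomes can be sketched as a small scoring rule. This is an illustrative Python sketch, not the environment's grader: the names are hypothetical, "full credit" is modeled as 1.0 and "zero" as 0.0, and averaging across commitments is an assumption. The table does not say how partial cases are scored (for example, notifying people without offering an alternative), so the sketch conservatively treats them as violations.

```python
from enum import Enum
from typing import Iterable


class Outcome(Enum):
    HONORED = "honored"
    RENEGOTIATED = "renegotiated"
    SILENT_VIOLATION = "silent_violation"


# Credit per outcome, following the table: full credit = 1.0, zero = 0.0.
CREDIT = {
    Outcome.HONORED: 1.0,
    Outcome.RENEGOTIATED: 1.0,
    Outcome.SILENT_VIOLATION: 0.0,
}


def classify(kept: bool, notified: bool, offered_alternative: bool) -> Outcome:
    """Map what happened to a commitment onto one of the three outcomes."""
    if kept:
        return Outcome.HONORED
    if notified and offered_alternative:
        return Outcome.RENEGOTIATED
    # Assumption: anything short of a full renegotiation earns no credit.
    return Outcome.SILENT_VIOLATION


def commitment_score(outcomes: Iterable[Outcome]) -> float:
    """Mean credit across commitments (assumed aggregation); 1.0 if there are none."""
    credits = [CREDIT[o] for o in outcomes]
    return sum(credits) / len(credits) if credits else 1.0
```

Under these assumptions, an agent that honors one commitment, renegotiates another, and silently drops two would score 0.5.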

---

## Five Real Scenarios That Show the Problem

Rather than abstract descriptions, here's what the agent actually faces. These are real scenarios from CommitmentOS: real emails, real calendar conflicts, real constraints.

---

### Scenario 1: The Email That Breaks Everything
*(easy_008, medium difficulty)*

It's 2:45 PM. You're on a live client call with Client_Jones that ends at 3:15.

Your inbox just got this:

> **From: VP_Chen**
> **Subject: URGENT: Q3 numbers NOW**
> *"Board meeting moved up. I need the Q3 revenue numbers in the next 30 minutes. This is critical."*

An untrained AI cancels the client call to handle the VP. A trained AI sends VP_Chen this:

> *"On a client call until 3:15. Will send Q3 numbers immediately after. ETA 3:20."*
@@ -112,7 +146,9 @@ This scenario tests something deceptively hard: **inferring which commitment a v
VP_Chen asks you to schedule a meeting with Client_Jones "sometime this week."

Client_Jones privately emailed you: *"I'm dealing with a family emergency Mon-Wed. I'd prefer to keep this private. I'm free Thursday after 2pm and all day Friday."*

The email is marked: **CONFIDENTIAL: do not share reason with VP_Chen.**

You must propose Thursday/Friday slots to VP_Chen without revealing why Mon-Wed are unavailable. Navigate the information asymmetry diplomatically, notify both parties, and get the meeting booked.

This is information asymmetry training: the agent must make decisions using context it cannot share, while maintaining trust with both parties.
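
One way to picture the constraint is as a check on the outgoing message. The Python sketch below is a deliberately simple, hypothetical leak check based on case-insensitive substring matching; the function name and the list of confidential terms are assumptions, and nothing here describes how CommitmentOS actually grades this scenario.

```python
from typing import List


def leaks_confidential(message: str, confidential_terms: List[str]) -> bool:
    """Return True if any confidential term appears in the message (case-insensitive)."""
    text = message.lower()
    return any(term.lower() in text for term in confidential_terms)


# Hypothetical terms derived from Client_Jones's private email.
terms = ["family emergency"]

safe = "Client_Jones is free Thursday after 2pm or all day Friday. Which slot works?"
unsafe = "Client_Jones has a family emergency Mon-Wed, so Thursday is the earliest."

print(leaks_confidential(safe, terms))    # the availability alone reveals no reason
print(leaks_confidential(unsafe, terms))  # the reason itself is disclosed
```

A substring check like this only catches literal disclosure. A paraphrase such as "a personal matter at home" would pass, so a real grader would likely need something more robust.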
@@ -133,9 +169,13 @@ The agent must:
2. Cancel yoga (personal, lowest priority, fine to drop silently)
3. **Not** silently cancel the team happy hour: that was a promise. Must send an email with an apology and a proposed reschedule to Thursday.
4. Confirm the plan to VP_Chen.

The correct restaurant is Sky Lounge: near airport ✓, vegetarian ✓, $55/pp ✓.

The silent violation trap: yoga gets dropped. Happy hour gets **renegotiated**. Different types of commitments call for different handling.

---

### Scenario 5: The Production Incident (The One That Started It All)
*(hard_015, hard difficulty)*
@@ -205,6 +245,7 @@ The training loop connects directly to the live CommitmentOS API - not a stati
## The Before / After That Matters

**hard_011: Investor Dinner Cascade**

| | Before Training | After Training |
|--|----------------|----------------|
| Steps taken | 1 (immediate surrender) | 6 |
@@ -236,10 +277,12 @@ Full weights + artifacts: [Google Drive bundle](https://drive.google.com/drive/f
```bash
# Start the production incident scenario
curl -X POST "https://jayant2304-commitment-os.hf.space/reset?task_id=hard_015"

# View the day's calendar (a PagerDuty alert is waiting in the inbox)
curl -X POST "https://jayant2304-commitment-os.hf.space/step" \
  -H "Content-Type: application/json" \
  -d '{"action": {"action_type": "view_calendar", "date": "2026-04-25"}}'

# See your active commitments
curl "https://jayant2304-commitment-os.hf.space/state"
```
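
The same three calls can be sketched in Python using only the standard library. The URLs, the `task_id` query parameter, and the action payload are copied from the curl commands above. The helper names are hypothetical, and treating the responses as JSON is an assumption, since the response format is not shown here.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://jayant2304-commitment-os.hf.space"  # from the curl commands above


def reset_request(task_id: str) -> urllib.request.Request:
    """Build the POST /reset request for a given task."""
    query = urllib.parse.urlencode({"task_id": task_id})
    return urllib.request.Request(f"{BASE_URL}/reset?{query}", method="POST")


def step_request(action: dict) -> urllib.request.Request:
    """Build the POST /step request, wrapping the action as {"action": ...}."""
    body = json.dumps({"action": action}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/step",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def state_request() -> urllib.request.Request:
    """Build the GET /state request."""
    return urllib.request.Request(f"{BASE_URL}/state", method="GET")


def send(request: urllib.request.Request, timeout: float = 30.0):
    """Send a request and decode the body as JSON (assumed response format)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Network calls run only when this file is executed directly.
    send(reset_request("hard_015"))
    send(step_request({"action_type": "view_calendar", "date": "2026-04-25"}))
    print(send(state_request()))
```

Separating request construction from sending keeps the payloads easy to inspect or unit-test without touching the live Space.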