Commit ce44e08 by Jayant2304 · verified · 1 Parent(s): 427f83a

Update Blog.md

Files changed (1)
  1. Blog.md +43 -0
@@ -13,6 +13,7 @@ tags:
  - multi-turn
  - rl-environment
  ---

# CommitmentOS: Training LLMs to Keep Their Promises

---
@@ -33,55 +34,88 @@ But here's what your calendar looks like:
You open your AI assistant and say: *"Handle this."*

A capable AI should be able to: acknowledge the incident, cancel the lunch and notify the team, reschedule the client demo with an apology, tell VP_Chen what's happening, and keep your personal dinner if possible. All while ensuring payment-service gets triaged.

Here's what every AI assistant does today instead:

It handles the incident. And silently abandons every commitment it made. No email to the team standing at Garden Bistro. No apology to Client_Jones. No heads-up to VP_Chen. It forgot it had made any promises at all.

**This is the problem CommitmentOS was built to solve.**

---
 
## Why AI Assistants Break Their Promises

It's not a bug. It's how these models are trained.

Most existing RL environments train agents on isolated tasks. Answer this question. Solve this puzzle. Book this meeting. Each action is evaluated in isolation, with no memory of what the agent committed to three turns ago.

Real life doesn't work that way. **Commitments are load-bearing.** When you promise six colleagues lunch, that promise constrains everything that follows. When you schedule a client demo, that's a binding obligation. Breaking it silently isn't just rude; it's the kind of thing that loses accounts.

To our knowledge, no RL environment has trained a model to carry the weight of its own prior decisions. Until now.

---
 
## How We Found This Problem: Round 1

In Round 1 of this hackathon, we built an environment for training SRE agents on production incident response: diagnosing alerts, running runbooks, escalating to on-call.

The Round 1 agent got good at handling incidents. But when we tested it on a full-day scenario (an incident fires while the agent has four existing commitments), it would triage the incident perfectly and then silently drop every prior commitment with no communication to anyone.

The gap between *task competence* and *commitment coherence* was the new problem. CommitmentOS is the environment we built to close it.

---
 
## The Commitment Ledger: How It Works

The core innovation is a persistent **Commitment Ledger** that lives inside the environment and tracks every binding decision the agent makes in real time.
 
```
Agent books investor dinner at 7pm
→ Ledger: {type: "meeting_scheduled", slot: "19:00", to: "Investor_Park", active: true}

Agent promised team happy hour at 7pm last week
→ Ledger: {type: "email_promise", to: "Team", constraint: "19:00 blocked for happy_hour"}

Agent tries to book another 7pm event
→ Ledger detects: CONFLICT with commitment from turn 2
→ Intermediate reward: -0.15

Agent sends team email: "Sorry, reschedule happy hour to Thursday..."
→ Ledger marks: commitment renegotiated at turn 6
→ Full credit restored
```
 
The key insight: **other environments compute constraints upfront.** CommitmentOS constraints emerge from what the agent *does*. The agent creates its own obligations, and then has to live up to them.
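
To make the trace above concrete, here is a minimal sketch of how a ledger like this could work. It is not the CommitmentOS implementation: the class names, the field names, and the rule that a renegotiated commitment frees its slot are all assumptions made for illustration. Only the `-0.15` penalty comes from the trace.

```python
from dataclasses import dataclass, field
from typing import List, Optional

CONFLICT_PENALTY = -0.15  # intermediate reward quoted in the trace above


@dataclass
class Commitment:
    kind: str                 # e.g. "meeting_scheduled" or "email_promise"
    to: str                   # who the commitment was made to
    slot: str                 # "HH:MM" start time
    turn: int                 # turn at which the commitment was created
    active: bool = True
    renegotiated_at: Optional[int] = None


@dataclass
class CommitmentLedger:
    """Hypothetical ledger: grows as the agent acts, not computed upfront."""
    entries: List[Commitment] = field(default_factory=list)

    def conflicts(self, slot: str) -> List[Commitment]:
        """Active commitments that already occupy this slot."""
        return [c for c in self.entries if c.active and c.slot == slot]

    def record(self, kind: str, to: str, slot: str, turn: int) -> float:
        """Add a commitment and return the intermediate reward for the step."""
        clash = self.conflicts(slot)
        self.entries.append(Commitment(kind, to, slot, turn))
        return CONFLICT_PENALTY if clash else 0.0

    def renegotiate(self, to: str, turn: int) -> bool:
        """Mark the active commitment to `to` as renegotiated (assumed to free its slot)."""
        for c in self.entries:
            if c.active and c.to == to:
                c.active = False
                c.renegotiated_at = turn
                return True
        return False


ledger = CommitmentLedger()
r_promise = ledger.record("email_promise", "Team", "19:00", turn=2)
r_dinner = ledger.record("meeting_scheduled", "Investor_Park", "19:00", turn=4)
renegotiated = ledger.renegotiate("Team", turn=6)
still_blocking = [c.to for c in ledger.conflicts("19:00")]
```

In this sketch the second booking is penalized because the happy hour promise already holds 19:00, and once the team has been told, only the investor dinner still occupies the slot.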
 
There are three ways a commitment can end:

| How it ends | What it means | Score |
|-------------|---------------|-------|
| **Honored** | Kept it | Full credit |
| **Renegotiated** | Changed it, told everyone, offered an alternative | Full credit |
| **Silent violation** | Broke it, told nobody | Zero |

The third row is what every AI assistant does today. CommitmentOS trains it away.
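
As a rough illustration of the table, the three outcomes can be mapped to per-commitment credit and averaged. This is a sketch under assumptions: the post only says "full credit" and "zero", so treating them as `1.0` and `0.0`, and averaging across commitments, are choices made here for the example rather than the environment's actual reward formula.

```python
from enum import Enum
from typing import Iterable


class Outcome(Enum):
    HONORED = "honored"
    RENEGOTIATED = "renegotiated"
    SILENT_VIOLATION = "silent_violation"


# Assumption: "full credit" is 1.0 and "zero" is 0.0 per commitment.
CREDIT = {
    Outcome.HONORED: 1.0,
    Outcome.RENEGOTIATED: 1.0,
    Outcome.SILENT_VIOLATION: 0.0,
}


def commitment_score(outcomes: Iterable[Outcome]) -> float:
    """Mean credit over all commitments; 1.0 when there were none to keep."""
    items = list(outcomes)
    if not items:
        return 1.0
    return sum(CREDIT[o] for o in items) / len(items)


score = commitment_score([
    Outcome.HONORED,
    Outcome.RENEGOTIATED,
    Outcome.SILENT_VIOLATION,
    Outcome.SILENT_VIOLATION,
])
```

The point the table makes survives in the sketch: renegotiating costs nothing relative to honoring, so the only way to lose credit is to break a commitment without telling anyone.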
 
---

## Five Real Scenarios That Show the Problem

Rather than abstract descriptions, here's what the agent actually faces. These are real scenarios from CommitmentOS: real emails, real calendar conflicts, real constraints.

---

### Scenario 1: The Email That Breaks Everything
*(easy_008, medium difficulty)*

It's 2:45 PM. You're on a live client call with Client_Jones that ends at 3:15.

Your inbox just got this:

> **From: VP_Chen**
> **Subject: URGENT: Q3 numbers NOW**
> *"Board meeting moved up. I need the Q3 revenue numbers in the next 30 minutes. This is critical."*

An untrained AI cancels the client call to handle the VP. A trained AI sends VP_Chen this:

> *"On a client call until 3:15. Will send Q3 numbers immediately after. ETA 3:20."*
@@ -112,7 +146,9 @@ This scenario tests something deceptively hard: **inferring which commitment a v
VP_Chen asks you to schedule a meeting with Client_Jones "sometime this week."

Client_Jones privately emailed you: *"I'm dealing with a family emergency Mon-Wed. I'd prefer to keep this private. I'm free Thursday after 2pm and all day Friday."*

The email is marked: **CONFIDENTIAL: do not share reason with VP_Chen.**

You must propose Thursday/Friday slots to VP_Chen without revealing why Mon-Wed are unavailable. Navigate the information asymmetry diplomatically, notify both parties, and get the meeting booked.

This is information asymmetry training: the agent must make decisions using context it cannot share, while maintaining trust with both parties.
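
One crude way to check the confidentiality requirement is to scan the outgoing message for phrases from the private email. The sketch below does only that. The function name and the term list are hypothetical, and keyword matching is a weak proxy: a paraphrase such as "a personal matter at home" would slip through. Nothing here describes how CommitmentOS actually grades this scenario.

```python
import re
from typing import Iterable, List


def leaked_terms(message: str, confidential_terms: Iterable[str]) -> List[str]:
    """Return the confidential phrases that appear in the outgoing message."""
    found = []
    for term in confidential_terms:
        # Whole-phrase, case-insensitive match.
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, message, flags=re.IGNORECASE):
            found.append(term)
    return found


# Hypothetical term list derived from Client_Jones's private email.
CONFIDENTIAL = ["family emergency", "emergency"]

safe_reply = (
    "Client_Jones is available Thursday after 2pm or any time Friday. "
    "Which slot works for you?"
)
leaky_reply = (
    "Client_Jones has a family emergency Mon-Wed, so it has to be "
    "Thursday or Friday."
)

safe_leaks = leaked_terms(safe_reply, CONFIDENTIAL)
leaky_leaks = leaked_terms(leaky_reply, CONFIDENTIAL)
```

The safe reply still uses the private information (it proposes only Thursday and Friday) without disclosing it, which is the behavior the scenario is after.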
@@ -133,9 +169,13 @@ The agent must:
2. Cancel yoga (personal, lowest priority; fine to drop silently)
3. **Not** silently cancel the team happy hour: that was a promise. The agent must send an email with an apology and a proposed reschedule to Thursday.
4. Confirm the plan to VP_Chen.

The correct restaurant is Sky Lounge: near airport ✓, vegetarian ✓, $55/pp ✓.

The silent violation trap: yoga gets dropped, while happy hour gets **renegotiated**. Different types of commitments call for different handling.
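
The yoga versus happy hour distinction can be sketched as a classifier over the agent's actions. Everything in it is an assumption for illustration: the `Action` fields, the action type names `cancel_event` and `send_email`, and the extra `dropped_personal` label for commitments made to nobody else.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Action:
    action_type: str                   # hypothetical: "cancel_event" or "send_email"
    target: str                        # event name, or email recipient
    proposes_alternative: bool = False


def classify(event: str, owed_to: Optional[str], actions: List[Action]) -> str:
    """Classify how a commitment ended. `owed_to` is None for personal items."""
    cancelled = any(
        a.action_type == "cancel_event" and a.target == event for a in actions
    )
    if not cancelled:
        return "honored"
    if owed_to is None:
        # Personal commitments may be dropped without notifying anyone.
        return "dropped_personal"
    told = any(
        a.action_type == "send_email"
        and a.target == owed_to
        and a.proposes_alternative
        for a in actions
    )
    return "renegotiated" if told else "silent_violation"


actions = [
    Action("cancel_event", "yoga"),
    Action("cancel_event", "happy_hour"),
    Action("send_email", "Team", proposes_alternative=True),
]

yoga_result = classify("yoga", None, actions)
happy_hour_result = classify("happy_hour", "Team", actions)
without_email = classify("happy_hour", "Team", actions[:2])
```

The same cancellation lands in a different bucket depending on whether someone else was counting on it, and whether they were told and offered an alternative.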
 
---

### Scenario 5: The Production Incident (The One That Started It All)
*(hard_015, hard difficulty)*

@@ -205,6 +245,7 @@ The training loop connects directly to the live CommitmentOS API, not a stati
## The Before / After That Matters

**hard_011: Investor Dinner Cascade**

| | Before Training | After Training |
|--|----------------|----------------|
| Steps taken | 1 (immediate surrender) | 6 |
@@ -236,10 +277,12 @@ Full weights + artifacts: [Google Drive bundle](https://drive.google.com/drive/f
```bash
# Start the production incident scenario
curl -X POST "https://jayant2304-commitment-os.hf.space/reset?task_id=hard_015"

# View your calendar (PagerDuty is waiting)
curl -X POST "https://jayant2304-commitment-os.hf.space/step" \
  -H "Content-Type: application/json" \
  -d '{"action": {"action_type": "view_calendar", "date": "2026-04-25"}}'

# See your active commitments
curl "https://jayant2304-commitment-os.hf.space/state"
```
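
For driving the same three calls from a script, here is a small standard-library sketch that builds the equivalent requests. The endpoints, the `task_id` query parameter, and the action payload are copied from the curl commands. The helper names and the assumption that responses are JSON are not from the post.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://jayant2304-commitment-os.hf.space"


def reset_request(task_id: str) -> urllib.request.Request:
    """POST /reset?task_id=... to start a scenario."""
    query = urllib.parse.urlencode({"task_id": task_id})
    return urllib.request.Request(f"{BASE_URL}/reset?{query}", method="POST")


def step_request(action: dict) -> urllib.request.Request:
    """POST /step with the action wrapped as {"action": {...}}."""
    body = json.dumps({"action": action}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/step",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def state_request() -> urllib.request.Request:
    """GET /state to read the active commitments."""
    return urllib.request.Request(f"{BASE_URL}/state", method="GET")


def send(req: urllib.request.Request, timeout: float = 30.0):
    """Send a request and decode the body, assuming the API returns JSON."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the requests does not touch the network; only send() does.
reset = reset_request("hard_015")
step = step_request({"action_type": "view_calendar", "date": "2026-04-25"})
state = state_request()
```

Calling `send(reset)`, then `send(step)`, then `send(state)` would replay the curl sequence. The response shape is not documented in this post, so inspect what comes back before relying on any particular field.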
 