jayantaggarwal-sketch committed on
Commit
9318eea
·
1 Parent(s): 2f3c64b

Restore Space README metadata

Files changed (1)
  1. README.md +53 -166
README.md CHANGED
@@ -1,190 +1,77 @@
1
- # CommitmentOS: Training Temporal Commitment Coherence in LLMs
2
-
3
- > *The first RL environment that trains LLMs to keep their promises.*
4
-
5
- **Innovation claim**: The first RL environment for training temporal commitment coherence — where the agent's own prior decisions create binding future constraints, tracked and penalised across multi-turn episodes.
6
-
7
- **Theme**: Primary 3.2 (Personal Tasks) + Secondary Theme 2 (Long-Horizon Planning)
8
-
9
  ---
10
-
11
- ## Architecture
12
-
13
- ```
14
- ┌──────────────── Client ────────────────┐ ┌────────────── CommitmentOS Server ──────────────┐
15
- │ │ │ │
16
- │ inference.py ──HTTP──▶ POST /reset │────▶│ FastAPI App │
17
- (LLM agent) HTTP──▶ POST /step │ │ │ │
18
- │ HTTP──▶ GET /state │ │ ▼ │
19
- │ │ │ CommitmentEnvironment │
20
- train_grpo.py │ │ ├── WorldState (calendar, contacts, │
21
- (GRPO+TRL) │ │ │ restaurants, inbox) │
22
- │ │ │ ├── CommitmentLedger (tracks promises) │
23
- │ │ │ └── Grader (5-component reward) │
24
- └────────────────────────────────────────┘ └─────────────────────────────────────────────────┘
25
- ```
26
-
27
- ## Why CommitmentOS is Novel
28
-
29
- Existing constraint-satisfaction environments (GAP, LGC-MARL, NeMo Gym, PEARL) compute dependency graphs **upfront**. CommitmentOS is fundamentally different:
30
-
31
- - **Constraints emerge from the agent's own decisions** as the episode unfolds
32
- - A meeting scheduled in turn 2 becomes a **binding constraint** in turn 7
33
- - Breaking it without communication is a **tracked, penalised violation**
34
- - The commitment ledger persists across the full episode — the agent must remember what it promised
35
-
36
- This is **temporal commitment coherence** — a capability no existing RL environment trains.
37
-
38
  ---
39
 
40
- ## Quick Start
41
-
42
- ### Local Development
43
-
44
- ```bash
45
- cd commitment_os
46
-
47
- # Create virtual environment
48
- python3 -m venv .venv && source .venv/bin/activate
49
- pip install -r requirements.txt
50
-
51
- # Start server
52
- uvicorn server.app:app --host 0.0.0.0 --port 7860 --reload
53
-
54
- # Run tests
55
- pip install pytest httpx
56
- pytest tests/ -v
57
- ```
58
 
59
- ### Docker
60
 
61
- ```bash
62
- docker build -t commitment-os .
63
- docker run -p 7860:7860 commitment-os
64
- ```
 
65
 
66
- ### API Usage
67
 
68
  ```bash
69
  # Reset to a scenario
70
- curl -X POST "http://localhost:7860/reset?task_id=easy_001"
71
 
72
- # Make a tool call (multi-turn — one per step)
73
- curl -X POST "http://localhost:7860/step" \
74
  -H "Content-Type: application/json" \
75
  -d '{"action": {"action_type": "view_calendar", "date": "2026-04-25"}}'
76
 
77
  # Get state
78
- curl "http://localhost:7860/state"
79
-
80
- # List all scenarios
81
- curl "http://localhost:7860/tasks"
82
- ```
83
-
84
- ---
85
-
86
- ## Reward Function (5 Components)
87
-
88
- | Component | Weight | How it's Measured |
89
- |-----------|--------|-------------------|
90
- | **Constraint Satisfaction** | 35% | Binary per-constraint checks |
91
- | **Conflict Resolution** | 20% | Final calendar free of overlapping events |
92
- | **Commitment Coherence** | 20% | `(total - silent_violations) / total` from ledger |
93
- | **Communication Quality** | 15% | Keyword matching on sent emails |
94
- | **Step Efficiency** | 10% | `max(0, 1 - (steps - optimal) × 0.1)` |
95
-
96
- **Example** (easy_001 — perfect run):
97
- ```
98
- constraints: 3/3 met → 0.35 × 1.0 = 0.350
99
- conflicts: 0 overlaps → 0.20 × 1.0 = 0.200
100
- commitments: 1 honored → 0.20 × 1.0 = 0.200
101
- emails: Team notified → 0.15 × 1.0 = 0.150
102
- efficiency: 3 steps (opt 3) → 0.10 × 1.0 = 0.100
103
- ─────────────────────────────────────────────
104
- total = 0.99 (clamped to [0.01, 0.99])
105
- ```
106
-
107
- ---
108
-
109
- ## 15 Scenarios
110
-
111
- ### Easy (2-4 steps)
112
- | ID | Description |
113
- |----|-------------|
114
- | easy_001 | Double-booked meetings — reschedule by priority |
115
- | easy_002 | Book dinner with cuisine/price/distance constraints |
116
- | easy_003 | Check availability and propose meeting slots |
117
- | easy_004 | Cancel conflicting work meeting for personal appointment |
118
- | easy_005 | Triage inbox by urgency priority |
119
-
120
- ### Medium (5-8 steps)
121
- | ID | Description |
122
- |----|-------------|
123
- | med_006 | Cascading reschedule chain (A→B→C dependency) |
124
- | med_007 | Team dinner with 3 dietary + distance + budget constraints |
125
- | med_008 | Boss's urgent request during client call (commitment conflict) |
126
- | med_009 | Disambiguate vague "push our thing" across 3 recurring meetings |
127
- | med_010 | Client visit: conference room + lunch + itinerary |
128
-
129
- ### Hard (8-15 steps)
130
- | ID | Description |
131
- |----|-------------|
132
- | hard_011 | VP investor dinner: cascade, restaurant, multi-party notification |
133
- | hard_012 | Triple conference room conflict with diplomatic resolution |
134
- | hard_013 | Triple crisis: cancelled flight + moved board prep + lost reservation |
135
- | hard_014 | Information asymmetry — schedule without revealing confidential reasons |
136
- | hard_015 | **SRE Crisis** — production incident interrupts day of commitments |
137
-
138
- ---
139
-
140
- ## Training
141
-
142
- ### GRPO + TRL + LoRA
143
-
144
- ```bash
145
- pip install trl transformers peft datasets torch
146
-
147
- python training/train_grpo.py \
148
- --model Qwen/Qwen2.5-1.5B-Instruct \
149
- --epochs 2 \
150
- --lr 5e-6 \
151
- --lora_rank 16 \
152
- --batch_size 4
153
  ```
154
 
155
- **What improves with training:**
156
- - Constraint satisfaction score ↑
157
- - Commitment violation rate ↓
158
- - Steps per episode ↓
159
- - Communication quality ↑
160
-
161
- ---
162
 
163
- ## Submission Compliance
164
-
165
- | Requirement | Status |
166
- |-------------|--------|
167
- | reset() / step() / state() | |
168
- | openenv.yaml with 15 tasks | |
169
- | Programmatic graders, scores (0, 1) | |
170
- | inference.py at root using openai client | |
171
- | [START]/[STEP]/[END] log format | ✅ |
172
- | API_BASE_URL / MODEL_NAME / HF_TOKEN from env | ✅ |
173
- | Dockerfile builds and responds to /reset | ✅ |
174
- | pyproject.toml with [project.scripts] | ✅ |
175
- | uv.lock generated | ✅ |
176
- | server/app.py main() with if __name__ | ✅ |
177
 
178
- ---
179
 
180
- ## Story Hook
181
 
182
- > "Every AI assistant today can schedule one meeting. But your real life is never one meeting. CommitmentOS trains AI to juggle the chaos — and penalises it when it breaks its own promises."
183
 
184
- **Connection to Round 1**: In Round 1, we trained agents to diagnose production incidents. In Round 2, we asked: *what happens when that incident interrupts a day full of commitments?* CommitmentOS was born. Hard scenario `hard_015` directly reuses SRE incident data from Round 1.
185
 
186
- ---
187
 
188
- ## License
189
 
190
- MIT
1
  ---
2
+ title: CommitmentOS
3
+ emoji: 📋
4
+ colorFrom: blue
5
+ colorTo: green
6
+ sdk: docker
7
+ app_port: 7860
8
+ tags:
9
+ - openenv
10
+ - reinforcement-learning
11
+ - commitment-coherence
12
+ - personal-task-management
13
+ - multi-turn
14
  ---
15
 
16
+ # CommitmentOS: Training Temporal Commitment Coherence in LLMs
17
 
18
+ **The first RL environment that trains LLMs to keep their promises.**
19
 
20
+ CommitmentOS is a multi-turn personal task management environment where
21
+ agents manage calendars, emails, and dining reservations across realistic
22
+ scenarios. The key innovation: the agent's own prior decisions create
23
+ binding future constraints tracked via a **commitment ledger**, and
24
+ violations are penalised regardless of how many turns have elapsed.
25
 
26
+ ## Quick Start
27
 
28
  ```bash
29
  # Reset to a scenario
30
+ curl -X POST "https://jayant2304-commitment-os.hf.space/reset?task_id=easy_001"
31
 
32
+ # Make a tool call
33
+ curl -X POST "https://jayant2304-commitment-os.hf.space/step" \
34
  -H "Content-Type: application/json" \
35
  -d '{"action": {"action_type": "view_calendar", "date": "2026-04-25"}}'
36
 
37
  # Get state
38
+ curl "https://jayant2304-commitment-os.hf.space/state"
39
  ```
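
The curl calls above could also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_reset_request`, `build_step_request`, and `send` are hypothetical, and the assumption that every endpoint returns a JSON body is not confirmed by the README.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Base URL copied from the curl examples in the README.
BASE_URL = "https://jayant2304-commitment-os.hf.space"


def build_reset_request(task_id: str, base_url: str = BASE_URL) -> Request:
    """Build POST /reset with task_id passed as a query parameter."""
    query = urlencode({"task_id": task_id})
    return Request(f"{base_url}/reset?{query}", method="POST")


def build_step_request(action: dict, base_url: str = BASE_URL) -> Request:
    """Build POST /step with the action wrapped in a JSON body."""
    body = json.dumps({"action": action}).encode("utf-8")
    return Request(
        f"{base_url}/step",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: Request) -> dict:
    """Send a request and decode the reply (assumed to be JSON)."""
    with urlopen(request) as response:
        return json.load(response)
```

Calling `send(build_reset_request("easy_001"))` followed by `send(build_step_request({"action_type": "view_calendar", "date": "2026-04-25"}))` would mirror the first two curl commands. Building the request separately from sending it keeps the payload shape easy to inspect without network access.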
40
 
41
+ ## API Endpoints
42
 
43
+ | Endpoint | Method | Description |
44
+ |----------|--------|-------------|
45
+ | `/reset` | POST | Start a new episode (optional: `task_id`, `difficulty`) |
46
+ | `/step` | POST | Execute one tool call |
47
+ | `/state` | GET | Current episode state |
48
+ | `/health` | GET | Health check |
49
+ | `/tasks` | GET | List all available scenarios |
50
+ | `/mcp` | POST | MCP JSON-RPC 2.0 |
51
 
52
+ ## 15 Scenarios (5 Easy / 5 Medium / 5 Hard)
53
 
54
+ Scenarios range from simple calendar reschedules to multi-crisis cascades
55
+ with information asymmetry and production incidents interrupting a full day
56
+ of commitments.
57
 
58
+ ## Reward Function (5 components)
59
 
60
+ | Component | Weight | Signal |
61
+ |-----------|--------|--------|
62
+ | Constraint Satisfaction | 35% | Binary per-constraint checks |
63
+ | Conflict Resolution | 20% | Calendar free of overlaps |
64
+ | **Commitment Coherence** | **20%** | **Violations tracked via ledger** |
65
+ | Communication Quality | 15% | Keyword matching on emails |
66
+ | Step Efficiency | 10% | Fewer steps = higher score |
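
As a rough illustration of how the five weighted components might combine, here is a sketch in Python. The weights come from the table above; the per-component formulas and the clamp to [0.01, 0.99] are taken from the previous README revision shown in the removed lines of this diff. The function names, the handling of an empty ledger, and the cap on efficiency at 1.0 are assumptions, not the environment's actual grader.

```python
# Weights from the reward table.
WEIGHTS = {
    "constraints": 0.35,
    "conflicts": 0.20,
    "commitments": 0.20,
    "communication": 0.15,
    "efficiency": 0.10,
}


def coherence_score(total: int, silent_violations: int) -> float:
    """(total - silent_violations) / total, per the earlier README.

    Treating an empty ledger as fully coherent is an assumption.
    """
    if total == 0:
        return 1.0
    return (total - silent_violations) / total


def efficiency_score(steps: int, optimal: int) -> float:
    """max(0, 1 - (steps - optimal) * 0.1), per the earlier README.

    Capping at 1.0 when steps < optimal is an assumption.
    """
    return min(1.0, max(0.0, 1.0 - (steps - optimal) * 0.1))


def total_reward(components: dict) -> float:
    """Weighted sum of component scores in [0, 1], clamped to [0.01, 0.99]."""
    raw = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    return min(0.99, max(0.01, raw))
```

With every component at 1.0 the weighted sum reaches 1.0 and the clamp brings the reported total down to 0.99, which matches the `easy_001` perfect-run example in the earlier README.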
67
 
68
+ ## What Makes This Novel
69
 
70
+ Existing constraint-satisfaction environments compute dependency graphs
71
+ upfront. CommitmentOS is different: constraints **emerge from the agent's
72
+ own decisions** as the episode unfolds. A meeting scheduled in turn 2
73
+ becomes a binding constraint in turn 7. Breaking it without communication
74
+ is a tracked, penalised violation.
75
 
76
+ This is **temporal commitment coherence** — a capability no existing RL
77
+ environment trains.
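
To make the turn 2 / turn 7 example concrete, here is a minimal sketch of what a commitment ledger could look like. The class and field names are hypothetical and the real `CommitmentLedger` in the environment may be structured quite differently; the sketch only shows the idea that a commitment recorded in one turn can be counted as a silent violation when it is broken later without communication.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Commitment:
    description: str
    made_at_turn: int
    broken_at_turn: Optional[int] = None  # None while the commitment is honored
    communicated: bool = False            # True if the break was announced


@dataclass
class CommitmentLedger:
    commitments: List[Commitment] = field(default_factory=list)

    def add(self, description: str, turn: int) -> int:
        """Record a new commitment and return its index in the ledger."""
        self.commitments.append(Commitment(description, turn))
        return len(self.commitments) - 1

    def break_commitment(self, index: int, turn: int, communicated: bool) -> None:
        """Mark a commitment as broken, noting whether the agent communicated it."""
        entry = self.commitments[index]
        entry.broken_at_turn = turn
        entry.communicated = communicated

    def silent_violations(self) -> int:
        """Count commitments that were broken without communication."""
        return sum(
            1
            for c in self.commitments
            if c.broken_at_turn is not None and not c.communicated
        )


ledger = CommitmentLedger()
meeting = ledger.add("1:1 with client at 14:00", turn=2)
ledger.break_commitment(meeting, turn=7, communicated=False)
print(ledger.silent_violations())  # 1
```

Because the ledger persists for the whole episode, the violation is counted no matter how many turns separate the promise from the break.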