---
title: CommitmentOS
emoji: 📋
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
tags:
  - openenv
  - reinforcement-learning
  - commitment-coherence
  - personal-task-management
  - multi-turn
---

# CommitmentOS: Training Temporal Commitment Coherence in LLMs

**The first RL environment that trains LLMs to keep their promises.**

CommitmentOS is a multi-turn personal task management environment where
agents manage calendars, emails, and dining reservations across realistic
scenarios. The key innovation: the agent's own prior decisions create
binding future constraints tracked via a **commitment ledger**, and
violations are penalised regardless of how many turns have elapsed.

## Quick Start

```bash
# Reset to a scenario
curl -X POST "https://jayant2304-commitment-os.hf.space/reset?task_id=easy_001"

# Make a tool call
curl -X POST "https://jayant2304-commitment-os.hf.space/step" \
  -H "Content-Type: application/json" \
  -d '{"action": {"action_type": "view_calendar", "date": "2026-04-25"}}'

# Get state
curl "https://jayant2304-commitment-os.hf.space/state"
```
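
The same three calls can be sketched in Python using only the standard library. The base URL, query parameter, and action payload are taken from the curl commands above; the helper names (`build_reset_request`, `build_step_request`, `build_state_request`, `send`, `run_quick_start`) are hypothetical, and the response is treated as opaque JSON because its schema is not documented here.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://jayant2304-commitment-os.hf.space"


def build_reset_request(task_id=None, base_url=BASE_URL):
    """Build POST /reset; task_id travels as a query parameter, as in the curl call."""
    url = f"{base_url}/reset"
    if task_id is not None:
        url += "?" + urllib.parse.urlencode({"task_id": task_id})
    return urllib.request.Request(url, method="POST")


def build_step_request(action, base_url=BASE_URL):
    """Build POST /step with the action wrapped in an {"action": ...} JSON body."""
    body = json.dumps({"action": action}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/step",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_state_request(base_url=BASE_URL):
    """Build GET /state."""
    return urllib.request.Request(f"{base_url}/state", method="GET")


def send(request, timeout=30):
    """Send a request and decode the JSON reply (no response schema is assumed)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def run_quick_start():
    """Replay the three curl calls above against the hosted Space."""
    send(build_reset_request("easy_001"))
    action = {"action_type": "view_calendar", "date": "2026-04-25"}
    print(send(build_step_request(action)))
    print(send(build_state_request()))
```

Calling `run_quick_start()` performs real network requests, so it is left uncalled here; the builder functions can be inspected offline.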

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/reset` | POST | Start a new episode (optional: `task_id`, `difficulty`) |
| `/step` | POST | Execute one tool call |
| `/state` | GET | Current episode state |
| `/health` | GET | Health check |
| `/tasks` | GET | List all available scenarios |
| `/mcp` | POST | MCP JSON-RPC 2.0 (`initialize`, `tools/list`; tool names `cos_episode_reset`, `cos_environment_step`, `cos_session_snapshot`, not the reserved strings `reset`/`step`/`state`) |
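
For the `/mcp` endpoint, a minimal sketch of the JSON-RPC 2.0 envelopes might look like the following. The envelope fields (`jsonrpc`, `id`, `method`, `params`) come from the JSON-RPC 2.0 specification. The `tools/call` method and its `name`/`arguments` parameters follow the MCP specification; the table above only lists `initialize` and `tools/list`, so support for `tools/call` is an assumption, as are the helper names. The argument schema of the `cos_*` tools is not documented here and is left to the caller.

```python
import itertools

# Monotonic request ids; JSON-RPC only requires that each pending id be unique.
_request_ids = itertools.count(1)


def jsonrpc_request(method, params=None):
    """Wrap a method call in a JSON-RPC 2.0 request envelope."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def tools_list_request():
    """Ask the server which tools it exposes."""
    return jsonrpc_request("tools/list")


def tool_call_request(tool_name, arguments=None):
    """Invoke a tool by name (assumes the server accepts MCP's tools/call)."""
    return jsonrpc_request(
        "tools/call",
        {"name": tool_name, "arguments": arguments or {}},
    )
```

Each returned dictionary would be serialised with `json.dumps` and sent as the body of a `POST /mcp` request, using one of the `cos_*` tool names rather than `reset`, `step`, or `state`.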

## 15 Scenarios (5 Easy / 5 Medium / 5 Hard)

Scenarios range from simple calendar reschedules to multi-crisis cascades
with information asymmetry and production incidents interrupting a full day
of commitments.

## Reward Function (5 components)

| Component | Weight | Signal |
|-----------|--------|--------|
| Constraint Satisfaction | 35% | Binary per-constraint checks |
| Conflict Resolution | 20% | Calendar free of overlaps |
| **Commitment Coherence** | **20%** | **Violations tracked via ledger** |
| Communication Quality | 15% | Keyword matching on emails |
| Step Efficiency | 10% | Fewer steps = higher score |
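
As a rough illustration of how the weights in the table could combine, the sketch below assumes each component is scored in [0, 1] and that the final reward is a plain weighted sum. Neither assumption is confirmed by this README, and the names `WEIGHTS` and `combined_reward` are hypothetical.

```python
# Weights copied from the table above; they sum to 1.0.
WEIGHTS = {
    "constraint_satisfaction": 0.35,
    "conflict_resolution": 0.20,
    "commitment_coherence": 0.20,
    "communication_quality": 0.15,
    "step_efficiency": 0.10,
}


def combined_reward(scores):
    """Weighted sum of component scores (assumed to lie in [0, 1])."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {scores[name]}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

Under these assumptions, an episode that is perfect on every component except commitment coherence (scored 0) would top out at 0.80, which shows how much a single broken promise can cost.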

## What Makes This Novel

Existing constraint-satisfaction environments compute dependency graphs
upfront. CommitmentOS is different: constraints **emerge from the agent's
own decisions** as the episode unfolds. A meeting scheduled in turn 2
becomes a binding constraint in turn 7. Breaking it without communication
is a tracked, penalised violation.

This is **temporal commitment coherence**, a capability no existing RL
environment trains.
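
To make the idea concrete, here is a toy ledger. It is not the environment's actual implementation: the class names, fields, and the rule "a commitment broken without communication counts as a violation" are a simplified reading of the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Commitment:
    description: str
    made_at_turn: int
    broken_at_turn: Optional[int] = None
    communicated: bool = False  # was the affected party told about the change?


@dataclass
class CommitmentLedger:
    commitments: Dict[str, Commitment] = field(default_factory=dict)

    def commit(self, key: str, description: str, turn: int) -> None:
        """Record a decision that binds the agent in later turns."""
        self.commitments[key] = Commitment(description, made_at_turn=turn)

    def break_commitment(self, key: str, turn: int, communicated: bool) -> None:
        """Mark a commitment as broken, noting whether it was communicated."""
        entry = self.commitments[key]
        entry.broken_at_turn = turn
        entry.communicated = communicated

    def violations(self) -> List[Commitment]:
        """Broken commitments that nobody was told about."""
        return [
            c for c in self.commitments.values()
            if c.broken_at_turn is not None and not c.communicated
        ]
```

In this sketch, a meeting committed in turn 2 and silently dropped in turn 7 appears in `violations()` however many turns separate the two events, while the same change accompanied by a notification does not.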

Training curves for the published Colab run are in the GitHub repo under `artifacts/loss_curve.png` and `artifacts/reward_curve.png` (with `training_metrics.json`).

## Improvement Evidence

Deterministic baseline-vs-trained-style evaluation is included in the repo:

- Protocol: `artifacts/evals/eval_protocol.json`
- Per-task raw results: `artifacts/evals/baseline_eval.json`, `artifacts/evals/trained_eval.json`
- Delta table: `artifacts/evals/comparison.csv`
- Case study: `artifacts/evals/case_study_hard_011.md`
- Plots: `artifacts/evals/reward_by_task.svg`, `artifacts/evals/violations_before_after.svg`

Headline metrics (`summary.json`):

- Mean reward: **0.5427 -> 0.9777** (**+0.4350**)
- Success rate: **0.3333 -> 1.0000** (**+0.6667**)
- Median per-task reward delta: **+0.4200**
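
The three headline metrics can be recomputed from per-task rewards along these lines. This is a sketch under stated assumptions: the function name `summarise` and the input shape (a mapping from task id to reward) are hypothetical and do not reflect the actual schema of the JSON files, and the success threshold of 0.6 is borrowed from the LLM run reported later in this README, which may not be the threshold used for the deterministic evaluation.

```python
from statistics import mean, median


def summarise(baseline, trained, success_threshold=0.6):
    """Compare two {task_id: reward} mappings over the same set of tasks."""
    if baseline.keys() != trained.keys():
        raise ValueError("baseline and trained must cover the same tasks")

    def success_rate(rewards):
        # A task counts as solved when its reward reaches the threshold.
        return sum(r >= success_threshold for r in rewards.values()) / len(rewards)

    deltas = [trained[task] - baseline[task] for task in baseline]
    return {
        "mean_reward_baseline": mean(baseline.values()),
        "mean_reward_trained": mean(trained.values()),
        "success_rate_baseline": success_rate(baseline),
        "success_rate_trained": success_rate(trained),
        "median_reward_delta": median(deltas),
    }
```

As a consistency check on the reported figures, 0.9777 - 0.5427 = 0.4350 and 1.0000 - 0.3333 = 0.6667, matching the deltas listed above.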

For true model-learning proof (pre-RL checkpoint vs post-RL checkpoint),
run:

```bash
# From cloned repo (core deps + torch/transformers/peft/… via optional extra):
pip install -e ".[llm-eval]"
export BASELINE_MODEL_NAME=Qwen/Qwen2.5-1.5B-Instruct
export TRAINED_MODEL_PATH=/content/commitment_os/training_output
export ENV_BASE_URL=https://jayant2304-commitment-os.hf.space
python3 evaluation/evaluate_llm_checkpoints.py
python3 evaluation/plot_llm_checkpoints.py
```

Artifacts are written to `artifacts/evals_llm/`.

**Published LLM run (bundle on Drive):** success **46.7% → 60.0%** at reward threshold **0.6**; mean reward ~flat; gains concentrated on **hard** tasks. Traces: `artifacts/evals_llm/*.json` in the folder below.

**Pretrained adapter + LLM eval artifacts (Google Drive):** [commitment_os_bundle](https://drive.google.com/drive/folders/1yexZBSqyH7gWlTzYN5DlX3tXfPMmeVAK?usp=sharing). Download `training_output/` and set `TRAINED_MODEL_PATH` accordingly; full `gdown` notes are in the GitHub `README.md`.