---
title: CommitmentOS
emoji: 📋
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
tags:
  - openenv
  - reinforcement-learning
  - commitment-coherence
  - personal-task-management
  - multi-turn
---
## 🔗 Links
- 📝 **Blog / Writeup**: [CommitmentOS: Training LLMs to Keep Their Promises](https://huggingface.co/Jayant2304/Commitment-os)
- 💻 **GitHub**: [Jayant2304/commitment_os](https://github.com/Jayant2304/commitment_os)
- 📓 **Training Colab**: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Jayant2304/commitment_os/blob/main/training/CommitmentOS_Training.ipynb)
- 📦 **Weights + artifacts**: [Google Drive bundle](https://drive.google.com/drive/folders/1yexZBSqyH7gWlTzYN5DlX3tXfPMmeVAK?usp=sharing)

# CommitmentOS: Training Temporal Commitment Coherence in LLMs

**The first RL environment that trains LLMs to keep their promises.**

CommitmentOS is a multi-turn personal task management environment where
agents manage calendars, emails, and dining reservations across realistic
scenarios. The key innovation: the agent's own prior decisions create
binding future constraints tracked via a **commitment ledger**, and
violations are penalised regardless of how many turns have elapsed.

## Quick Start

```bash
# Reset to a scenario
curl -X POST "https://jayant2304-commitment-os.hf.space/reset?task_id=easy_001"

# Make a tool call
curl -X POST "https://jayant2304-commitment-os.hf.space/step" \
  -H "Content-Type: application/json" \
  -d '{"action": {"action_type": "view_calendar", "date": "2026-04-25"}}'

# Get state
curl "https://jayant2304-commitment-os.hf.space/state"
```
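
The same three calls can be scripted from Python. The sketch below uses only the standard library and mirrors the curl commands above; the helper names (`reset_url`, `step_payload`, `call`, `run_quickstart`) are illustrative and not part of the project, and it assumes every endpoint returns a JSON body, which this README does not spell out.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://jayant2304-commitment-os.hf.space"


def reset_url(base_url, task_id=None):
    """Build the /reset URL, adding task_id as a query parameter if given."""
    url = base_url.rstrip("/") + "/reset"
    if task_id is not None:
        url += "?" + urllib.parse.urlencode({"task_id": task_id})
    return url


def step_payload(action_type, **params):
    """Wrap one tool call in the {"action": {...}} envelope used by /step."""
    return {"action": {"action_type": action_type, **params}}


def call(url, payload=None, method="GET"):
    """Send a request and decode the response (assumed to be JSON)."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_quickstart(base_url=BASE_URL):
    """Replay the three curl commands: reset, one step, then read state."""
    call(reset_url(base_url, "easy_001"), method="POST")
    call(base_url + "/step",
         step_payload("view_calendar", date="2026-04-25"),
         method="POST")
    return call(base_url + "/state")
```

Calling `run_quickstart()` performs live requests against the Space, so it is left uninvoked here; the two builder functions can be checked offline.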

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/reset` | POST | Start a new episode (optional: `task_id`, `difficulty`) |
| `/step` | POST | Execute one tool call |
| `/state` | GET | Current episode state |
| `/health` | GET | Health check |
| `/tasks` | GET | List all available scenarios |
| `/mcp` | POST | MCP JSON-RPC 2.0 (`initialize`, `tools/list`; tool names are `cos_episode_reset`, `cos_environment_step`, `cos_session_snapshot`, not the reserved strings `reset`/`step`/`state`) |
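
For the `/mcp` endpoint, requests are JSON-RPC 2.0 objects. The following is a rough sketch of how such request bodies might be built. It assumes the server also accepts the standard MCP `tools/call` method for the `cos_*` tools; the `task_id` argument name is a guess carried over from `/reset` and is not confirmed by this README. `jsonrpc_request` and `tool_call` are hypothetical helper names.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC ids only need to be unique per request


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request object with a fresh integer id."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def tool_call(name, arguments=None):
    """Build an MCP-style tools/call request for one of the cos_* tools."""
    return jsonrpc_request("tools/call",
                           {"name": name, "arguments": arguments or {}})


list_tools = jsonrpc_request("tools/list")
# The argument name "task_id" is an assumption, not a documented schema.
reset_episode = tool_call("cos_episode_reset", {"task_id": "easy_001"})
print(json.dumps(reset_episode, indent=2))
```

Each dictionary would be sent as the JSON body of a `POST` to `/mcp`; the actual argument schema for each tool should be read from the server's `tools/list` response.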

## 15 Scenarios (5 Easy / 5 Medium / 5 Hard)

Scenarios range from simple calendar reschedules to multi-crisis cascades
with information asymmetry and production incidents interrupting a full day
of commitments.

## Reward Function (5 components)

| Component | Weight | Signal |
|-----------|--------|--------|
| Constraint Satisfaction | 35% | Binary per-constraint checks |
| Conflict Resolution | 20% | Calendar free of overlaps |
| **Commitment Coherence** | **20%** | **Violations tracked via ledger** |
| Communication Quality | 15% | Keyword matching on emails |
| Step Efficiency | 10% | Fewer steps = higher score |
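
To make the weighting concrete, here is a minimal sketch that combines the five components using the percentages from the table. It assumes each component is already scored in the range 0 to 1 and that the total is a plain weighted sum; the actual grader in the repo may normalise or clip differently. The dictionary keys and the `total_reward` name are illustrative.

```python
# Weights taken from the table above; key names are illustrative.
WEIGHTS = {
    "constraint_satisfaction": 0.35,
    "conflict_resolution": 0.20,
    "commitment_coherence": 0.20,
    "communication_quality": 0.15,
    "step_efficiency": 0.10,
}


def total_reward(scores):
    """Weighted sum of component scores, each assumed to lie in [0, 1]."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing component scores: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= scores[name] <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {scores[name]}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# An episode that is perfect except for a fully broken commitment record.
example = {name: 1.0 for name in WEIGHTS}
example["commitment_coherence"] = 0.0
print(round(total_reward(example), 4))
```

Under these assumptions, an otherwise perfect episode that scores zero on commitment coherence loses exactly the 20% that component carries.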

## What Makes This Novel

Existing constraint-satisfaction environments compute dependency graphs
upfront. CommitmentOS is different: constraints **emerge from the agent's
own decisions** as the episode unfolds. A meeting scheduled in turn 2
becomes a binding constraint in turn 7. Breaking it without communication
is a tracked, penalised violation.

This is **temporal commitment coherence**, a capability no existing RL
environment trains.
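
The idea of constraints emerging from earlier decisions can be illustrated with a toy ledger. This is a sketch only, not the environment's implementation: the class names, the `release` method, and the scoring rule (one minus the fraction of commitments broken without notice) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Commitment:
    commitment_id: str
    description: str
    made_at_turn: int
    active: bool = True


@dataclass
class CommitmentLedger:
    commitments: dict = field(default_factory=dict)
    violations: list = field(default_factory=list)

    def record(self, commitment_id, description, turn):
        """Register a decision that binds the agent in later turns."""
        self.commitments[commitment_id] = Commitment(
            commitment_id, description, turn)

    def release(self, commitment_id, turn, notified):
        """Cancel or change a commitment; log a violation if nobody was told."""
        commitment = self.commitments[commitment_id]
        if not commitment.active:
            return  # already released, nothing further to penalise
        commitment.active = False
        if not notified:
            self.violations.append(
                (commitment_id, commitment.made_at_turn, turn))

    def coherence_score(self):
        """Assumed rule: 1 minus the share of commitments broken silently."""
        if not self.commitments:
            return 1.0
        return 1.0 - len(self.violations) / len(self.commitments)


ledger = CommitmentLedger()
ledger.record("standup", "Team standup at 10:00", turn=2)
ledger.record("dinner", "Dinner reservation at 19:00", turn=3)
ledger.release("standup", turn=7, notified=False)  # silent break: violation
ledger.release("dinner", turn=8, notified=True)    # communicated: no violation
print(ledger.violations, ledger.coherence_score())
```

The violation record keeps both the turn the commitment was made and the turn it was broken, which reflects the point that the penalty applies no matter how many turns separate the two.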

Training curves for the published Colab run are in the GitHub repo under `artifacts/loss_curve.png` and `artifacts/reward_curve.png` (with `training_metrics.json`).

## Improvement Evidence

A deterministic evaluation comparing baseline and trained-style behaviour is included in the repo:

- Protocol: `artifacts/evals/eval_protocol.json`
- Per-task raw results: `artifacts/evals/baseline_eval.json`, `artifacts/evals/trained_eval.json`
- Delta table: `artifacts/evals/comparison.csv`
- Case study: `artifacts/evals/case_study_hard_011.md`
- Plots: `artifacts/evals/reward_by_task.svg`, `artifacts/evals/violations_before_after.svg`

Headline metrics (`summary.json`):

- Mean reward: **0.5427 -> 0.9777** (**+0.4350**)
- Success rate: **0.3333 -> 1.0000** (**+0.6667**)
- Median per-task reward delta: **+0.4200**
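
Headline figures of this kind are aggregates over per-task rewards. The sketch below shows one way such a summary might be computed. It assumes a task counts as a success when its reward is at or above a threshold (0.6 is the threshold quoted for the LLM run in this README; whether `summary.json` uses the same rule is an assumption), and the rewards in the example are toy values, not results from the repo.

```python
from statistics import mean, median


def summarize(baseline, trained, threshold=0.6):
    """Aggregate per-task rewards into mean, success rate, and median delta."""
    tasks = sorted(baseline.keys() & trained.keys())
    if not tasks:
        raise ValueError("no tasks in common between the two runs")

    def success_rate(rewards):
        # Assumed success rule: reward at or above the threshold.
        return sum(rewards[t] >= threshold for t in tasks) / len(tasks)

    deltas = [trained[t] - baseline[t] for t in tasks]
    return {
        "mean_reward": (mean(baseline[t] for t in tasks),
                        mean(trained[t] for t in tasks)),
        "success_rate": (success_rate(baseline), success_rate(trained)),
        "median_delta": median(deltas),
    }


# Toy rewards for illustration only; these are not values from the repo.
toy_baseline = {"task_a": 0.5, "task_b": 0.7, "task_c": 0.3}
toy_trained = {"task_a": 0.9, "task_b": 0.8, "task_c": 0.7}
summary = summarize(toy_baseline, toy_trained)
print(summary)
```

The real per-task inputs are the `baseline_eval.json` and `trained_eval.json` files listed above; their exact schema is not described here, so loading them is left out of the sketch.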

To compare actual model checkpoints (pre-RL vs post-RL) rather than the
deterministic evaluation above, run:

```bash
# From cloned repo (core deps + torch/transformers/peft/… via optional extra):
pip install -e ".[llm-eval]"
export BASELINE_MODEL_NAME=Qwen/Qwen2.5-1.5B-Instruct
export TRAINED_MODEL_PATH=/content/commitment_os/training_output
export ENV_BASE_URL=https://jayant2304-commitment-os.hf.space
python3 evaluation/evaluate_llm_checkpoints.py
python3 evaluation/plot_llm_checkpoints.py
```

Artifacts are written to `artifacts/evals_llm/`.

**Published LLM run (bundle on Drive):** success **46.7% → 60.0%** at reward threshold **0.6**; mean reward roughly flat; gains concentrated on **hard** tasks. Traces: `artifacts/evals_llm/*.json` in the folder below.

**Pretrained adapter + LLM eval artifacts (Google Drive):** [commitment_os_bundle](https://drive.google.com/drive/folders/1yexZBSqyH7gWlTzYN5DlX3tXfPMmeVAK?usp=sharing). Download `training_output/` and set `TRAINED_MODEL_PATH` accordingly; full `gdown` notes are in the GitHub `README.md`.