AE-Shree committed on
Commit
b8f6679
·
1 Parent(s): ff660f5

Final Commit

Files changed (1)
  1. README.md +145 -239
README.md CHANGED
@@ -9,325 +9,231 @@ pinned: false
9
  tags: [openenv, rl, scheduling, agent-eval, productivity, multi-agent, grpo, reinforcement-learning]
10
  ---
11
 
12
- # 🧠 Cognitive Load Manager (CLM)
13
 
14
- **A Multi-Agent OpenEnv RL Environment · OpenEnv Hackathon, April 2026**
15
 
16
- [![OpenEnv](https://img.shields.io/badge/Powered_by-OpenEnv-brightgreen?style=for-the-badge)](#)
17
- [![Python 3.11](https://img.shields.io/badge/Python-3.11-blue?style=for-the-badge&logo=python)](#)
18
- [![React Dashboard](https://img.shields.io/badge/React-Live_Dashboard-blue?style=for-the-badge&logo=react)](#)
19
- [![GRPO Training](https://img.shields.io/badge/Trained_with-GRPO%20%2B%20TRL-orange?style=for-the-badge)](#)
20
- [![Qwen 1.5B](https://img.shields.io/badge/Model-Qwen_1.5B-purple?style=for-the-badge)](#)
21
 
22
  ---
23
 
24
- ## 🎥 See It Running First
25
 
26
  | | |
27
  |---|---|
28
- | **2-min project walkthrough (Loom)** | 👉 [Watch on Loom](https://www.loom.com/share/7c7293efa0ba459ba2de243b0b5aacb2) |
29
- | **Full dashboard demo (Google Drive)** | 👉 [Watch Demo](https://drive.google.com/file/d/149dz_1rIlXv-eR1fwYaxRJ-cV0mQNevJ/view?usp=sharing) |
30
- | **Training notebook (Colab — re-runnable)** | 👉 [Open in Colab](https://colab.research.google.com/drive/1_OoW4iH1acCni0H9POCcX2pp-6bOorzo?usp=sharing) |
31
 
32
  ---
33
 
34
- ## The Problem
35
 
36
- Productivity tools are good at one thing: telling you *what* to do. Deadlines, priorities, urgency tags — all mapped out. What none of them do is care whether you're running on four hours of sleep, mid-recovery from three back-to-back meetings, or operating at 40% capacity because the last task drained you.
 
37
 
38
- That gap is real. Performance isn't linear. Fatigue compounds across a workday. Stress from one task bleeds into the next. Context switching has a measurable cognitive cost that most schedulers treat as zero.
 
 
39
 
40
- The Cognitive Load Manager is built around that gap. It's a simulation environment where an AI agent learns to schedule work the way a *good manager* would — not just efficiently, but sustainably, with actual awareness of the humans doing the work.
41
 
42
- ---
43
-
44
- ## What We Built
45
-
46
- CLM is a **multi-agent reinforcement learning environment** built on the OpenEnv interface. It simulates a real knowledge-work day — tasks of different types, deadlines with real consequences, worker states that shift throughout the episode, and mid-session surprises that force the agent to adapt.
47
-
48
- The setup:
49
 
50
- - **Three worker agents**, each carrying independent internal state: energy level, stress level, current task load, and fatigue accumulation that builds non-linearly across the session
51
- - **One manager agent** — the AI being trained — that observes the full workspace state and makes scheduling decisions every step
52
- - **A task pool** with deadlines, dependency chains, and varying complexity levels (email, code review, reports, meetings, calls)
53
 
54
- The manager has to decide who gets what, when to push, when to delay, and when a worker genuinely needs a break. Every call has downstream consequences. Burn a worker out and their output quality drops, stress spikes, and you lose throughput precisely when you need it. Under-assign and deadlines slip. The agent has to find — and maintain — the line between the two.
55
 
56
- What makes the environment harder than a standard scheduling problem:
57
 
58
- - **Context-switching penalties**: moving between unrelated tasks isn't treated as free. Every switch costs something, and the agent learns to protect focus blocks.
59
- - **Non-linear fatigue accumulation** — workers don't degrade evenly. The drop accelerates as the session progresses.
60
- - **Mid-episode rule changes**: deadlines shift, urgent tasks are injected mid-episode, priorities flip. In the live dashboard you can watch a "Schema Drift" alert fire mid-run (*"URGENT: Production server down — all code reviews now critical"*) and see the agent recalibrate in real time. There's no fixed plan to replay; the agent has to actually adapt.
61
 
62
- This maps to **Theme 1 (Multi-Agent Interactions)** — three workers with independent states, a manager operating under partial observability, and emergent coordination between scheduling decisions and worker capacity. It also sits squarely in **Theme 3.1 (World Modeling / Professional Tasks)**: the manager is doing genuine orchestration — updating beliefs about worker state, sequencing task workflows, and handling dynamic interruptions through OpenEnv's standard step/reset interface.
63
 
64
  ---
65
 
66
- ## 🎯 Why This Environment Matters
67
 
68
- To our knowledge, no existing RL environment models knowledge-work cognitive load in a principled, agent-evaluable way. CLM aims to fill that gap:
69
 
70
- - **Useful for training agents** that assist with personal productivity tools, calendar management, and task triage systems
71
- - **Useful for evaluating LLM planning ability** — especially multi-step planning under resource constraints and changing conditions
72
- - **Realistic dynamics**: energy, stress, fatigue, and task dependencies create emergent difficulty that pure search algorithms cannot exploit
73
- - **Dense reward signal** across the full trajectory, not just terminal rewards
74
 
75
- ---
76
 
77
- ## 🕹️ Actions
78
-
79
- | Action | Description | Cost |
80
- |--------|-------------|------|
81
- | `work` | Work on `task_id` at normal pace | Energy ↓ by task type |
82
- | `focus` | Deep-work mode on `task_id`: 2× progress, 2× energy cost | Energy ↓ 2× |
83
- | `break` | Rest: Energy +0.22, Stress −0.18 | None |
84
- | `switch` | Change active task | Small reward penalty |
85
- | `delay` | Wait one step; slight stress relief | None |
86
-
87
- Action format:
88
- ```json
89
- {"type": "work", "task_id": "m1"}
90
- {"type": "focus", "task_id": "h3"}
91
- {"type": "break", "task_id": null}
92
- ```
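
The action format above can be sketched as a small builder that rejects malformed actions before they reach the environment. This is a minimal sketch, not the project's code: `make_action` is a hypothetical helper, and the rule that `switch` needs a `task_id` while `delay` does not is an assumption, since the README only shows `work`, `focus`, and `break`.

```python
import json
from typing import Optional

# Action names come from the Actions table; everything else is illustrative.
ACTION_TYPES = {"work", "focus", "break", "switch", "delay"}
# Assumption: these actions target a specific task and so need a task_id.
NEEDS_TASK = {"work", "focus", "switch"}


def make_action(action_type: str, task_id: Optional[str] = None) -> str:
    """Return the action as a JSON string in the documented format."""
    if action_type not in ACTION_TYPES:
        raise ValueError(f"unknown action type: {action_type!r}")
    if action_type in NEEDS_TASK and task_id is None:
        raise ValueError(f"{action_type!r} requires a task_id")
    if action_type not in NEEDS_TASK:
        task_id = None  # break and delay carry a null task_id
    return json.dumps({"type": action_type, "task_id": task_id})


print(make_action("work", "m1"))  # {"type": "work", "task_id": "m1"}
print(make_action("break"))       # {"type": "break", "task_id": null}
```

Validating on the client side keeps malformed LLM output from being silently interpreted by the environment.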
93
 
 
94
 
 
95
 
96
- ## 👁️ Observation Space
97
-
98
- ```json
99
- {
100
- "tasks": [
101
- {
102
- "id": "h1",
103
- "task_type": "email",
104
- "priority": "critical",
105
- "progress": 0.45,
106
- "deadline": 12,
107
- "depends_on": null,
108
- "is_interrupted": false
109
- }
110
- ],
111
- "visible_state": {
112
- "fatigue_level": "medium",
113
- "stress_warning": true,
114
- "energy_level": 0.42,
115
- "stress_level": 0.71,
116
- "focus_mode": false,
117
- "upcoming_deadlines": ["h1", "h3"],
118
- "blocked_tasks": ["h3"]
119
- },
120
- "time_step": 9
121
- }
122
  ```
123
 
124
- **Key mechanics visible to the agent:**
125
- - `blocked_tasks` — tasks whose `depends_on` parent is not yet complete; agent must not work on these
126
- - `upcoming_deadlines` — tasks with deadline within the next 5 steps
127
- - `focus_mode` — whether the agent is currently in deep-work state
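
As a rough illustration of how an agent might use these fields, the sketch below filters an observation down to tasks that are safe to work on and puts the urgent ones first. `workable_tasks` is a hypothetical helper, the task id `m2` is made up for the example, and treating `progress >= 1.0` as "done" is an assumption.

```python
from typing import Any, Dict, List


def workable_tasks(obs: Dict[str, Any]) -> List[str]:
    """Return ids of unfinished, unblocked tasks, most urgent first."""
    visible = obs["visible_state"]
    blocked = set(visible["blocked_tasks"])
    urgent = set(visible["upcoming_deadlines"])
    candidates = [
        t for t in obs["tasks"]
        if t["id"] not in blocked and t["progress"] < 1.0
    ]
    # Tasks listed in upcoming_deadlines sort first; ties keep input order.
    candidates.sort(key=lambda t: t["id"] not in urgent)
    return [t["id"] for t in candidates]


obs = {
    "tasks": [
        {"id": "m2", "progress": 0.0},
        {"id": "h1", "progress": 0.45},
        {"id": "h3", "progress": 0.0},
    ],
    "visible_state": {"upcoming_deadlines": ["h1", "h3"], "blocked_tasks": ["h3"]},
}
print(workable_tasks(obs))  # ['h1', 'm2']
```

Here `h3` is dropped even though its deadline is close, because working on a blocked task is penalized.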
128
-
129
-
130
-
131
- ## 📋 Tasks & Baseline Scores
132
 
133
- | Level | Tasks | Deadlines | Dependencies | Interruptions | Baseline Score |
134
- |-------|-------|-----------|--------------|---------------|----------------|
135
- | **easy** | 2 (email, report) | None | None | None | **0.856** |
136
- | **medium** | 5 mixed types | Yes (4 tasks) | None | None | **0.523** |
137
- | **hard** | 8 mixed types | Yes (tight) | 3 dependency chains | 2 mid-episode | **0.301** |
138
- | **expert** | 10 mixed types | Yes (very tight) | 5 dependency chains | 3 mid-episode | **0.221** |
139
 
140
- Scores produced by heuristic agent (priority + deadline triage with focus mode).
141
- A strong LLM agent should achieve: easy >0.85, medium >0.55, hard >0.35, expert >0.25.
142
 
 
143
 
144
- ## 🏆 Scoring Formula
145
 
146
- ```
147
- score = weighted_completion × 0.60
148
- + deadline_adherence × 0.22
149
- + energy_efficiency × 0.10
150
- + dependency_bonus × 0.05
151
- + interruption_bonus × 0.03
152
- ```
153
 
154
- | Dimension | Weight | What it measures |
155
  |---|---|---|
156
- | Task Completion | ×0.60 | Fraction of tasks fully completed, weighted by priority |
157
- | Deadline Adherence | ×0.22 | Bonus for finishing before deadline; penalty for missing it |
158
- | Energy Efficiency | ×0.10 | Penalizes high worker fatigue and stress spikes |
159
- | Dependency Bonus | ×0.05 | Reward for respecting task dependency order |
160
- | Interruption Bonus | ×0.03 | Reward for minimizing context-switching interruptions |
161
 
162
- Score is always in **(0.01, 0.99)** — never exactly 0 or 1.
163
 
164
- Getting the weights right took several rounds. The energy penalty needed to be strong enough that the agent couldn't ignore it, but not so dominant that the agent refused to assign tasks altogether. The final balance produces an agent that *anticipates* stress buildup rather than reacting to it after the fact, which is the behavior you actually want.
 
 
165
 
166
-
167
- ## 📊 Reward Shaping Details
168
-
169
- Step rewards provide **dense signal** across the full trajectory:
170
-
171
- | Event | Reward |
172
- |-------|--------|
173
- | Task progress (normal) | +0.10 × progress_delta × priority_weight |
174
- | Milestone 25% | +0.04 × priority_weight |
175
- | Milestone 50% | +0.07 × priority_weight |
176
- | Milestone 75% | +0.09 × priority_weight |
177
- | Task complete 100% | +0.18 × priority_weight |
178
- | Context switch | −0.07 |
179
- | Work on blocked task | −0.15 |
180
- | Interruption arrives | −0.05 |
181
- | Episode: burnout | −1.0 |
182
- | Episode: all done (on time) | +1.0 |
183
- | Episode: all done (late) | +0.5 |
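
The per-step rows of the table can be read as a single function. The sketch below is one possible reading, not the environment's grader: `step_reward` is a hypothetical name, the numeric priority weight is an assumed input (the README does not list its values), and paying each milestone bonus once on the step that crosses it is an assumption.

```python
# Values copied from the reward table above.
PROGRESS_COEF = 0.10
MILESTONE_BONUS = [(0.25, 0.04), (0.50, 0.07), (0.75, 0.09), (1.00, 0.18)]
SWITCH_PENALTY = -0.07
BLOCKED_PENALTY = -0.15


def step_reward(old_progress, new_progress, priority_weight,
                switched=False, worked_on_blocked=False):
    """Dense reward for one step on one task (illustrative only)."""
    reward = PROGRESS_COEF * (new_progress - old_progress) * priority_weight
    # Assumption: a milestone bonus is paid once, on the step that crosses it.
    for threshold, bonus in MILESTONE_BONUS:
        if old_progress < threshold <= new_progress:
            reward += bonus * priority_weight
    if switched:
        reward += SWITCH_PENALTY
    if worked_on_blocked:
        reward += BLOCKED_PENALTY
    return reward


# Crossing the 50% milestone with a hypothetical priority weight of 1.0:
print(round(step_reward(0.45, 0.55, 1.0), 4))  # 0.08
```

The milestone bonuses dominate the raw progress term, which is what makes the signal dense: the agent is paid at several points along a task, not only at completion.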
184
-
185
- Early versions of the reward function only rewarded task completion — and the agent learned to grind workers into the ground to hit numbers. Three full rebuilds later, the current structure produces measurably better behavior.
186
 
187
  ---
188
 
189
- ## 🤖 Training
190
 
191
- We trained using **Hugging Face TRL with GRPO-based reinforcement learning** on a **Qwen 1.5B** base model.
192
 
193
- The full training notebook opens in one click, installs its own dependencies, and is re-runnable end to end:
194
 
195
- 👉 [Open in Colab](https://colab.research.google.com/drive/1_OoW4iH1acCni0H9POCcX2pp-6bOorzo?usp=sharing)
196
-
197
- The training loop:
198
-
199
- 1. The model (manager agent) receives an observation from the environment
200
- 2. It generates an action — structured as a decision over the available action space
201
- 3. The action executes in the environment; a reward is returned
202
- 4. GRPO updates the model based on relative reward signal across a batch of rollouts
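
Step 4 relies on GRPO's group-relative signal: each rollout is scored against the other rollouts sampled for the same prompt. A minimal sketch of that normalization follows. It is not the TRL implementation; `group_relative_advantages` is a hypothetical name, and the population standard deviation is used here for simplicity.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each rollout's reward against its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every rollout got the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts for the same observation, scored by the environment.
advantages = group_relative_advantages([0.05, 0.10, 0.10, 0.30])
print([round(a, 2) for a in advantages])
```

Rollouts that beat the group mean get a positive advantage and are reinforced; those below it are discouraged, with no separate value network needed.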
203
-
204
- We ran for 1000 steps in the primary training run. The mean reward curve shows the agent moving from near-random behavior in the early steps to a clear upward trend by step 250, stabilizing at a higher plateau through steps 750–1000.
205
 
206
  ---
207
 
208
- ## 📈 Results
209
-
210
- **Before vs After GRPO** — measured during 1000-step fine-tuning on the CLM environment:
211
 
212
- | | Before | After | Lift |
213
- |---|---|---|---|
214
- | Mean Reward | 0.101 | 0.265 | **+163%** |
215
-
216
- Per-action reward breakdown after training:
217
 
218
- | Action | Reward (After) | What changed |
219
- |---|---|---|
220
- | Focus | 0.249 | Highest — agent learned to protect deep work blocks |
221
- | Work | Improved significantly | Better task-worker matching |
222
- | Break | 0.040 | Positive — agent learned breaks aren't wasted time |
223
- | Delay | 0.019 | Low but selective — used strategically, not as default |
224
 
225
- **Episode #1** completed with a final score of **0.3393** across 11 steps on a medium-difficulty workload. The cumulative reward curve shows the agent managing energy and stress while handling a live schema drift event mid-episode. Task queue at close: email (critical, 100% complete), code_review_em2 (normal, 0%), code_review (high, 4%).
 
226
 
227
- What we didn't program but observed: the agent started inserting breaks *before* workers hit the burnout threshold, not after. It also stopped switching workers away from tasks they were mid-focus on unless deadline pressure forced it. Neither of those was an explicit rule; both emerged from costs in the reward function that the agent learned to avoid on its own.
228
 
229
- See the full episode replay, reward/step graphs, energy and stress curves, and task progress live in the dashboard demo:
 
 
 
230
 
231
- 👉 [Full dashboard demo](https://drive.google.com/file/d/149dz_1rIlXv-eR1fwYaxRJ-cV0mQNevJ/view?usp=sharing)
232
 
233
- ---
234
 
235
- ## 🏛️ Architecture
236
 
237
  ```
238
- cognitive-load-manager/
239
- ├── models.py ← Core environment logic (tasks, state, grader, dynamics)
240
- ├── inference.py ← OpenAI-client baseline agent (all 4 difficulty levels)
241
- ├── openenv.yaml ← OpenEnv spec (actions, observations, tasks, scoring)
242
- ├── Dockerfile ← Container definition
243
- ├── backend/
244
- │ └── main.py ← FastAPI app (OpenEnv HTTP server + grade endpoints)
245
- ├── server/
246
- │ └── app.py ← Uvicorn entrypoint
247
- ├── grader/
248
- │ └── clm_graders.py ← EasyGrader, MediumGrader, HardGrader, ExpertGrader
249
- └── frontend/ ← React live dashboard (visual state inspector)
250
- ```
251
-
252
- ```mermaid
253
- graph TD
254
- Agent[LLM Agent<br/>inference.py] -->|POST /step| API[FastAPI Backend<br/>backend/main.py]
255
- API --> Core[models.py<br/>CLMEnvironment]
256
- Core --> Grader[grader/clm_graders.py]
257
- Dashboard[React Dashboard<br/>frontend/] -->|GET /state| API
258
- API -->|OpenEnv spec| OE[openenv validate]
259
  ```
260
 
261
- ---
262
 
263
- ## 🚀 Setup
264
 
265
- ### Docker (for HF Space / production)
266
  ```bash
267
- docker build -t clm-env .
268
- docker run -p 7860:7860 clm-env
269
- ```
270
 
271
- ### Local development
272
- ```bash
273
  pip install -r requirements.txt
274
  uvicorn server.app:app --port 7860 --reload
275
- ```
276
 
277
- ### Run inference baseline
278
- ```bash
279
- export HF_TOKEN="hf_your_token_here"
280
- export API_BASE_URL="https://router.huggingface.co/v1"
281
- export MODEL_NAME="Qwen/Qwen2.5-72B-Instruct"
282
- python inference.py
283
- ```
284
-
285
- ### Optional: React Dashboard
286
- ```bash
287
  cd frontend && npm install && npm run dev
288
- # Visit http://localhost:5173
289
  ```
290
 
291
-
292
- ## ⚙️ Environment Variables
293
 
294
  | Variable | Description |
295
- |----------|-------------|
296
- | `API_BASE_URL` | LLM API endpoint (e.g. `https://router.huggingface.co/v1`) |
297
- | `MODEL_NAME` | Model identifier (default: `Qwen/Qwen2.5-72B-Instruct`) |
298
  | `HF_TOKEN` | Hugging Face API token |
299
 
300
- ---
301
-
302
- ## 🔭 Where This Goes
303
-
304
- This started as a hackathon project. The problem it's solving isn't going away.
305
-
306
- Near-term: developer-facing APIs that let teams plug human-aware scheduling into tools they already use — Slack, Linear, Notion. Not replacing them. Adding a layer that actually understands worker state.
307
-
308
- Longer out: the same environment architecture adapts to other domains where human capacity matters. An adaptive learning system that knows when a student is cognitively overloaded, not just academically behind. A clinical scheduling tool that models physician fatigue before it compounds into errors.
309
-
310
- The environment is the foundation. What you train on it is what changes.
311
-
312
- ---
313
-
314
- ## 🪞 Honest Reflection
315
 
316
- Reward shaping took way longer than it should have. We went through three complete versions before finding something that produced the behavior we actually wanted. If we were starting over, we'd prototype the reward function with a simple heuristic agent first — validate the signal makes sense before involving the LLM at all.
317
-
318
- We'd also add worker personalization. Right now all three workers share the same fatigue model. Real people have different capacities, different stress tolerances, different recovery curves. Per-worker profiles that the manager has to individually learn would make this significantly more powerful — and more honest about what human-aware AI actually needs to do.
319
-
320
- ---
 
 
 
 
321
 
322
- ## 🔗 All Links
323
 
324
- | Resource | Link |
325
- |---|---|
326
- | 🤗 HF Space (live environment) | Linked above (this Space) |
327
- | 📓 Training Notebook (Colab) | [Open in Colab](https://colab.research.google.com/drive/1_OoW4iH1acCni0H9POCcX2pp-6bOorzo?usp=sharing) |
328
- | 🎥 Dashboard Demo (full video) | [Google Drive](https://drive.google.com/file/d/149dz_1rIlXv-eR1fwYaxRJ-cV0mQNevJ/view?usp=sharing) |
329
- | 🎬 Project Walkthrough (Loom) | [Loom](https://www.loom.com/share/7c7293efa0ba459ba2de243b0b5aacb2) |
330
 
331
  ---
332
 
333
- *Built for the OpenEnv Hackathon, April 2026.*
 
 
 
 
9
  tags: [openenv, rl, scheduling, agent-eval, productivity, multi-agent, grpo, reinforcement-learning]
10
  ---
11
 
12
+ # 🧠 Cognitive Load Manager
13
 
14
+ > **An AI that schedules work like a *good manager*, one that actually cares if you're tired.**
15
 
16
+ [![OpenEnv](https://img.shields.io/badge/Built_on-OpenEnv-brightgreen?style=for-the-badge)](#)
17
+ [![Hackathon](https://img.shields.io/badge/OpenEnv_Hackathon-April_2026-yellow?style=for-the-badge)](#)
18
+ [![Result](https://img.shields.io/badge/Reward_Lift-+163%25-orange?style=for-the-badge)](#)
 
 
19
 
20
  ---
21
 
22
+ ## 🎥 See It In 2 Minutes
23
 
24
  | | |
25
  |---|---|
26
+ | 🎬 **Project walkthrough** | 👉 [Watch on Loom](https://www.loom.com/share/7c7293efa0ba459ba2de243b0b5aacb2) |
27
+ | 📊 **Live dashboard demo** | 👉 [Watch the demo](https://drive.google.com/file/d/149dz_1rIlXv-eR1fwYaxRJ-cV0mQNevJ/view?usp=sharing) |
 
28
 
29
  ---
30
 
31
+ ## 🤔 The Problem
32
 
33
+ Most productivity tools tell you **what** to do.
34
+ None of them care **how you're feeling** while doing it.
35
 
36
+ - Running on 4 hours of sleep? Doesn't matter.
37
+ - Just finished three back-to-back meetings? Doesn't matter.
38
+ - Operating at 40% because the last task drained you? Doesn't matter.
39
 
40
+ Real performance isn't a straight line. Fatigue piles up. Stress carries over. Switching between tasks costs you more than you think.
41
 
42
+ **We built an AI that learns to notice all of that — and schedule around it.**
43
 
44
+ ---
 
 
45
 
46
+ ## What Makes It Special
47
 
48
+ This is the moment that made the whole project worth it:
49
 
50
+ > **The AI started giving workers breaks *before* they burned out, not after.**
51
+ >
52
+ > Nobody told it to do that. It figured it out on its own.
53
 
54
+ That's the difference between a scheduler that optimizes hours and a manager that actually understands people.
55
 
56
  ---
57
 
58
+ ## 🛠️ How It Works (In Plain English)
59
 
60
+ Imagine a simulated office with:
61
 
62
+ - 👥 **Three workers**, each with their own energy, stress, and fatigue
63
+ - 🧑‍💼 **One manager (the AI)** — deciding who does what, and when to call a break
64
+ - 📋 **A pile of tasks** — emails, code reviews, reports, meetings, with real deadlines
 
65
 
66
+ The AI plays the manager role. Push too hard, workers burn out and quality crashes. Push too soft, deadlines slip. The AI has to find the sweet spot — and keep finding it as the day changes.
67
 
68
+ And the day **does** change. Mid-shift, a "Production server down!" alert can fire and suddenly every code review is critical. The AI has to adapt on the fly.
69
 
70
+ ---
71
 
72
+ ## 🗺️ How The Pieces Fit Together
73
 
74
+ ```mermaid
75
+ flowchart TB
76
+ AI["🧠 <b>AI Manager</b><br/><i>Qwen 1.5B</i><br/>decides who does what"]
77
+
78
+ subgraph SIM["🏢 Simulated Workday"]
79
+ direction LR
80
+ W1["👤 <b>Worker 1</b><br/>energy · stress · fatigue"]
81
+ W2["👤 <b>Worker 2</b><br/>energy · stress · fatigue"]
82
+ W3["👤 <b>Worker 3</b><br/>energy · stress · fatigue"]
83
+ TP["📋 <b>Task Pool</b><br/>emails · reviews<br/>reports · meetings"]
84
+ EV["⚡ <b>Live Events</b><br/>deadline shifts<br/>urgent interrupts"]
85
+ end
86
+
87
+ DASH["📊 <b>Live Dashboard</b><br/>watch it think<br/>in real time"]
88
+
89
+ TR["🎯 <b>GRPO Training</b><br/><i>Hugging Face TRL</i><br/>1000 steps · +163% lift"]
90
+
91
+ AI -- "assigns · focuses<br/>breaks · delays" --> SIM
92
+ SIM -- "observation +<br/>reward signal" --> AI
93
+ SIM -- "live state" --> DASH
94
+ AI -. "rollouts" .-> TR
95
+ TR -. "smarter weights" .-> AI
96
+
97
+ classDef ai fill:#9b87f5,stroke:#5b3fc4,stroke-width:3px,color:#fff
98
+ classDef worker fill:#dbeafe,stroke:#3b82f6,stroke-width:2px,color:#1e3a8a
99
+ classDef task fill:#fce7f3,stroke:#ec4899,stroke-width:2px,color:#831843
100
+ classDef event fill:#fee2e2,stroke:#ef4444,stroke-width:2px,color:#7f1d1d
101
+ classDef train fill:#d1fae5,stroke:#10b981,stroke-width:2px,color:#064e3b
102
+ classDef dash fill:#e0e7ff,stroke:#6366f1,stroke-width:2px,color:#312e81
103
+ classDef sim fill:#fef9c3,stroke:#eab308,stroke-width:2px,color:#713f12
104
+
105
+ class AI ai
106
+ class W1,W2,W3 worker
107
+ class TP task
108
+ class EV event
109
+ class TR train
110
+ class DASH dash
111
+ class SIM sim
112
  ```
113
 
114
+ **The loop in plain English:**
115
 
116
+ 1. 🧠 **The AI looks** at the workday: who's tired, what's due, what just blew up.
117
+ 2. 🎯 **It makes a call** — assign, focus, break, switch, or wait.
118
+ 3. 🏢 **The simulated office reacts**: workers gain progress or burn out, deadlines pass.
119
+ 4. ↩️ **A reward comes back**: high if the call was wise, low if it wasn't.
120
+ 5. 🔁 **GRPO uses those rewards** to nudge the AI toward better decisions next time.
 
121
 
122
+ After 1000 loops, the AI is **5× better than random guessing**.
 
123
 
124
+ ---
125
 
126
+ ## 📈 The Results
127
 
128
+ After training the AI for 1000 steps:
129
 
130
+ | | Score | What it means |
131
  |---|---|---|
132
+ | 🎲 Random guessing | ~0.05 | Total chaos |
133
+ | 🤖 Untrained AI | 0.101 | Mediocre |
134
+ | **Our trained AI** | **0.265** | **5× better than random, +163% lift over the untrained model** |
 
 
135
 
136
+ What it learned without being told:
137
 
138
+ - ⏸️ Insert breaks *before* burnout, not after
139
+ - 🎯 Protect deep-focus time — don't yank workers off mid-task
140
+ - 🚨 Adapt instantly when priorities flip mid-day
141
 
142
+ 👉 [Watch the full dashboard demo](https://drive.google.com/file/d/149dz_1rIlXv-eR1fwYaxRJ-cV0mQNevJ/view?usp=sharing)
 
143
 
144
  ---
145
 
146
+ ## 🔭 Why This Matters
147
 
148
+ Today, AI tools schedule meetings and triage tickets, but they treat people like robots. CLM is a step toward AI that schedules **for humans, not over them**.
149
 
150
+ The same idea plugs into:
151
 
152
+ - 📅 **Work tools** — Slack, Linear, Notion that understand worker capacity
153
+ - 🎓 **Education** — tutors that notice when a student is overloaded, not just behind
154
+ - 🏥 **Healthcare** — staff schedulers that catch fatigue before it becomes errors
155
 
156
  ---
157
 
158
+ ## 🚀 Try It
 
 
159
 
160
+ | | |
161
+ |---|---|
162
+ | 📓 **Re-run our training in your browser** | 👉 [Open in Colab](https://colab.research.google.com/drive/1_OoW4iH1acCni0H9POCcX2pp-6bOorzo?usp=sharing) |
163
+ | 🤗 **Live environment** | This Hugging Face Space |
164
+ | 📝 **The full build story** | [`blog.md`](./blog.md) |
165
 
166
+ ---
 
 
 
 
 
167
 
168
+ <details>
169
+ <summary><strong>🛠️ For Developers — Technical Details</strong></summary>
170
 
171
+ ### Stack
172
 
173
+ - **Environment:** OpenEnv-compatible RL environment (FastAPI backend, Docker)
174
+ - **Training:** Hugging Face TRL with GRPO on **Qwen 1.5B**
175
+ - **Frontend:** React live dashboard
176
+ - **Difficulty levels:** easy, medium, hard, expert (with deadlines, dependency chains, mid-episode interruptions)
177
 
178
+ ### Actions
179
 
180
+ | Action | Description |
181
+ |---|---|
182
+ | `work` | Work on a task at normal pace |
183
+ | `focus` | Deep-work mode: 2× progress, 2× energy cost |
184
+ | `break` | Rest: +energy, −stress |
185
+ | `switch` | Change active task (small penalty) |
186
+ | `delay` | Wait one step |
187
 
188
+ ### Scoring Formula
189
 
190
  ```
191
+ score = completion×0.60 + deadline×0.22 + energy×0.10 + dependency×0.05 + interruption×0.03
192
  ```
193
 
194
+ Score is always in (0.01, 0.99).
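
To make the formula concrete, here is a minimal sketch of the weighted sum. It assumes each component is already normalized to the range 0 to 1 and that the bound is enforced by clamping; both are assumptions, and `final_score` is a hypothetical name rather than the grader's actual function.

```python
# Weights copied from the scoring formula above; they sum to 1.0.
WEIGHTS = {
    "completion": 0.60,
    "deadline": 0.22,
    "energy": 0.10,
    "dependency": 0.05,
    "interruption": 0.03,
}


def final_score(components: dict) -> float:
    """Weighted sum of the five components, kept away from exactly 0 or 1."""
    raw = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    # Assumption: the documented range is enforced by clamping.
    return min(0.99, max(0.01, raw))


perfect = {name: 1.0 for name in WEIGHTS}
print(final_score(perfect))  # 0.99
```

Because completion carries 60% of the weight, an agent cannot score well by resting workers alone; the smaller terms only separate agents that finish similar amounts of work.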
195
 
196
+ ### Quick Setup
197
 
 
198
  ```bash
199
+ # Docker
200
+ docker build -t clm-env . && docker run -p 7860:7860 clm-env
 
201
 
202
+ # Local
 
203
  pip install -r requirements.txt
204
  uvicorn server.app:app --port 7860 --reload
 
205
 
206
+ # React dashboard
207
  cd frontend && npm install && npm run dev
 
208
  ```
209
 
210
+ ### Environment Variables
 
211
 
212
  | Variable | Description |
213
+ |---|---|
214
+ | `API_BASE_URL` | LLM API endpoint |
215
+ | `MODEL_NAME` | Model identifier |
216
  | `HF_TOKEN` | Hugging Face API token |
217
 
218
+ ### Project Structure
219
 
220
+ ```
221
+ cognitive-load-manager/
222
+ ├── models.py ← Core environment
223
+ ├── inference.py ← Baseline LLM agent
224
+ ├── openenv.yaml ← OpenEnv spec
225
+ ├── backend/main.py ← FastAPI server
226
+ ├── grader/ ← Difficulty graders
227
+ └── frontend/ ← React dashboard
228
+ ```
229
 
230
+ For the full technical write-up — observation space, reward shaping table, training loop, and the v1→v2→v3 reward-tuning story — see [`blog.md`](./blog.md).
231
 
232
+ </details>
 
 
 
 
 
233
 
234
  ---
235
 
236
+ <p align="center">
237
+ <em>Built for the OpenEnv Hackathon, April 2026.</em><br/>
238
+ <strong>🧠 Scheduling that respects the humans doing the work.</strong>
239
+ </p>