AE-Shree committed on
Commit ·
b8f6679
1
Parent(s): ff660f5
Final Commit
README.md
CHANGED
|
@@ -9,325 +9,231 @@ pinned: false
|
|
| 9 |
tags: [openenv, rl, scheduling, agent-eval, productivity, multi-agent, grpo, reinforcement-learning]
|
| 10 |
---
|
| 11 |
|
| 12 |
-
# 🧠 Cognitive Load Manager
|
| 13 |
|
| 14 |
-
**
|
| 15 |
|
| 16 |
-
[](#)
|
| 20 |
-
[](#)
|
| 21 |
|
| 22 |
---
|
| 23 |
|
| 24 |
-
## 🎥 See It
|
| 25 |
|
| 26 |
| | |
|
| 27 |
|---|---|
|
| 28 |
-
| **
|
| 29 |
-
| **
|
| 30 |
-
| **Training notebook (Colab — re-runnable)** | 👉 [Open in Colab](https://colab.research.google.com/drive/1_OoW4iH1acCni0H9POCcX2pp-6bOorzo?usp=sharing) |
|
| 31 |
|
| 32 |
---
|
| 33 |
|
| 34 |
-
## The Problem
|
| 35 |
|
| 36 |
-
|
|
|
|
| 37 |
|
| 38 |
-
|
|
|
|
|
|
|
| 39 |
|
| 40 |
-
|
| 41 |
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
## What We Built
|
| 45 |
-
|
| 46 |
-
CLM is a **multi-agent reinforcement learning environment** built on the OpenEnv interface. It simulates a real knowledge-work day — tasks of different types, deadlines with real consequences, worker states that shift throughout the episode, and mid-session surprises that force the agent to adapt.
|
| 47 |
-
|
| 48 |
-
The setup:
|
| 49 |
|
| 50 |
-
-
|
| 51 |
-
- **One manager agent** — the AI being trained — that observes the full workspace state and makes scheduling decisions every step
|
| 52 |
-
- **A task pool** with deadlines, dependency chains, and varying complexity levels (email, code review, reports, meetings, calls)
|
| 53 |
|
| 54 |
-
|
| 55 |
|
| 56 |
-
|
| 57 |
|
| 58 |
-
|
| 59 |
-
|
| 60 |
-
|
| 61 |
|
| 62 |
-
|
| 63 |
|
| 64 |
---
|
| 65 |
|
| 66 |
-
##
|
| 67 |
|
| 68 |
-
|
| 69 |
|
| 70 |
-
- **
|
| 71 |
-
- **
|
| 72 |
-
- **
|
| 73 |
-
- **Dense reward signal** across the full trajectory, not just terminal rewards
|
| 74 |
|
| 75 |
-
|
| 76 |
|
| 77 |
-
|
| 78 |
-
|
| 79 |
-
| Action | Description | Cost |
|
| 80 |
-
|--------|-------------|------|
|
| 81 |
-
| `work` | Work on `task_id` at normal pace | Energy ↓ by task type |
|
| 82 |
-
| `focus` | Deep-work mode on `task_id`: 2× progress, 2× energy cost | Energy ↓ 2× |
|
| 83 |
-
| `break` | Rest: Energy +0.22, Stress −0.18 | None |
|
| 84 |
-
| `switch` | Change active task | Small reward penalty |
|
| 85 |
-
| `delay` | Wait one step; slight stress relief | None |
|
| 86 |
-
|
| 87 |
-
Action format:
|
| 88 |
-
```json
|
| 89 |
-
{"type": "work", "task_id": "m1"}
|
| 90 |
-
{"type": "focus", "task_id": "h3"}
|
| 91 |
-
{"type": "break", "task_id": null}
|
| 92 |
-
```
|
| 93 |
|
|
|
|
| 94 |
|
|
|
|
| 95 |
|
| 96 |
-
|
| 97 |
-
|
| 98 |
-
|
| 99 |
-
|
| 100 |
-
|
| 101 |
-
|
| 102 |
-
|
| 103 |
-
|
| 104 |
-
|
| 105 |
-
|
| 106 |
-
|
| 107 |
-
|
| 108 |
-
|
| 109 |
-
|
| 110 |
-
|
| 111 |
-
|
| 112 |
-
|
| 113 |
-
"
|
| 114 |
-
"
|
| 115 |
-
"
|
| 116 |
-
"
|
| 117 |
-
|
| 118 |
-
|
| 119 |
-
|
| 120 |
-
|
| 121 |
-
|
| 122 |
```
|
| 123 |
|
| 124 |
-
**
|
| 125 |
-
- `blocked_tasks` — tasks whose `depends_on` parent is not yet complete; agent must not work on these
|
| 126 |
-
- `upcoming_deadlines` — tasks with deadline within the next 5 steps
|
| 127 |
-
- `focus_mode` — whether the agent is currently in deep-work state
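
As a rough illustration of how the two derived fields above could be computed, here is a minimal Python sketch. The task dictionaries (`id`, `depends_on`, `deadline`, `done`) and the helper name `derive_fields` are hypothetical and chosen only for this example; the environment's real data model may differ.

```python
def derive_fields(tasks, current_step, horizon=5):
    """Compute blocked_tasks and upcoming_deadlines from a list of task dicts."""
    done = {t["id"] for t in tasks if t["done"]}
    # A task is blocked while the parent named in depends_on is unfinished.
    blocked = [
        t["id"] for t in tasks
        if not t["done"] and t.get("depends_on") and t["depends_on"] not in done
    ]
    # A deadline is "upcoming" if it falls within the next `horizon` steps.
    upcoming = [
        t["id"] for t in tasks
        if not t["done"]
        and t.get("deadline") is not None
        and 0 <= t["deadline"] - current_step <= horizon
    ]
    return {"blocked_tasks": blocked, "upcoming_deadlines": upcoming}


# Hypothetical task pool used only for this example.
tasks = [
    {"id": "e1", "depends_on": None, "deadline": 8, "done": True},
    {"id": "m1", "depends_on": None, "deadline": 12, "done": False},
    {"id": "m2", "depends_on": "m1", "deadline": 30, "done": False},
]
print(derive_fields(tasks, current_step=9))
```

With this pool, `m2` is reported as blocked until `m1` completes, and `m1` shows up as an upcoming deadline because it is due three steps from now.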
|
| 128 |
-
|
| 129 |
-
|
| 130 |
-
|
| 131 |
-
## 📋 Tasks & Baseline Scores
|
| 132 |
|
| 133 |
-
|
| 134 |
-
|
| 135 |
-
|
| 136 |
-
|
| 137 |
-
|
| 138 |
-
| **expert** | 10 mixed types | Yes (very tight) | 5 dependency chains | 3 mid-episode | **0.221** |
|
| 139 |
|
| 140 |
-
|
| 141 |
-
A strong LLM agent should achieve: easy >0.85, medium >0.55, hard >0.35, expert >0.25.
|
| 142 |
|
|
|
|
| 143 |
|
| 144 |
-
##
|
| 145 |
|
| 146 |
-
|
| 147 |
-
score = weighted_completion × 0.60
|
| 148 |
-
+ deadline_adherence × 0.22
|
| 149 |
-
+ energy_efficiency × 0.10
|
| 150 |
-
+ dependency_bonus × 0.05
|
| 151 |
-
+ interruption_bonus × 0.03
|
| 152 |
-
```
|
| 153 |
|
| 154 |
-
|
|
| 155 |
|---|---|---|
|
| 156 |
-
|
|
| 157 |
-
|
|
| 158 |
-
|
|
| 159 |
-
| Dependency Bonus | ×0.05 | Reward for respecting task dependency order |
|
| 160 |
-
| Interruption Bonus | ×0.03 | Reward for minimizing context-switching interruptions |
|
| 161 |
|
| 162 |
-
|
| 163 |
|
| 164 |
-
|
|
|
|
|
|
|
| 165 |
|
| 166 |
-
|
| 167 |
-
## 📊 Reward Shaping Details
|
| 168 |
-
|
| 169 |
-
Step rewards provide **dense signal** across the full trajectory:
|
| 170 |
-
|
| 171 |
-
| Event | Reward |
|
| 172 |
-
|-------|--------|
|
| 173 |
-
| Task progress (normal) | +0.10 × progress_delta × priority_weight |
|
| 174 |
-
| Milestone 25% | +0.04 × priority_weight |
|
| 175 |
-
| Milestone 50% | +0.07 × priority_weight |
|
| 176 |
-
| Milestone 75% | +0.09 × priority_weight |
|
| 177 |
-
| Task complete 100% | +0.18 × priority_weight |
|
| 178 |
-
| Context switch | −0.07 |
|
| 179 |
-
| Work on blocked task | −0.15 |
|
| 180 |
-
| Interruption arrives | −0.05 |
|
| 181 |
-
| Episode: burnout | −1.0 |
|
| 182 |
-
| Episode: all done (on time) | +1.0 |
|
| 183 |
-
| Episode: all done (late) | +0.5 |
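
The progress and milestone rows above can be combined into one step-level function. The sketch below is a hedged reading of the table, not the environment's actual code: the function name `progress_reward` is hypothetical, and it assumes a milestone bonus is paid once, on the step where progress crosses the threshold.

```python
# (threshold, bonus) pairs taken from the milestone rows of the table.
MILESTONES = [(0.25, 0.04), (0.50, 0.07), (0.75, 0.09), (1.00, 0.18)]


def progress_reward(before, after, priority_weight=1.0):
    """Reward for moving one task's progress from `before` to `after`."""
    # Base term: +0.10 x progress_delta x priority_weight.
    reward = 0.10 * (after - before) * priority_weight
    # Add every milestone crossed during this step (assumed one-time bonus).
    for threshold, bonus in MILESTONES:
        if before < threshold <= after:
            reward += bonus * priority_weight
    return reward


# Moving from 40% to 55% crosses the 50% milestone.
print(round(progress_reward(0.40, 0.55), 3))
```

Penalties such as context switches or working on a blocked task would be subtracted on top of this term.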
|
| 184 |
-
|
| 185 |
-
Early versions of the reward function only rewarded task completion — and the agent learned to grind workers into the ground to hit numbers. Three full rebuilds later, the current structure produces measurably better behavior.
|
| 186 |
|
| 187 |
---
|
| 188 |
|
| 189 |
-
##
|
| 190 |
|
| 191 |
-
|
| 192 |
|
| 193 |
-
The
|
| 194 |
|
| 195 |
-
|
| 196 |
-
|
| 197 |
-
|
| 198 |
-
|
| 199 |
-
1. The model (manager agent) receives an observation from the environment
|
| 200 |
-
2. It generates an action — structured as a decision over the available action space
|
| 201 |
-
3. The action executes in the environment; a reward is returned
|
| 202 |
-
4. GRPO updates the model based on relative reward signal across a batch of rollouts
|
| 203 |
-
|
| 204 |
-
We ran for 1000 steps in the primary training run. The mean reward curve shows the agent moving from near-random behavior in the early steps to a clear upward trend by step 250, stabilizing at a higher plateau through steps 750–1000.
|
| 205 |
|
| 206 |
---
|
| 207 |
|
| 208 |
-
##
|
| 209 |
-
|
| 210 |
-
**Before vs After GRPO** — measured during 1000-step fine-tuning on the CLM environment:
|
| 211 |
|
| 212 |
-
| |
|
| 213 |
-
|---|---|
|
| 214 |
-
|
|
| 215 |
-
|
| 216 |
-
|
| 217 |
|
| 218 |
-
|
| 219 |
-
|---|---|---|
|
| 220 |
-
| Focus | 0.249 | Highest — agent learned to protect deep work blocks |
|
| 221 |
-
| Work | Improved significantly | Better task-worker matching |
|
| 222 |
-
| Break | 0.040 | Positive — agent learned breaks aren't wasted time |
|
| 223 |
-
| Delay | 0.019 | Low but selective — used strategically, not as default |
|
| 224 |
|
| 225 |
-
|
|
|
|
| 226 |
|
| 227 |
-
|
| 228 |
|
| 229 |
-
|
| 230 |
|
| 231 |
-
|
| 232 |
|
| 233 |
-
|
| 234 |
|
| 235 |
-
##
|
| 236 |
|
| 237 |
```
|
| 238 |
-
|
| 239 |
-
├── models.py ← Core environment logic (tasks, state, grader, dynamics)
|
| 240 |
-
├── inference.py ← OpenAI-client baseline agent (all 4 difficulty levels)
|
| 241 |
-
├── openenv.yaml ← OpenEnv spec (actions, observations, tasks, scoring)
|
| 242 |
-
├── Dockerfile ← Container definition
|
| 243 |
-
├── backend/
|
| 244 |
-
│ └── main.py ← FastAPI app (OpenEnv HTTP server + grade endpoints)
|
| 245 |
-
├── server/
|
| 246 |
-
│ └── app.py ← Uvicorn entrypoint
|
| 247 |
-
├── grader/
|
| 248 |
-
│ └── clm_graders.py ← EasyGrader, MediumGrader, HardGrader, ExpertGrader
|
| 249 |
-
└── frontend/ ← React live dashboard (visual state inspector)
|
| 250 |
-
```
|
| 251 |
-
|
| 252 |
-
```mermaid
|
| 253 |
-
graph TD
|
| 254 |
-
Agent[LLM Agent<br/>inference.py] -->|POST /step| API[FastAPI Backend<br/>backend/main.py]
|
| 255 |
-
API --> Core[models.py<br/>CLMEnvironment]
|
| 256 |
-
Core --> Grader[grader/clm_graders.py]
|
| 257 |
-
Dashboard[React Dashboard<br/>frontend/] -->|GET /state| API
|
| 258 |
-
API -->|OpenEnv spec| OE[openenv validate]
|
| 259 |
```
|
| 260 |
|
| 261 |
-
|
| 262 |
|
| 263 |
-
##
|
| 264 |
|
| 265 |
-
### Docker (for HF Space / production)
|
| 266 |
```bash
|
| 267 |
-
|
| 268 |
-
docker run -p 7860:7860 clm-env
|
| 269 |
-
```
|
| 270 |
|
| 271 |
-
#
|
| 272 |
-
```bash
|
| 273 |
pip install -r requirements.txt
|
| 274 |
uvicorn server.app:app --port 7860 --reload
|
| 275 |
-
```
|
| 276 |
|
| 277 |
-
#
|
| 278 |
-
```bash
|
| 279 |
-
export HF_TOKEN="hf_your_token_here"
|
| 280 |
-
export API_BASE_URL="https://router.huggingface.co/v1"
|
| 281 |
-
export MODEL_NAME="Qwen/Qwen2.5-72B-Instruct"
|
| 282 |
-
python inference.py
|
| 283 |
-
```
|
| 284 |
-
|
| 285 |
-
### Optional: React Dashboard
|
| 286 |
-
```bash
|
| 287 |
cd frontend && npm install && npm run dev
|
| 288 |
-
# Visit http://localhost:5173
|
| 289 |
```
|
| 290 |
|
| 291 |
-
|
| 292 |
-
## ⚙️ Environment Variables
|
| 293 |
|
| 294 |
| Variable | Description |
|
| 295 |
-
|---
|
| 296 |
-
| `API_BASE_URL` | LLM API endpoint
|
| 297 |
-
| `MODEL_NAME` | Model identifier
|
| 298 |
| `HF_TOKEN` | Hugging Face API token |
|
| 299 |
|
| 300 |
-
|
| 301 |
-
|
| 302 |
-
## 🔭 Where This Goes
|
| 303 |
-
|
| 304 |
-
This started as a hackathon project. The problem it's solving isn't going away.
|
| 305 |
-
|
| 306 |
-
Near-term: developer-facing APIs that let teams plug human-aware scheduling into tools they already use — Slack, Linear, Notion. Not replacing them. Adding a layer that actually understands worker state.
|
| 307 |
-
|
| 308 |
-
Longer out: the same environment architecture adapts to other domains where human capacity matters. An adaptive learning system that knows when a student is cognitively overloaded, not just academically behind. A clinical scheduling tool that models physician fatigue before it compounds into errors.
|
| 309 |
-
|
| 310 |
-
The environment is the foundation. What you train on it is what changes.
|
| 311 |
-
|
| 312 |
-
---
|
| 313 |
-
|
| 314 |
-
## 🪞 Honest Reflection
|
| 315 |
|
| 316 |
-
|
| 317 |
-
|
| 318 |
-
|
| 319 |
-
|
| 320 |
-
|
| 321 |
|
| 322 |
-
|
| 323 |
|
| 324 |
-
|
| 325 |
-
|---|---|
|
| 326 |
-
| 🤗 HF Space (live environment) | Linked above (this Space) |
|
| 327 |
-
| 📓 Training Notebook (Colab) | [Open in Colab](https://colab.research.google.com/drive/1_OoW4iH1acCni0H9POCcX2pp-6bOorzo?usp=sharing) |
|
| 328 |
-
| 🎥 Dashboard Demo (full video) | [Google Drive](https://drive.google.com/file/d/149dz_1rIlXv-eR1fwYaxRJ-cV0mQNevJ/view?usp=sharing) |
|
| 329 |
-
| 🎬 Project Walkthrough (Loom) | [Loom](https://www.loom.com/share/7c7293efa0ba459ba2de243b0b5aacb2) |
|
| 330 |
|
| 331 |
---
|
| 332 |
|
| 333 |
-
|
| 9 |
tags: [openenv, rl, scheduling, agent-eval, productivity, multi-agent, grpo, reinforcement-learning]
|
| 10 |
---
|
| 11 |
|
| 12 |
+
# 🧠 Cognitive Load Manager
|
| 13 |
|
| 14 |
+
> **An AI that schedules work like a *good manager* — one that actually cares if you're tired.**
|
| 15 |
|
| 16 |
+
[](#)
|
| 17 |
+
[](#)
|
| 18 |
+
[](#)
|
|
|
|
|
|
|
| 19 |
|
| 20 |
---
|
| 21 |
|
| 22 |
+
## 🎥 See It In 2 Minutes
|
| 23 |
|
| 24 |
| | |
|
| 25 |
|---|---|
|
| 26 |
+
| 🎬 **Project walkthrough** | 👉 [Watch on Loom](https://www.loom.com/share/7c7293efa0ba459ba2de243b0b5aacb2) |
|
| 27 |
+
| 📊 **Live dashboard demo** | 👉 [Watch the demo](https://drive.google.com/file/d/149dz_1rIlXv-eR1fwYaxRJ-cV0mQNevJ/view?usp=sharing) |
|
|
|
|
| 28 |
|
| 29 |
---
|
| 30 |
|
| 31 |
+
## 🤔 The Problem
|
| 32 |
|
| 33 |
+
Most productivity tools tell you **what** to do.
|
| 34 |
+
None of them care **how you're feeling** while doing it.
|
| 35 |
|
| 36 |
+
- Running on 4 hours of sleep? Doesn't matter.
|
| 37 |
+
- Just finished three back-to-back meetings? Doesn't matter.
|
| 38 |
+
- Operating at 40% because the last task drained you? Doesn't matter.
|
| 39 |
|
| 40 |
+
Real performance isn't a straight line. Fatigue piles up. Stress carries over. Switching between tasks costs you more than you think.
|
| 41 |
|
| 42 |
+
**We built an AI that learns to notice all of that — and schedule around it.**
|
| 43 |
|
| 44 |
+
---
|
|
|
|
|
|
|
| 45 |
|
| 46 |
+
## ✨ What Makes It Special
|
| 47 |
|
| 48 |
+
This is the moment that made the whole project worth it:
|
| 49 |
|
| 50 |
+
> **The AI started giving workers breaks *before* they burned out — not after.**
|
| 51 |
+
>
|
| 52 |
+
> Nobody told it to do that. It figured it out on its own.
|
| 53 |
|
| 54 |
+
That's the difference between a scheduler that optimizes hours and a manager that actually understands people.
|
| 55 |
|
| 56 |
---
|
| 57 |
|
| 58 |
+
## 🛠️ How It Works (In Plain English)
|
| 59 |
|
| 60 |
+
Imagine a simulated office with:
|
| 61 |
|
| 62 |
+
- 👥 **Three workers** — each with their own energy, stress, and fatigue
|
| 63 |
+
- 🧑‍💼 **One manager (the AI)** — deciding who does what, and when to call a break
|
| 64 |
+
- 📋 **A pile of tasks** — emails, code reviews, reports, meetings, with real deadlines
|
|
|
|
| 65 |
|
| 66 |
+
The AI plays the manager role. Push too hard, workers burn out and quality crashes. Push too soft, deadlines slip. The AI has to find the sweet spot — and keep finding it as the day changes.
|
| 67 |
|
| 68 |
+
And the day **does** change. Mid-shift, a "Production server down!" alert can fire and suddenly every code review is critical. The AI has to adapt on the fly.
|
| 69 |
|
| 70 |
+
---
|
| 71 |
|
| 72 |
+
## 🗺️ How The Pieces Fit Together
|
| 73 |
|
| 74 |
+
```mermaid
|
| 75 |
+
flowchart TB
|
| 76 |
+
AI["🧠 <b>AI Manager</b><br/><i>Qwen 1.5B</i><br/>decides who does what"]
|
| 77 |
+
|
| 78 |
+
subgraph SIM["🏢 Simulated Workday"]
|
| 79 |
+
direction LR
|
| 80 |
+
W1["👤 <b>Worker 1</b><br/>energy · stress · fatigue"]
|
| 81 |
+
W2["👤 <b>Worker 2</b><br/>energy · stress · fatigue"]
|
| 82 |
+
W3["👤 <b>Worker 3</b><br/>energy · stress · fatigue"]
|
| 83 |
+
TP["📋 <b>Task Pool</b><br/>emails · reviews<br/>reports · meetings"]
|
| 84 |
+
EV["⚡ <b>Live Events</b><br/>deadline shifts<br/>urgent interrupts"]
|
| 85 |
+
end
|
| 86 |
+
|
| 87 |
+
DASH["📊 <b>Live Dashboard</b><br/>watch it think<br/>in real time"]
|
| 88 |
+
|
| 89 |
+
TR["🎯 <b>GRPO Training</b><br/><i>Hugging Face TRL</i><br/>1000 steps · +163% lift"]
|
| 90 |
+
|
| 91 |
+
AI -- "assigns · focuses<br/>breaks · delays" --> SIM
|
| 92 |
+
SIM -- "observation +<br/>reward signal" --> AI
|
| 93 |
+
SIM -- "live state" --> DASH
|
| 94 |
+
AI -. "rollouts" .-> TR
|
| 95 |
+
TR -. "smarter weights" .-> AI
|
| 96 |
+
|
| 97 |
+
classDef ai fill:#9b87f5,stroke:#5b3fc4,stroke-width:3px,color:#fff
|
| 98 |
+
classDef worker fill:#dbeafe,stroke:#3b82f6,stroke-width:2px,color:#1e3a8a
|
| 99 |
+
classDef task fill:#fce7f3,stroke:#ec4899,stroke-width:2px,color:#831843
|
| 100 |
+
classDef event fill:#fee2e2,stroke:#ef4444,stroke-width:2px,color:#7f1d1d
|
| 101 |
+
classDef train fill:#d1fae5,stroke:#10b981,stroke-width:2px,color:#064e3b
|
| 102 |
+
classDef dash fill:#e0e7ff,stroke:#6366f1,stroke-width:2px,color:#312e81
|
| 103 |
+
classDef sim fill:#fef9c3,stroke:#eab308,stroke-width:2px,color:#713f12
|
| 104 |
+
|
| 105 |
+
class AI ai
|
| 106 |
+
class W1,W2,W3 worker
|
| 107 |
+
class TP task
|
| 108 |
+
class EV event
|
| 109 |
+
class TR train
|
| 110 |
+
class DASH dash
|
| 111 |
+
class SIM sim
|
| 112 |
```
|
| 113 |
|
| 114 |
+
**The loop in plain English:**
|
| 115 |
|
| 116 |
+
1. 🧠 **The AI looks** at the workday — who's tired, what's due, what just blew up.
|
| 117 |
+
2. 🎯 **It makes a call** — assign, focus, break, switch, or wait.
|
| 118 |
+
3. 🏢 **The simulated office reacts** — workers gain progress or burn out, deadlines pass.
|
| 119 |
+
4. ↩️ **A reward comes back** — high if the call was wise, low if it wasn't.
|
| 120 |
+
5. 🔁 **GRPO uses those rewards** to nudge the AI toward better decisions next time.
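
Step 5 relies on GRPO's group-relative signal: several rollouts are scored, and each one is judged against the others in its group rather than on an absolute scale. The sketch below shows only that normalization step, in plain Python. The function name and the example rewards are illustrative assumptions, and the actual training run uses Hugging Face TRL rather than this code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize each rollout's reward against its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a sample std would also work
    # eps keeps the division finite when every rollout earns the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical episode scores for four rollouts from the same starting state.
rollout_rewards = [0.05, 0.10, 0.27, 0.18]
advantages = group_relative_advantages(rollout_rewards)
for reward, advantage in zip(rollout_rewards, advantages):
    print(f"reward={reward:.2f}  advantage={advantage:+.2f}")
```

Rollouts that beat their group average get a positive advantage and are reinforced; those below it are discouraged. If all rollouts score the same, every advantage is zero and the update carries no signal.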
|
|
|
|
| 121 |
|
| 122 |
+
After 1000 training steps, the trained AI scores **about 5× higher than random guessing** (0.265 vs. ~0.05).
|
|
|
|
| 123 |
|
| 124 |
+
---
|
| 125 |
|
| 126 |
+
## 📈 The Results
|
| 127 |
|
| 128 |
+
After training the AI for 1000 steps:
|
| 129 |
|
| 130 |
+
| | Score | What it means |
|
| 131 |
|---|---|---|
|
| 132 |
+
| 🎲 Random guessing | ~0.05 | Total chaos |
|
| 133 |
+
| 🤖 Untrained AI | 0.101 | Mediocre |
|
| 134 |
+
| ✅ **Our trained AI** | **0.265** | **About 5× random guessing; +163% over the untrained AI** |
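
The headline numbers can be sanity-checked from the rounded scores in the table. This is a small arithmetic sketch; the helper name `relative_lift` is ours. Because the table values are rounded, the result lands at about +162%, within a point of the reported +163%.

```python
def relative_lift(before, after):
    """Fractional improvement of `after` over `before`."""
    return (after - before) / before


# Rounded scores as shown in the results table.
random_score, untrained_score, trained_score = 0.05, 0.101, 0.265

print(f"vs untrained: {relative_lift(untrained_score, trained_score):+.0%}")
print(f"vs random:    {trained_score / random_score:.1f}x")
```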
|
|
|
|
|
|
|
| 135 |
|
| 136 |
+
What it learned without being told:
|
| 137 |
|
| 138 |
+
- ⏸️ Insert breaks *before* burnout, not after
|
| 139 |
+
- 🎯 Protect deep-focus time — don't yank workers off mid-task
|
| 140 |
+
- 🚨 Adapt instantly when priorities flip mid-day
|
| 141 |
|
| 142 |
+
👉 [Watch the full dashboard demo](https://drive.google.com/file/d/149dz_1rIlXv-eR1fwYaxRJ-cV0mQNevJ/view?usp=sharing)
|
| 143 |
|
| 144 |
---
|
| 145 |
|
| 146 |
+
## 🔭 Why This Matters
|
| 147 |
|
| 148 |
+
Today, AI tools schedule meetings and triage tickets — but they treat people like robots. CLM is a step toward AI that schedules **for humans, not over them**.
|
| 149 |
|
| 150 |
+
The same idea plugs into:
|
| 151 |
|
| 152 |
+
- 📅 **Work tools**: integrations for Slack, Linear, and Notion that understand worker capacity
|
| 153 |
+
- 🎓 **Education** — tutors that notice when a student is overloaded, not just behind
|
| 154 |
+
- 🏥 **Healthcare** — staff schedulers that catch fatigue before it becomes errors
|
| 155 |
|
| 156 |
---
|
| 157 |
|
| 158 |
+
## 🚀 Try It
|
|
|
|
|
|
|
| 159 |
|
| 160 |
+
| | |
|
| 161 |
+
|---|---|
|
| 162 |
+
| 📓 **Re-run our training in your browser** | 👉 [Open in Colab](https://colab.research.google.com/drive/1_OoW4iH1acCni0H9POCcX2pp-6bOorzo?usp=sharing) |
|
| 163 |
+
| 🤗 **Live environment** | This Hugging Face Space |
|
| 164 |
+
| 📝 **The full build story** | [`blog.md`](./blog.md) |
|
| 165 |
|
| 166 |
+
---
|
| 167 |
|
| 168 |
+
<details>
|
| 169 |
+
<summary><strong>🛠️ For Developers — Technical Details</strong></summary>
|
| 170 |
|
| 171 |
+
### Stack
|
| 172 |
|
| 173 |
+
- **Environment:** OpenEnv-compatible RL environment (FastAPI backend, Docker)
|
| 174 |
+
- **Training:** Hugging Face TRL with GRPO on **Qwen 1.5B**
|
| 175 |
+
- **Frontend:** React live dashboard
|
| 176 |
+
- **Difficulty levels:** easy, medium, hard, expert (with deadlines, dependency chains, mid-episode interruptions)
|
| 177 |
|
| 178 |
+
### Actions
|
| 179 |
|
| 180 |
+
| Action | Description |
|
| 181 |
+
|---|---|
|
| 182 |
+
| `work` | Work on a task at normal pace |
|
| 183 |
+
| `focus` | Deep-work mode: 2× progress, 2× energy cost |
|
| 184 |
+
| `break` | Rest: +energy, −stress |
|
| 185 |
+
| `switch` | Change active task (small penalty) |
|
| 186 |
+
| `delay` | Wait one step |
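
Actions are sent as small JSON objects with a `type` and a `task_id`, for example `{"type": "work", "task_id": "m1"}`. The helper below is a minimal sketch for building and validating such payloads. The name `make_action` is hypothetical, and the rule that `switch` needs a `task_id` while `break` and `delay` send `null` is an assumption for illustration.

```python
import json

# Action names come from the table above; which ones need a task_id is an
# assumption made for this sketch.
TASK_ACTIONS = {"work", "focus", "switch"}
IDLE_ACTIONS = {"break", "delay"}


def make_action(action_type, task_id=None):
    """Return a JSON action payload, raising ValueError on malformed input."""
    if action_type in TASK_ACTIONS:
        if not task_id:
            raise ValueError(f"{action_type!r} needs a task_id")
    elif action_type in IDLE_ACTIONS:
        task_id = None  # idle actions carry a null task_id
    else:
        raise ValueError(f"unknown action type: {action_type!r}")
    return json.dumps({"type": action_type, "task_id": task_id})


print(make_action("focus", "h3"))
print(make_action("break"))
```

Validating before sending keeps malformed model output from reaching the environment, which is useful when the agent is an LLM emitting free-form text.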
|
| 187 |
|
| 188 |
+
### Scoring Formula
|
| 189 |
|
| 190 |
```
|
| 191 |
+
score = completion×0.60 + deadline×0.22 + energy×0.10 + dependency×0.05 + interruption×0.03
|
| 192 |
```
|
| 193 |
|
| 194 |
+
Score is always in (0.01, 0.99).
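
As a worked example of the formula, the sketch below computes the weighted sum in Python. The weights are the ones stated above. The function name `episode_score`, the assumption that each component lies in [0, 1], and the use of a simple clamp to keep the result between 0.01 and 0.99 are ours; the actual graders may enforce the range differently.

```python
# Weights from the scoring formula.
WEIGHTS = {
    "completion": 0.60,
    "deadline": 0.22,
    "energy": 0.10,
    "dependency": 0.05,
    "interruption": 0.03,
}


def episode_score(components):
    """Weighted sum of components in [0, 1], clamped to [0.01, 0.99]."""
    raw = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    # Clamping is an assumption about how the stated range is enforced.
    return min(0.99, max(0.01, raw))


# Hypothetical episode: half the work done, good energy use, no interruptions handled.
example = {
    "completion": 0.5,
    "deadline": 0.5,
    "energy": 0.8,
    "dependency": 1.0,
    "interruption": 0.0,
}
print(f"{episode_score(example):.3f}")
```

Because completion carries 60% of the weight, an agent cannot score well by resting alone, while the energy term still penalizes finishing tasks at the cost of burnout.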
|
| 195 |
|
| 196 |
+
### Quick Setup
|
| 197 |
|
|
|
|
| 198 |
```bash
|
| 199 |
+
# Docker
|
| 200 |
+
docker build -t clm-env . && docker run -p 7860:7860 clm-env
|
|
|
|
| 201 |
|
| 202 |
+
# Local
|
|
|
|
| 203 |
pip install -r requirements.txt
|
| 204 |
uvicorn server.app:app --port 7860 --reload
|
|
|
|
| 205 |
|
| 206 |
+
# React dashboard
|
| 207 |
cd frontend && npm install && npm run dev
|
|
|
|
| 208 |
```
|
| 209 |
|
| 210 |
+
### Environment Variables
|
|
|
|
| 211 |
|
| 212 |
| Variable | Description |
|
| 213 |
+
|---|---|
|
| 214 |
+
| `API_BASE_URL` | LLM API endpoint |
|
| 215 |
+
| `MODEL_NAME` | Model identifier |
|
| 216 |
| `HF_TOKEN` | Hugging Face API token |
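
A baseline agent needs all three variables before it can call the model. The snippet below is a minimal sketch of reading and checking them with the standard library. The function name `load_llm_config` is hypothetical, treating all three as required is an assumption, and the example values are placeholders rather than defaults shipped with the project.

```python
import os

REQUIRED_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


def load_llm_config(env=None):
    """Read the LLM settings from `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        # Fail early with a clear message instead of a later API error.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


# Placeholder values for illustration only.
example_env = {
    "API_BASE_URL": "https://router.huggingface.co/v1",
    "MODEL_NAME": "Qwen/Qwen2.5-72B-Instruct",
    "HF_TOKEN": "hf_your_token_here",
}
config = load_llm_config(example_env)
print(config["MODEL_NAME"])
```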
|
| 217 |
|
| 218 |
+
### Project Structure
|
| 219 |
|
| 220 |
+
```
|
| 221 |
+
cognitive-load-manager/
|
| 222 |
+
├── models.py ← Core environment
|
| 223 |
+
├── inference.py ← Baseline LLM agent
|
| 224 |
+
├── openenv.yaml ← OpenEnv spec
|
| 225 |
+
├── backend/main.py ← FastAPI server
|
| 226 |
+
├── grader/ ← Difficulty graders
|
| 227 |
+
└── frontend/ ← React dashboard
|
| 228 |
+
```
|
| 229 |
|
| 230 |
+
For the full technical write-up — observation space, reward shaping table, training loop, and the v1→v2→v3 reward-tuning story — see [`blog.md`](./blog.md).
|
| 231 |
|
| 232 |
+
</details>
|
| 233 |
|
| 234 |
---
|
| 235 |
|
| 236 |
+
<p align="center">
|
| 237 |
+
<em>Built for the OpenEnv Hackathon, April 2026.</em><br/>
|
| 238 |
+
<strong>🧠 Scheduling that respects the humans doing the work.</strong>
|
| 239 |
+
</p>
|