garvitsachdeva committed on
Commit
984aa3b
·
1 Parent(s): 13517a8

docs: polish README; remove emoji

- Remove frontmatter emoji and tighten intro/overview wording
- Minor formatting cleanup
- Add beginner-facing PROJECT_COMPLETE_GUIDE.md

Files changed (2)
  1. PROJECT_COMPLETE_GUIDE.md +346 -0
  2. README.md +17 -24
PROJECT_COMPLETE_GUIDE.md ADDED
@@ -0,0 +1,346 @@
+ # 911 Dispatch Project - Complete Beginner Guide
+
+ ## 1. What this project is (in plain language)
+
+ This project is a simulator where an AI agent learns to behave like a city emergency dispatch supervisor.
+
+ Think of it like a strategy game:
+ - There are emergencies (incidents).
+ - There are responders (fire, police, EMS units).
+ - The agent must choose what to do each turn (dispatch, reassign, cancel, request mutual aid, etc.).
+ - The simulator gives a score for each decision and a final score for the whole run.
+
+ The goal is to train and evaluate decision-making quality under pressure.
+
+ ## 2. What an RL environment means
+
+ RL means Reinforcement Learning.
+
+ RL rests on four core ideas:
+ - Agent: the decision-maker (your model or baseline policy).
+ - Environment: the world that reacts to actions (this simulator).
+ - Reward: a number that says how good or bad the outcome of the last action was.
+ - Episode: one complete run from start to finish.
+
+ For this project:
+ - Agent picks an action.
+ - Environment updates city state.
+ - Environment returns:
+   - updated observation,
+   - reward,
+   - done flag (whether the run is over).
+
+ That loop repeats until the episode ends.
+
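The loop described in section 2 can be sketched as follows. This is a minimal illustration, not the project's code: `ToyEnv`, `StepResult`, and `run_episode` are hypothetical names, and the toy reward rule is invented purely so the loop has something to add up.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    observation: dict
    reward: float
    done: bool


class ToyEnv:
    """Hypothetical stand-in for the dispatch simulator (not the real environment)."""

    def __init__(self, max_steps: int = 5):
        self.max_steps = max_steps
        self.step_count = 0

    def reset(self) -> dict:
        self.step_count = 0
        return {"step_count": self.step_count}

    def step(self, action: str) -> StepResult:
        self.step_count += 1
        # Invented reward rule, only here to make the example runnable.
        reward = 1.0 if action == "DISPATCH" else 0.5
        done = self.step_count >= self.max_steps
        return StepResult({"step_count": self.step_count}, reward, done)


def run_episode(env, policy) -> float:
    """Agent picks an action, environment responds, repeat until done."""
    observation = env.reset()
    total = 0.0
    done = False
    while not done:
        action = policy(observation)
        result = env.step(action)
        observation, done = result.observation, result.done
        total += result.reward
    return total


total = run_episode(ToyEnv(max_steps=3), lambda obs: "DISPATCH")
```

The policy here is a constant function; a real agent would inspect the observation before choosing an action.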
+ ## 3. Important clarification: "scheme of electricity" vs "city schema"
+
+ There is no electricity scheme in this codebase.
+
+ What exists is a city schema.
+
+ City schema means a configuration blueprint for the simulation:
+ - city size (grid),
+ - districts,
+ - available units,
+ - unit speeds,
+ - default recommended unit types for each incident type.
+
+ The schema is loaded from data files and used to initialize deterministic, repeatable scenarios.
+
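To make the idea of a schema concrete, here is a hedged sketch of what such a blueprint could look like in Python. Every field name, the `recommended_for` helper, and the sample values (including the `cardiac_arrest` incident type) are assumptions for illustration; the real schema lives in the project's data files and may be shaped differently.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class CitySchema:
    """Hypothetical shape of a city schema; field names are assumptions."""

    grid_width: int
    grid_height: int
    districts: Tuple[str, ...]
    unit_counts: Dict[str, int]          # available units per unit type
    unit_speeds: Dict[str, float]        # speed per unit type (units assumed)
    recommended_units: Dict[str, Tuple[str, ...]]  # incident type -> unit types

    def recommended_for(self, incident_type: str) -> Tuple[str, ...]:
        # Unknown incident types yield an empty recommendation.
        return self.recommended_units.get(incident_type, ())


# Illustrative values only; not taken from the repository.
schema = CitySchema(
    grid_width=10,
    grid_height=10,
    districts=("north", "south"),
    unit_counts={"MEDIC": 2, "ENGINE": 1},
    unit_speeds={"MEDIC": 1.0, "ENGINE": 0.8},
    recommended_units={"cardiac_arrest": ("MEDIC",)},
)
```

Keeping the schema immutable (`frozen=True`) is one way to support repeatable scenarios, since nothing can change the blueprint after loading.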
+ ## 4. Project architecture (high level)
+
+ 1. Scenario/task setup
+    - A task fixture builds initial units/incidents and metadata.
+
+ 2. State machine update engine
+    - Validates actions.
+    - Applies action effects.
+    - Advances time by one tick.
+    - Updates incident statuses and unit statuses.
+
+ 3. Reward + scoring
+    - Computes per-step reward components.
+    - Computes episode-level score using task-specific graders.
+
+ 4. API server
+    - Exposes reset/step/state endpoints.
+
+ 5. Dashboard
+    - Polls backend state repeatedly and renders units/incidents + reward bars.
+
+ ## 5. What is the task?
+
+ A task is a scenario type with its own initial conditions, difficulty, and final grading logic.
+
+ This project has 4 tasks:
+
+ 1. single_incident (easy)
+    - One incident, small unit pool.
+    - Focus: dispatch the right unit fast.
+
+ 2. multi_incident (medium)
+    - Multiple incidents at the same time.
+    - Focus: triage/prioritization and handling P1 incidents.
+
+ 3. mass_casualty (hard)
+    - Incident waves with severe emergencies and resource conflicts.
+    - Focus: survival outcomes under surge.
+
+ 4. shift_surge (hard)
+    - New incidents arrive over time and some units go out of service.
+    - Focus: long-horizon operations and city coverage under degradation.
+
+ ## 6. What is an episode?
+
+ An episode is one full run of a task, from reset until a terminal condition is reached.
+
+ An episode starts when reset is called:
+ - step_count starts at 0.
+ - city_time starts at 0 seconds.
+ - units and incidents are loaded from the selected task fixture.
+
+ An episode ends when any terminal condition is hit:
+ - max steps reached,
+ - at least one incident escalates,
+ - all incidents resolved.
+
+ ## 7. What is a step?
+
+ A step is one action cycle:
+
+ 1. Agent sends one action.
+ 2. Validator checks whether the action is legal.
+ 3. State machine applies action effects.
+ 4. Time advances by 30 seconds.
+ 5. Reward is computed.
+ 6. Observation + reward + done are returned.
+
+ Important:
+ - step_count increases by 1 per step.
+ - city_time increases by 30 seconds per step.
+
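The bookkeeping from sections 6 and 7 (one tick per step, three terminal conditions) can be sketched like this. The function names and the status strings `"ESCALATED"` and `"RESOLVED"` are assumptions; the real state machine may represent statuses as enums and may treat an empty incident list differently.

```python
from typing import List, Tuple

TICK_SECONDS = 30.0  # time advanced per step, as stated in section 7


def advance(step_count: int, city_time: float) -> Tuple[int, float]:
    """Advance the counters by exactly one step."""
    return step_count + 1, city_time + TICK_SECONDS


def is_terminal(step_count: int, max_steps: int, statuses: List[str]) -> bool:
    """Return True if any of the three terminal conditions from section 6 holds."""
    if step_count >= max_steps:
        return True
    if any(status == "ESCALATED" for status in statuses):
        return True
    # Assumption: an empty incident list does not count as "all resolved".
    return bool(statuses) and all(status == "RESOLVED" for status in statuses)


# Eight steps from reset.
step_count, city_time = 0, 0.0
for _ in range(8):
    step_count, city_time = advance(step_count, city_time)
```

After eight steps the counters read step 8 and 240.0 seconds, which is the relationship to expect between step_count and city_time in any state snapshot.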
+ ## 8. At what step are we right now?
+
+ Snapshot from the live backend at the time this guide was generated:
+
+ - task_id: multi_incident
+ - episode_id: d2cd525e-2596-44cb-bbe3-af33236264a0
+ - step_count: 8
+ - city_time: 240.0 seconds
+ - cumulative_reward: 1.6
+ - episode_score: 0.0
+ - legal_actions currently available: 36
+
+ These are live values, not constants. If you reset again, step_count returns to 0.
+
+ ## 9. Action space (what actions exist)
+
+ Current action types include:
+ - DISPATCH
+ - CANCEL
+ - REASSIGN
+ - STAGE
+ - MUTUAL_AID
+ - UPGRADE
+ - DOWNGRADE
+
+ Legal actions are generated from current state and filtered by protocol validation, so only valid actions appear in legal_actions.
+
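One way to picture "generate candidates, then filter by validation" is the sketch below. The `ActionType` enum mirrors the list in section 9, but the `Action` tuple, `legal_actions` helper, and the toy validator are hypothetical; the project's real validator checks protocol rules that are not reproduced here.

```python
from enum import Enum
from typing import Callable, Iterable, List, NamedTuple


class ActionType(str, Enum):
    DISPATCH = "DISPATCH"
    CANCEL = "CANCEL"
    REASSIGN = "REASSIGN"
    STAGE = "STAGE"
    MUTUAL_AID = "MUTUAL_AID"
    UPGRADE = "UPGRADE"
    DOWNGRADE = "DOWNGRADE"


class Action(NamedTuple):
    """Hypothetical action record; the real Action model may differ."""

    action_type: ActionType
    unit_id: str


def legal_actions(candidates: Iterable[Action],
                  is_valid: Callable[[Action], bool]) -> List[Action]:
    """Keep only candidates that pass the supplied validator."""
    return [action for action in candidates if is_valid(action)]


# Toy validator (assumption): only units marked available may be dispatched.
available_units = {"medic-1"}
candidates = [
    Action(ActionType.DISPATCH, "medic-1"),
    Action(ActionType.DISPATCH, "medic-2"),
]
legal = legal_actions(candidates, lambda a: a.unit_id in available_units)
```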
+ ## 10. How scoring works (complete detail)
+
+ There are two scoring layers:
+
+ 1. Step reward (every action)
+ 2. Episode score (whole run)
+
+ ### 10.1 Step reward (RewardCalculator)
+
+ Step reward uses a weighted sum of 5 components:
+ - response_time: 30%
+ - triage: 25%
+ - survival: 25%
+ - coverage: 12%
+ - protocol: 8%
+
+ Total formula:
+ - total = 0.30 * response_time + 0.25 * triage + 0.25 * survival + 0.12 * coverage + 0.08 * protocol
+ - result is clamped to [0, 1]
+
+ Safety rule:
+ - If any Priority-1 incident existed and the survival component is 0, the total score is capped at 0.2.
+
+ Component details:
+
+ 1. response_time
+    - Only meaningful for DISPATCH.
+    - For non-DISPATCH actions it returns a neutral 0.5.
+    - For DISPATCH: compares ETA to the severity benchmark.
+
+ 2. triage
+    - Only meaningful for DISPATCH.
+    - Checks whether the dispatched unit type matches the required unit types for the incident type.
+    - Handles enum-qualified metadata keys safely.
+
+ 3. survival
+    - Based on P1 incidents seen vs resolved without failure.
+    - Uses metadata lists: p1_seen, resolved_incidents, failed_incidents.
+
+ 4. coverage
+    - Measures how many districts still have AVAILABLE coverage.
+
+ 5. protocol
+    - If the action is invalid: 0.0.
+    - If valid and no phraseology text in Action.notes: neutral 0.5.
+    - If Action.notes is provided: uses PhraseologyJudge score + readback correctness.
+
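The weighted sum, the clamp, and the safety rule from section 10.1 fit in a few lines. This is a sketch of the arithmetic only: the `step_reward` function and its arguments are hypothetical, and it assumes each component has already been computed as a number in [0, 1].

```python
from typing import Dict

# Weights as listed in section 10.1; they sum to 1.0.
WEIGHTS = {
    "response_time": 0.30,
    "triage": 0.25,
    "survival": 0.25,
    "coverage": 0.12,
    "protocol": 0.08,
}


def step_reward(components: Dict[str, float], p1_seen: bool) -> float:
    """Combine component scores into one step reward."""
    total = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    total = min(1.0, max(0.0, total))  # clamp to [0, 1]
    # Safety rule: a P1 incident existed and survival is 0 -> cap at 0.2.
    if p1_seen and components["survival"] == 0.0:
        total = min(total, 0.2)
    return total


neutral = {name: 0.5 for name in WEIGHTS}
failed_p1 = {name: 1.0 for name in WEIGHTS}
failed_p1["survival"] = 0.0

neutral_reward = step_reward(neutral, p1_seen=False)
capped_reward = step_reward(failed_p1, p1_seen=True)
uncapped_reward = step_reward(failed_p1, p1_seen=False)
```

With every other component perfect and survival at 0, the weighted sum is 0.75, so the cap is what pulls the result down to 0.2 when a P1 incident was involved.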
+ ### 10.2 Episode score (whole run)
+
+ Episode score is task-specific via a central grade_episode router.
+
+ Why this matters:
+ - Different tasks need different definitions of success.
+ - Mean step reward alone is often too weak for real evaluation.
+
+ Task-specific episode graders:
+
+ 1. single_incident
+    - +0.50 if incident resolved
+    - +0.30 if MEDIC dispatched correctly
+    - +0.20 if resolved within first 10 steps
+
+ 2. multi_incident
+    - Uses P1 resolution, overall resolution ratio, and escalation penalty
+    - score = 0.5 * p1_score + 0.3 * resolution_score - 0.2 * failure_penalty
+
+ 3. mass_casualty
+    - Emphasizes P1 survival with penalties for failures
+    - score = 0.6 * survival_score + 0.3 * mean_reward - failure_penalty
+
+ 4. shift_surge (improved)
+    - Emphasizes long-horizon operational quality:
+      - incident throughput (resolved ratio)
+      - P1 survival
+      - coverage
+      - low backlog
+      - mean reward
+      - escalation penalty
+
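Two of the graders above are simple enough to write out. The sketch below follows the stated point values and formula, but the function names and signatures are hypothetical, "within first 10 steps" is read as `resolved_step <= 10`, and no clamping is applied because the guide does not mention one for episode scores.

```python
from typing import Optional


def grade_single_incident(resolved: bool,
                          medic_dispatched: bool,
                          resolved_step: Optional[int]) -> float:
    """Additive grader: 0.50 + 0.30 + 0.20 for the three criteria."""
    score = 0.0
    if resolved:
        score += 0.50
    if medic_dispatched:
        score += 0.30
    # Assumption: the speed bonus requires resolution at step 10 or earlier.
    if resolved and resolved_step is not None and resolved_step <= 10:
        score += 0.20
    return score


def grade_multi_incident(p1_score: float,
                         resolution_score: float,
                         failure_penalty: float) -> float:
    """Weighted formula from the multi_incident grader."""
    return 0.5 * p1_score + 0.3 * resolution_score - 0.2 * failure_penalty


fast_and_correct = grade_single_incident(True, True, resolved_step=4)
slow_but_correct = grade_single_incident(True, True, resolved_step=15)
perfect_multi = grade_multi_incident(1.0, 1.0, 0.0)
```

Note that under the stated formula a flawless multi_incident run scores 0.8, not 1.0, because the positive weights sum to 0.8.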
+ ## 11. Very important score semantics
+
+ In the OpenEnv wrapper:
+ - The reward value returned from step is the per-step reward.
+ - observation.score is overwritten with the episode score.
+
+ Also stored in metadata:
+ - cumulative_reward: running sum of step rewards.
+ - episode_rewards: list of per-step rewards.
+ - episode_score: current episode-level grade.
+
+ So if you compare values:
+ - reward = immediate local quality for this action
+ - observation.score = global task progress quality for the run
+
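The relationship between the three metadata fields can be illustrated with a small helper. `record_step` is a hypothetical function, not the wrapper's actual code; it only shows that cumulative_reward is derived from episode_rewards while episode_score is tracked separately.

```python
from typing import Dict


def record_step(metadata: Dict, reward: float, episode_score: float) -> Dict:
    """Update per-episode bookkeeping after one step (illustrative only)."""
    rewards = metadata.setdefault("episode_rewards", [])
    rewards.append(reward)
    metadata["cumulative_reward"] = sum(rewards)   # running sum of step rewards
    metadata["episode_score"] = episode_score      # independent episode grade
    return metadata


metadata: Dict = {}
for _ in range(8):
    # Illustrative numbers: eight steps of reward 0.2, grader still at 0.0.
    record_step(metadata, reward=0.2, episode_score=0.0)
```

This shows how a run can accumulate step reward while its episode score is still 0.0: the two numbers measure different things.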
+ ## 12. Is the dashboard connected to backend or just static?
+
+ It is connected to the backend.
+
+ How we know:
+ - The dashboard JavaScript calls the API endpoint http://localhost:8000/dashboard/state.
+ - It polls every 500 ms.
+ - It renders live units/incidents, step, and reward breakdown from the backend response.
+
+ Connection behavior:
+ - If the backend is unreachable, the dashboard shows a disconnected status.
+ - If the backend is running and reset was called, the dashboard updates live as the step changes.
+
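The dashboard itself is JavaScript, but the same polling behavior can be sketched in Python for readers who want to watch state from a script. The `fetch_state` and `poll` helpers are hypothetical, and the `step_count` key read from the response is an assumption about the payload shape; only the URL and the 500 ms interval come from the guide.

```python
import json
import time
import urllib.request
from typing import Callable

DASHBOARD_URL = "http://localhost:8000/dashboard/state"


def fetch_state(url: str = DASHBOARD_URL, timeout: float = 2.0) -> dict:
    """GET the dashboard state and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def poll(fetch: Callable[[], dict],
         on_state: Callable[[dict], None],
         on_disconnect: Callable[[], None],
         interval: float = 0.5,
         max_polls: int = 3,
         sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll repeatedly; report a disconnect when the fetch fails."""
    for _ in range(max_polls):
        try:
            on_state(fetch())
        except OSError:  # urllib.error.URLError is a subclass of OSError
            on_disconnect()
        sleep(interval)


if __name__ == "__main__":
    poll(fetch_state,
         on_state=lambda state: print("state:", state.get("step_count")),
         on_disconnect=lambda: print("disconnected"))
```

Passing `fetch` and `sleep` as parameters keeps the loop testable without a running backend.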
+ ## 13. Why we used Docker
+
+ Docker is used to package the app and its dependencies so it runs consistently everywhere.
+
+ Benefits:
+ - Same runtime on your machine, CI, and deployment platforms.
+ - No "works on my machine" package mismatch issues.
+ - Easy deployment with a single container image.
+ - Port compatibility: the server reads the PORT environment variable (important for hosted platforms).
+
+ In this project:
+ - The root Dockerfile runs uvicorn on 0.0.0.0 and PORT (default 8000).
+ - That makes it suitable for both local runs and hosted environments.
+
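The PORT convention amounts to "use the platform's value if set, otherwise 8000". A minimal sketch of that lookup, with a hypothetical `resolve_bind` helper (the project may do this in the Dockerfile's command line rather than in Python):

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_bind(env: Optional[Mapping[str, str]] = None) -> Tuple[str, int]:
    """Return the host/port pair the server should bind to."""
    env = os.environ if env is None else env
    # 0.0.0.0 listens on all interfaces, which containers need.
    return "0.0.0.0", int(env.get("PORT", "8000"))


local_bind = resolve_bind({})
hosted_bind = resolve_bind({"PORT": "7860"})
```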
+ ## 14. What API key are we using?
+
+ The project expects environment variables. Keys are not hardcoded in repository files.
+
+ Required for LLM mode:
+ - API_BASE_URL
+ - MODEL_NAME
+ - OPENAI_API_KEY
+
+ Compatibility fallback:
+ - HF_TOKEN is accepted if OPENAI_API_KEY is not set.
+
+ No-key mode:
+ - USE_RANDOM=true bypasses the LLM and uses a deterministic random baseline agent.
+
+ Practical meaning:
+ - If USE_RANDOM=true, you can run without any API key.
+ - If USE_RANDOM is not true, OPENAI_API_KEY (or the HF_TOKEN fallback) is needed.
+
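The precedence rules in section 14 can be expressed as a small resolver. The variable names come from the guide; the `resolve_llm_config` function, its return shape, and the case-insensitive check on USE_RANDOM are assumptions for illustration.

```python
from typing import Mapping, Optional


def resolve_llm_config(env: Mapping[str, str]) -> Optional[dict]:
    """Return LLM settings, or None when the random baseline is selected."""
    # Assumption: "true" is matched case-insensitively.
    if env.get("USE_RANDOM", "").lower() == "true":
        return None
    # OPENAI_API_KEY wins; HF_TOKEN is the fallback.
    api_key = env.get("OPENAI_API_KEY") or env.get("HF_TOKEN")
    missing = [name for name in ("API_BASE_URL", "MODEL_NAME") if not env.get(name)]
    if not api_key:
        missing.append("OPENAI_API_KEY (or HF_TOKEN)")
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "base_url": env["API_BASE_URL"],
        "model": env["MODEL_NAME"],
        "api_key": api_key,
    }


random_mode = resolve_llm_config({"USE_RANDOM": "true"})
fallback_mode = resolve_llm_config({
    "API_BASE_URL": "https://example.invalid/v1",  # placeholder URL
    "MODEL_NAME": "example-model",                 # placeholder model name
    "HF_TOKEN": "hf_placeholder",
})
```

Failing fast with a list of missing variables is a design choice that saves a beginner from a confusing authentication error later in the run.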
+ ## 15. Backend API endpoints (what each does)
+
+ - GET /health
+   - health check
+
+ - GET /tasks
+   - list available tasks
+
+ - POST /reset
+   - start new episode for selected task
+
+ - POST /step
+   - apply one action and move simulation one step
+
+ - GET /state
+   - current state
+
+ - GET /dashboard/state
+   - extended state for HTML dashboard (includes legal actions + last observation)
+
+ - GET /metadata and GET /schema
+   - environment metadata and contracts
+
+ - POST /mcp
+   - minimal JSON-RPC endpoint
+
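A small client for these endpoints might look like the sketch below. The paths and methods come from the list above and the base URL from section 12; the `build_request` and `call` helpers are hypothetical, and the `{"task_id": ...}` request body is an assumption, since the guide does not document payload shapes. Check GET /schema on a running server for the real contract.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"


def build_request(method: str, path: str,
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a JSON request for one of the backend endpoints."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


def call(request: urllib.request.Request, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


health_request = build_request("GET", "/health")
# Assumed body shape; the real /reset contract may differ.
reset_request = build_request("POST", "/reset", {"task_id": "multi_incident"})

if __name__ == "__main__":
    print(call(health_request))
    print(call(reset_request))
```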
+ ## 16. What the dashboard shows vs what it does not show
+
+ Shows:
+ - Unit cards (status, assignment, ETA, location)
+ - Incident cards (type, severity, status, assigned units)
+ - Map view for units/incidents
+ - Last step reward component bars
+ - Header task/episode/step values
+
+ Nuance:
+ - Header "Score" currently uses metadata.cumulative_reward.
+ - Episode score is available too (metadata.episode_score), but not currently shown as the main header score.
+
+ ## 17. Beginner glossary
+
+ - incident: emergency case to be handled
+ - unit: responder vehicle/team (EMS, fire, police, etc.)
+ - legal action: an action that passes protocol checks in current state
+ - reward: immediate feedback signal for one step
+ - episode score: overall quality of a full run
+ - terminal: episode is finished
+
+ ## 18. Practical "how to think" summary
+
+ When you judge behavior quality in this project:
+ - Use step rewards to understand local tactical quality.
+ - Use episode score to understand mission success for the selected task.
+ - Use dashboard to observe live state transitions.
+ - Use task definitions to interpret what success means in each scenario.
+
+ If you remember one thing:
+ - This is not a generic chatbot app. It is a decision simulator where actions change a world state over time and are graded both step-by-step and across full episodes.
README.md CHANGED
@@ -1,35 +1,31 @@
1
  ---
2
  title: 911 Dispatch Supervisor
3
- emoji: 🚨
4
  colorFrom: red
5
  colorTo: orange
6
  sdk: docker
7
  pinned: false
8
  tags:
9
-
10
  - openenv
11
  - reinforcement-learning
12
  - llm-agent
13
  - emergency-dispatch
14
  ---
15
 
16
- # 911 City-Wide Emergency Dispatch Supervisor
17
-
18
- **LLM-powered 911 dispatch supervision — city scale**
19
 
20
- A unified RL training environment for city-wide emergency dispatch operations. The agent supervises police, fire, and EMS unit allocation across simultaneous incidents under a deterministic simulation.
21
 
22
  ## Overview
23
 
24
- This project implements a benchmark environment for training and evaluating LLM agents as emergency dispatch supervisors. It features:
25
 
26
- - **Dispatch lifecycle**: incidents advance from pending to resolved (or escalated)
27
- - **Deterministic simulation**: Reproducible episodes under fixed seeds
28
- - **Protocol validator**: Checks if actions are legal in the current state
29
- - **OpenEnv compatible**: Standard RL environment interface
30
- - **Read-only 2D visualization**: Synchronized unit/incident visualization (see below)
31
 
32
- ## Visualizer (Judges: please check this)
33
 
34
  The 2D visualizer is in `src/visualizer/viewer.py` and renders the current state to a PNG.
35
 
@@ -41,10 +37,10 @@ from src.openenv_environment import OpenEnvEnvironment
41
  from src.visualizer.viewer import Viewer2D
42
 
43
  async def main():
44
- env = OpenEnvEnvironment(task_id="multi_incident", seed=42)
45
- await env.reset()
46
- Viewer2D().render_to_file("frame.png", env.state())
47
- env.close()
48
 
49
  asyncio.run(main())
50
  ```
@@ -194,8 +190,7 @@ The reward signal is a weighted combination of five components:
194
  | `coverage` | 12% | Geographic distribution of available units across city districts |
195
  | `protocol` | 8% | Action legality + dispatch phraseology/readback quality (via `Action.notes`) |
196
 
197
- **Safety gate:** If any Priority-1 incident was seen and `survival=0.0`, the total episode score is capped at `0.2` regardless of other components.
198
-
199
 
200
 
201
  ## Project Structure
@@ -265,13 +260,11 @@ curl -X POST http://localhost:8000/reset -H "Content-Type: application/json" -d
265
  | `/dashboard/state` | GET | Extended state for `live_dashboard.html` |
266
  | `/tasks` | GET | List all available tasks with metadata |
267
 
268
- ## HF Space
269
-
270
- ### Deploying to Hugging Face Spaces (Docker)
271
 
272
- This repository is compatible with **Docker Spaces** (the README frontmatter includes `sdk: docker` and the Space tags include `openenv`).
273
 
274
- 1) Create a new Space choose **Docker**.
275
  2) Push this repository to the Space.
276
  3) The server binds to the `PORT` environment variable (HF commonly sets `PORT=7860`).
277
 
 
1
  ---
2
  title: 911 Dispatch Supervisor
 
3
  colorFrom: red
4
  colorTo: orange
5
  sdk: docker
6
  pinned: false
7
  tags:
 
8
  - openenv
9
  - reinforcement-learning
10
  - llm-agent
11
  - emergency-dispatch
12
  ---
13
 
14
+ # 911 Dispatch Supervisor
 
 
15
 
16
+ Deterministic simulator + RL-style environment for city-wide 911 dispatch. It supports police/fire/EMS unit allocation across concurrent incidents, with an OpenEnv-compatible interface and a small FastAPI server for interactive runs and the live dashboard.
17
 
18
  ## Overview
19
 
20
+ This repo is meant for training and evaluating agents (LLM-based or scripted baselines) as dispatch supervisors. It includes:
21
 
22
+ - **Dispatch lifecycle**: incidents progress from pending to resolved (or escalated)
23
+ - **Deterministic simulation**: reproducible episodes under fixed seeds
24
+ - **Protocol validator**: checks whether an action is legal in the current state
25
+ - **OpenEnv-compatible**: standard `reset` / `step` loop
26
+ - **2D visualization**: render a PNG snapshot of the current state
27
 
28
+ ## Visualizer
29
 
30
  The 2D visualizer is in `src/visualizer/viewer.py` and renders the current state to a PNG.
31
 
 
37
  from src.visualizer.viewer import Viewer2D
38
 
39
  async def main():
40
+     env = OpenEnvEnvironment(task_id="multi_incident", seed=42)
41
+     await env.reset()
42
+     Viewer2D().render_to_file("frame.png", env.state())
43
+     env.close()
44
 
45
  asyncio.run(main())
46
  ```
 
190
  | `coverage` | 12% | Geographic distribution of available units across city districts |
191
  | `protocol` | 8% | Action legality + dispatch phraseology/readback quality (via `Action.notes`) |
192
 
193
+ Safety gate: if any Priority-1 incident was seen and `survival=0.0`, the total episode score is capped at `0.2` regardless of other components.
 
194
 
195
 
196
  ## Project Structure
 
260
  | `/dashboard/state` | GET | Extended state for `live_dashboard.html` |
261
  | `/tasks` | GET | List all available tasks with metadata |
262
 
263
+ ## Hugging Face Spaces
 
 
264
 
265
+ ### Deploying to Spaces (Docker)
266
 
267
+ 1) Create a new Space and choose **Docker**.
268
  2) Push this repository to the Space.
269
  3) The server binds to the `PORT` environment variable (HF commonly sets `PORT=7860`).
270