Space: garvitsachdeva/911

Commit 984aa3b (1 parent: 13517a8)

docs: polish README; remove emoji

- Remove frontmatter emoji and tighten intro/overview wording
- Minor formatting cleanup
- Add beginner-facing PROJECT_COMPLETE_GUIDE.md

Files changed (2)

PROJECT_COMPLETE_GUIDE.md +346 -0
README.md +17 -24

PROJECT_COMPLETE_GUIDE.md ADDED

@@ -0,0 +1,346 @@

+# 911 Dispatch Project - Complete Beginner Guide
+## 1. What this project is (in plain language)
+This project is a simulator where an AI agent learns to behave like a city emergency dispatch supervisor.
+Think of it like a strategy game:
+- There are emergencies (incidents).
+- There are responders (fire, police, EMS units).
+- The agent must choose what to do each turn (dispatch, reassign, cancel, request mutual aid, etc.).
+- The simulator gives a score for each decision and a final score for the whole run.
+The goal is to train and evaluate decision-making quality under pressure.
+## 2. What an RL environment means
+RL stands for Reinforcement Learning.
+RL rests on four core ideas:
+- Agent: the decision-maker (your model or a baseline policy).
+- Environment: the world that reacts to actions (this simulator).
+- Reward: a number that says how good or bad the outcome of the last action was.
+- Episode: one complete run from start to finish.
+For this project:
+- Agent picks an action.
+- Environment updates city state.
+- Environment returns:
+  - updated observation,
+  - reward,
+  - done flag (whether the run is over).
+That loop repeats until the episode ends.
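The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: `ToyEnv`, `StepResult`, and `run_episode` are hypothetical names, and the toy reward rule is made up purely so the loop has something to return.

```python
import random
from dataclasses import dataclass


@dataclass
class StepResult:
    observation: dict
    reward: float
    done: bool


class ToyEnv:
    """Hypothetical stand-in for the simulator (not the project's real class)."""

    def __init__(self, max_steps: int = 5) -> None:
        self.max_steps = max_steps
        self.step_count = 0

    def reset(self) -> dict:
        # Start a fresh episode and return the first observation.
        self.step_count = 0
        return {"step_count": 0}

    def step(self, action: str) -> StepResult:
        # React to one action: update state, then report reward and done.
        self.step_count += 1
        reward = 1.0 if action == "DISPATCH" else 0.5  # made-up toy rule
        done = self.step_count >= self.max_steps
        return StepResult({"step_count": self.step_count}, reward, done)


def run_episode(env: ToyEnv, seed: int = 0) -> list:
    """Run one episode with a seeded random agent; return per-step rewards."""
    rng = random.Random(seed)
    env.reset()
    rewards = []
    done = False
    while not done:
        action = rng.choice(["DISPATCH", "STAGE"])  # the agent picks an action
        result = env.step(action)                   # the environment reacts
        rewards.append(result.reward)
        done = result.done
    return rewards
```

Calling `run_episode(ToyEnv(max_steps=5))` returns one reward per step and stops as soon as the environment reports `done`.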
+## 3. Important clarification: "scheme of electricity" vs "city schema"
+This codebase has no electricity scheme; what it has is a city schema.
+A city schema is a configuration blueprint for the simulation:
+- city size (grid),
+- districts,
+- available units,
+- unit speeds,
+- default recommended unit types for each incident type.
+The schema is loaded from data files and used to initialize deterministic, repeatable scenarios.
+## 4. Project architecture (high level)
+1. Scenario/task setup
+- A task fixture builds initial units/incidents and metadata.
+2. State machine update engine
+- Validates actions.
+- Applies action effects.
+- Advances time by one tick.
+- Updates incident statuses and unit statuses.
+3. Reward + scoring
+- Computes per-step reward components.
+- Computes episode-level score using task-specific graders.
+4. API server
+- Exposes reset/step/state endpoints.
+5. Dashboard
+- Polls backend state repeatedly and renders units/incidents + reward bars.
+## 5. What is the task?
+A task is a scenario type with its own initial conditions, difficulty, and final grading logic.
+This project has 4 tasks:
+1. single_incident (easy)
+- One incident, small unit pool.
+- Focus: dispatch the right unit fast.
+2. multi_incident (medium)
+- Multiple incidents at the same time.
+- Focus: triage/prioritization and handling P1 incidents.
+3. mass_casualty (hard)
+- Incident waves with severe emergencies and resource conflicts.
+- Focus: survival outcomes under surge.
+4. shift_surge (hard)
+- New incidents arrive over time and some units go out of service.
+- Focus: long-horizon operations and city coverage under degradation.
+## 6. What is an episode?
+An episode is one full run of a task, from reset until a terminal condition.
+An episode starts when reset is called:
+- step_count starts at 0.
+- city_time starts at 0 seconds.
+- units and incidents are loaded from the selected task fixture.
+An episode ends when any terminal condition is hit:
+- max steps reached,
+- at least one incident escalates,
+- all incidents resolved.
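The three terminal conditions above can be expressed as a single predicate. This is a sketch under assumptions: the function name, its parameters, and the status strings `"ESCALATED"` and `"RESOLVED"` are hypothetical, and treating an empty incident list as "not yet all resolved" is a choice made here, not something the guide states.

```python
from typing import Iterable


def is_terminal(step_count: int, max_steps: int, statuses: Iterable[str]) -> bool:
    """Return True when any of the three terminal conditions holds."""
    statuses = list(statuses)
    if step_count >= max_steps:
        return True  # max steps reached
    if any(status == "ESCALATED" for status in statuses):
        return True  # at least one incident escalated
    # All incidents resolved (assumption: an empty list does not count).
    return bool(statuses) and all(status == "RESOLVED" for status in statuses)
```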
+## 7. What is a step?
+A step is one action cycle:
+1. The agent sends one action.
+2. The validator checks whether the action is legal.
+3. The state machine applies the action's effects.
+4. Time advances by 30 seconds.
+5. The reward is computed.
+6. Observation, reward, and done flag are returned.
+Important:
+- step_count increases by 1 per step.
+- city_time increases by 30 seconds per step.
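Because each step advances the clock by a fixed 30 seconds, city_time follows directly from step_count. The helper below is a hypothetical illustration of that arithmetic (the names `TICK_SECONDS` and `city_time_after` are not from the codebase).

```python
TICK_SECONDS = 30.0  # seconds of simulated time per step, as stated in the guide


def city_time_after(step_count: int) -> float:
    """Simulated seconds elapsed after `step_count` steps since reset."""
    if step_count < 0:
        raise ValueError("step_count must be non-negative")
    return step_count * TICK_SECONDS
```

For example, `city_time_after(8)` gives 240.0, which agrees with the snapshot in section 8 (step_count 8, city_time 240.0 seconds).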
+## 8. At what step are we right now?
+Snapshot from the live backend at the time this guide was generated:
+- task_id: multi_incident
+- episode_id: d2cd525e-2596-44cb-bbe3-af33236264a0
+- step_count: 8
+- city_time: 240.0 seconds
+- cumulative_reward: 1.6
+- episode_score: 0.0
+- legal_actions currently available: 36
+These are live values, not constants. If you reset again, step_count returns to 0.
+## 9. Action space (what actions exist)
+Current action types include:
+- DISPATCH
+- CANCEL
+- REASSIGN
+- STAGE
+- MUTUAL_AID
+- UPGRADE
+- DOWNGRADE
+Legal actions are generated from current state and filtered by protocol validation, so only valid actions appear in legal_actions.
+## 10. How scoring works (complete detail)
+There are two scoring layers:
+1. Step reward (every action)
+2. Episode score (whole run)
+### 10.1 Step reward (RewardCalculator)
+Step reward uses a weighted sum of 5 components:
+- response_time: 30%
+- triage: 25%
+- survival: 25%
+- coverage: 12%
+- protocol: 8%
+Total formula:
+- total = 0.30 * response_time + 0.25 * triage + 0.25 * survival + 0.12 * coverage + 0.08 * protocol
+- result is clamped to [0, 1]
+Safety rule:
+- If any Priority-1 incident was seen and the survival component is 0, the total is capped at 0.2.
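The weighted sum, clamp, and safety cap can be sketched as follows. The weights come from the guide; the function name `step_reward`, the `p1_seen` flag, and the order of operations (clamp first, then cap) are assumptions made for illustration and may differ from the actual RewardCalculator.

```python
WEIGHTS = {
    "response_time": 0.30,
    "triage": 0.25,
    "survival": 0.25,
    "coverage": 0.12,
    "protocol": 0.08,
}


def step_reward(components: dict, p1_seen: bool) -> float:
    """Weighted sum of the five components, clamped to [0, 1] and safety-capped."""
    total = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    total = min(1.0, max(0.0, total))  # clamp to [0, 1]
    # Safety rule: a P1 incident was seen and survival is 0 -> cap at 0.2.
    if p1_seen and components["survival"] == 0.0:
        total = min(total, 0.2)
    return total
```

With every component at 1.0 except survival at 0.0, the weighted sum is 0.75; if a P1 incident was seen, the cap reduces it to 0.2.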
+Component details:
+1. response_time
+- Only meaningful for DISPATCH.
+- For non-DISPATCH actions it returns a neutral 0.5.
+- For DISPATCH it compares the ETA to the severity benchmark.
+2. triage
+- Only meaningful for DISPATCH.
+- Checks whether the dispatched unit type matches the required unit types for the incident type.
+- Handles enum-qualified metadata keys safely.
+3. survival
+- Based on P1 incidents seen vs. resolved without failure.
+- Uses the metadata lists p1_seen, resolved_incidents, and failed_incidents.
+4. coverage
+- Measures how many districts still have AVAILABLE coverage.
+5. protocol
+- If the action is invalid: 0.0.
+- If the action is valid and Action.notes has no phraseology text: neutral 0.5.
+- If Action.notes is provided: uses the PhraseologyJudge score plus readback correctness.
+### 10.2 Episode score (whole run)
+Episode score is task-specific via a central grade_episode router.
+Why this matters:
+- Different tasks need different definitions of success.
+- Mean step reward alone is often too weak for real evaluation.
+Task-specific episode graders:
+1. single_incident
+- +0.50 if incident resolved
+- +0.30 if MEDIC dispatched correctly
+- +0.20 if resolved within first 10 steps
+2. multi_incident
+- Uses P1 resolution, overall resolution ratio, and escalation penalty
+- score = 0.5 * p1_score + 0.3 * resolution_score - 0.2 * failure_penalty
+3. mass_casualty
+- Emphasizes P1 survival with penalties for failures
+- score = 0.6 * survival_score + 0.3 * mean_reward - failure_penalty
+4. shift_surge (improved)
+- Emphasizes long-horizon operational quality:
+  - incident throughput (resolved ratio)
+  - P1 survival
+  - coverage
+  - low backlog
+  - mean reward
+  - escalation penalty
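The two graders with fully stated formulas (single_incident and multi_incident) can be written out directly. This is a sketch, not the project's grade_episode code: the function names and parameters are hypothetical, "within first 10 steps" is read here as `resolved_at_step <= 10`, and no clamping is applied because the guide does not mention one for episode scores.

```python
from typing import Optional


def single_incident_score(resolved: bool, medic_dispatched: bool,
                          resolved_at_step: Optional[int]) -> float:
    """+0.50 resolved, +0.30 correct MEDIC dispatch, +0.20 resolved by step 10."""
    score = 0.0
    if resolved:
        score += 0.50
    if medic_dispatched:
        score += 0.30
    # Assumption: the speed bonus applies only to a resolved incident.
    if resolved and resolved_at_step is not None and resolved_at_step <= 10:
        score += 0.20
    return score


def multi_incident_score(p1_score: float, resolution_score: float,
                         failure_penalty: float) -> float:
    """score = 0.5 * p1_score + 0.3 * resolution_score - 0.2 * failure_penalty."""
    return 0.5 * p1_score + 0.3 * resolution_score - 0.2 * failure_penalty
```

Note that the multi_incident formula can go negative when the failure penalty dominates; whether the real grader clamps that value is not stated in the guide.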
+## 11. Very important score semantics
+In the OpenEnv wrapper:
+- The reward returned by step is the per-step reward.
+- observation.score is overwritten with the episode score.
+Also stored in metadata:
+- cumulative_reward: running sum of step rewards.
+- episode_rewards: list of per-step rewards.
+- episode_score: current episode-level grade.
+So if you compare values:
+- reward = immediate local quality for this action
+- observation.score = global task progress quality for the run
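The bookkeeping described above can be pictured with a small container. The field names mirror the metadata keys in the guide, but the `EpisodeLog` class and its `record` method are hypothetical, and the reward values in the usage note are arbitrary examples.

```python
from dataclasses import dataclass, field


@dataclass
class EpisodeLog:
    """Hypothetical holder for the three metadata values listed above."""

    episode_rewards: list = field(default_factory=list)
    episode_score: float = 0.0

    @property
    def cumulative_reward(self) -> float:
        # Running sum of the per-step rewards.
        return sum(self.episode_rewards)

    def record(self, reward: float, episode_score: float) -> None:
        # Append the local step reward; overwrite the global episode grade.
        self.episode_rewards.append(reward)
        self.episode_score = episode_score
```

After recording rewards 0.5, 0.2, and 0.9, `cumulative_reward` is their sum (1.6 up to floating-point rounding), while `episode_score` holds only the most recent grade.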
+## 12. Is the dashboard connected to backend or just static?
+It is connected to the backend.
+How we know:
+- The dashboard JavaScript calls the API endpoint http://localhost:8000/dashboard/state.
+- It polls every 500 ms.
+- It renders live units/incidents, step, and reward breakdown from the backend response.
+Connection behavior:
+- If the backend is unreachable, the dashboard shows a disconnected status.
+- If the backend is running and reset was called, the dashboard updates live as the step changes.
+## 13. Why we used Docker
+Docker is used to package the app and dependencies so it runs consistently everywhere.
+Benefits:
+- Same runtime on your machine, CI, and deployment platforms.
+- No "works on my machine" package mismatch issues.
+- Easy deployment with a single container image.
+- Port compatibility: server reads PORT environment variable (important for hosted platforms).
+In this project:
+- The root Dockerfile runs uvicorn on host 0.0.0.0 and the port given by PORT (default 8000).
+- That makes it suitable for local run and hosted environments.
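The PORT behaviour amounts to "use the environment variable if present, otherwise 8000". A minimal sketch of that lookup, assuming a hypothetical helper named `resolve_port` (the real project may do this in the Dockerfile command or a shell script instead):

```python
import os
from typing import Mapping, Optional


def resolve_port(environ: Optional[Mapping[str, str]] = None,
                 default: int = 8000) -> int:
    """Return the port from PORT if set, otherwise the default."""
    env = os.environ if environ is None else environ
    return int(env.get("PORT", default))
```

Locally, with PORT unset, this yields 8000; on a host that sets `PORT=7860` (the value the README says Hugging Face commonly uses), it yields 7860.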
+## 14. What API key are we using?
+The project expects environment variables. Keys are not hardcoded in repository files.
+Required for LLM mode:
+- API_BASE_URL
+- MODEL_NAME
+- OPENAI_API_KEY
+Compatibility fallback:
+- HF_TOKEN is accepted if OPENAI_API_KEY is not set.
+No-key mode:
+- USE_RANDOM=true bypasses the LLM and uses a deterministic random baseline agent.
+Practical meaning:
+- If USE_RANDOM=true, you can run without any API key.
+- If USE_RANDOM is not true, OPENAI_API_KEY (or HF_TOKEN fallback) is needed.
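The key-selection rules above reduce to a short decision function. This is a sketch of the described behaviour, not the project's code: `resolve_api_key` is a hypothetical name, and the case-insensitive comparison of USE_RANDOM and the choice of `RuntimeError` are assumptions.

```python
from typing import Mapping, Optional


def resolve_api_key(environ: Mapping[str, str]) -> Optional[str]:
    """Return the key to use for LLM mode, or None in no-key (random) mode."""
    # Assumption: "true" is matched case-insensitively.
    if environ.get("USE_RANDOM", "").strip().lower() == "true":
        return None  # random baseline agent, no key needed
    # OPENAI_API_KEY takes precedence; HF_TOKEN is the fallback.
    key = environ.get("OPENAI_API_KEY") or environ.get("HF_TOKEN")
    if not key:
        raise RuntimeError("Set OPENAI_API_KEY (or HF_TOKEN), or USE_RANDOM=true")
    return key
```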
+## 15. Backend API endpoints (what each does)
+- GET /health
+  - health check
+- GET /tasks
+  - list available tasks
+- POST /reset
+  - start new episode for selected task
+- POST /step
+  - apply one action and move simulation one step
+- GET /state
+  - current state
+- GET /dashboard/state
+  - extended state for HTML dashboard (includes legal actions + last observation)
+- GET /metadata and GET /schema
+  - environment metadata and contracts
+- POST /mcp
+  - minimal JSON-RPC endpoint
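For scripting against the server, the endpoint list can be captured as a small table plus a request builder. The methods and paths come from the list above; everything else is an assumption for illustration: the `ENDPOINTS` and `build_request` names, the base URL, and especially any JSON payload shape, which this guide does not specify. The sketch only constructs `urllib.request.Request` objects and sends nothing.

```python
import json
from typing import Optional
from urllib.request import Request

# Method and path per endpoint, taken from the list above.
ENDPOINTS = {
    "health": ("GET", "/health"),
    "tasks": ("GET", "/tasks"),
    "reset": ("POST", "/reset"),
    "step": ("POST", "/step"),
    "state": ("GET", "/state"),
    "dashboard_state": ("GET", "/dashboard/state"),
    "metadata": ("GET", "/metadata"),
    "schema": ("GET", "/schema"),
    "mcp": ("POST", "/mcp"),
}


def build_request(base_url: str, name: str,
                  payload: Optional[dict] = None) -> Request:
    """Build (but do not send) a request for one named endpoint."""
    method, path = ENDPOINTS[name]
    data = None
    headers = {}
    if payload is not None:
        # Payload contents are the caller's responsibility (shape is assumed).
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(base_url.rstrip("/") + path, data=data,
                   headers=headers, method=method)
```

A request built this way can be passed to `urllib.request.urlopen` once the server is running; check the server's `/schema` output for the actual request bodies before relying on any payload.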
+## 16. What the dashboard shows vs what it does not show
+Shows:
+- Unit cards (status, assignment, ETA, location)
+- Incident cards (type, severity, status, assigned units)
+- Map view for units/incidents
+- Last step reward component bars
+- Header task/episode/step values
+Nuance:
+- Header "Score" currently uses metadata.cumulative_reward.
+- Episode score is available too (metadata.episode_score), but not currently shown as the main header score.
+## 17. Beginner glossary
+- incident: emergency case to be handled
+- unit: responder vehicle/team (EMS, fire, police, etc.)
+- legal action: an action that passes protocol checks in current state
+- reward: immediate feedback signal for one step
+- episode score: overall quality of a full run
+- terminal: episode is finished
+## 18. Practical "how to think" summary
+When you judge behavior quality in this project:
+- Use step rewards to understand local tactical quality.
+- Use episode score to understand mission success for the selected task.
+- Use dashboard to observe live state transitions.
+- Use task definitions to interpret what success means in each scenario.
+If you remember one thing:
+- This is not a generic chatbot app. It is a decision simulator where actions change a world state over time and are graded both step-by-step and across full episodes.

README.md CHANGED

@@ -1,35 +1,31 @@
 ---
 title: 911 Dispatch Supervisor
-emoji: 🚨
 colorFrom: red
 colorTo: orange
 sdk: docker
 pinned: false
 tags:
   - openenv
   - reinforcement-learning
   - llm-agent
   - emergency-dispatch
 ---
-# 911 City-Wide Emergency Dispatch Supervisor
-**LLM-powered 911 dispatch supervision — city scale**
-A unified RL training environment for city-wide emergency dispatch operations. The agent supervises police, fire, and EMS unit allocation across simultaneous incidents under a deterministic simulation.
 ## Overview
-This project implements a benchmark environment for training and evaluating LLM agents as emergency dispatch supervisors. It features:
-- **Dispatch lifecycle**: incidents advance from pending to resolved (or escalated)
-- **Deterministic simulation**: Reproducible episodes under fixed seeds
-- **Protocol validator**: Checks if actions are legal in the current state
-- **OpenEnv compatible**: Standard RL environment interface
-- **Read-only 2D visualization**: Synchronized unit/incident visualization (see below)
-## Visualizer (Judges: please check this)
 The 2D visualizer is in `src/visualizer/viewer.py` and renders the current state to a PNG.
@@ -41,10 +37,10 @@ from src.openenv_environment import OpenEnvEnvironment
 from src.visualizer.viewer import Viewer2D
 async def main():
-  env = OpenEnvEnvironment(task_id="multi_incident", seed=42)
-  await env.reset()
-  Viewer2D().render_to_file("frame.png", env.state())
-  env.close()
 asyncio.run(main())
 ```
@@ -194,8 +190,7 @@ The reward signal is a weighted combination of five components:
 | `coverage` | 12% | Geographic distribution of available units across city districts |
 | `protocol` | 8% | Action legality + dispatch phraseology/readback quality (via `Action.notes`) |
-**Safety gate:** If any Priority-1 incident was seen and `survival=0.0`, the total episode score is capped at `0.2` regardless of other components.
 ## Project Structure
@@ -265,13 +260,11 @@ curl -X POST http://localhost:8000/reset -H "Content-Type: application/json" -d
 | `/dashboard/state` | GET | Extended state for `live_dashboard.html` |
 | `/tasks` | GET | List all available tasks with metadata |
-## HF Space
-### Deploying to Hugging Face Spaces (Docker)
-This repository is compatible with **Docker Spaces** (the README frontmatter includes `sdk: docker` and the Space tags include `openenv`).
-1) Create a new Space → choose **Docker**.
 2) Push this repository to the Space.
 3) The server binds to the `PORT` environment variable (HF commonly sets `PORT=7860`).

 ---
 title: 911 Dispatch Supervisor
 colorFrom: red
 colorTo: orange
 sdk: docker
 pinned: false
 tags:
   - openenv
   - reinforcement-learning
   - llm-agent
   - emergency-dispatch
 ---
+# 911 Dispatch Supervisor
+Deterministic simulator + RL-style environment for city-wide 911 dispatch. It supports police/fire/EMS unit allocation across concurrent incidents, with an OpenEnv-compatible interface and a small FastAPI server for interactive runs and the live dashboard.
 ## Overview
+This repo is meant for training and evaluating agents (LLM-based or scripted baselines) as dispatch supervisors. It includes:
+- **Dispatch lifecycle**: incidents progress from pending to resolved (or escalated)
+- **Deterministic simulation**: reproducible episodes under fixed seeds
+- **Protocol validator**: checks whether an action is legal in the current state
+- **OpenEnv-compatible**: standard `reset` / `step` loop
+- **2D visualization**: render a PNG snapshot of the current state
+## Visualizer
 The 2D visualizer is in `src/visualizer/viewer.py` and renders the current state to a PNG.
 from src.visualizer.viewer import Viewer2D
 async def main():
+    env = OpenEnvEnvironment(task_id="multi_incident", seed=42)
+    await env.reset()
+    Viewer2D().render_to_file("frame.png", env.state())
+    env.close()
 asyncio.run(main())
 ```
 | `coverage` | 12% | Geographic distribution of available units across city districts |
 | `protocol` | 8% | Action legality + dispatch phraseology/readback quality (via `Action.notes`) |
+Safety gate: if any Priority-1 incident was seen and `survival=0.0`, the total episode score is capped at `0.2` regardless of other components.
 ## Project Structure
 | `/dashboard/state` | GET | Extended state for `live_dashboard.html` |
 | `/tasks` | GET | List all available tasks with metadata |
+## Hugging Face Spaces
+### Deploying to Spaces (Docker)
+1) Create a new Space and choose **Docker**.
 2) Push this repository to the Space.
 3) The server binds to the `PORT` environment variable (HF commonly sets `PORT=7860`).