---
title: 911 Dispatch Supervisor
emoji: "🚨"
colorFrom: red
colorTo: gray
sdk: docker
app_port: 7860
tags:
- openenv
pinned: false
---
# 🚨 911 Dispatch Supervisor
> **A city-wide emergency dispatch RL environment**: train and evaluate LLM agents to manage simultaneous incidents by dispatching police, fire, and EMS units across a city grid under realistic resource constraints.
[](https://openenv.dev)
[](https://hub.docker.com)
[](https://huggingface.co/spaces)
[](LICENSE)
---
## Why This Matters
911 dispatch centers in the United States handle over 240 million calls per year. Every dispatcher decision (which unit to send, in what order, with what priority) directly determines survival outcomes. A 90-second delay in dispatching a MEDIC to a cardiac arrest drops survival probability by roughly 10%.
The **911 Dispatch Supervisor** is the first open RL benchmark for training and evaluating AI agents on emergency dispatch decisions. It models the exact tradeoffs real dispatchers face: triage under uncertainty, multi-unit resource allocation, geographic coverage, and protocol compliance, all simultaneously.
This fills a direct gap for researchers building AI copilots for public safety systems, and provides immediate evaluation value for any LLM claiming real-world decision-making capability.
## Overview
At every step, an LLM agent plays the role of a city-wide dispatch supervisor, deciding which units to dispatch, reassign, cancel, stage, or escalate, all under time pressure, limited resources, and competing priorities across a 100×100 city grid.
This is not a toy environment. Emergency dispatch is a high-stakes, multi-objective decision problem that:
- Requires **triage**: prioritizing life-threatening incidents over property damage
- Demands **coverage awareness**: keeping geographic zones protected
- Rewards **correct unit-type matching**: sending a MEDIC vs. an ENGINE
- Punishes **delays** that cause Priority-1 incidents to escalate
- Scores **dispatch phraseology**: realistic radio communication language
---
## Environment Architecture
```
┌─────────────────────────────────────────────────────────┐
│                    OpenEnv Interface                    │
│            reset() · step(action) · state()             │
└──────────────────────┬──────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────┐
│                  DispatchStateMachine                   │
│  • Validates actions via DispatchProtocolValidator      │
│  • Moves units toward incidents (Manhattan physics)     │
│  • Advances incident status: PENDING → RESPONDING →     │
│    ON_SCENE → RESOLVED (or ESCALATED if timeout)        │
│  • Spawns incident waves at configured step offsets     │
└──────────────────────┬──────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────┐
│                    RewardCalculator                     │
│  • response_time (30%) · triage (25%) · survival (25%)  │
│  • coverage (12%) · protocol (8%)                       │
│  • Safety gate: P1 failure → score capped at 0.2        │
└──────────────────────┬──────────────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────────────┐
│              Task-Specific Episode Graders              │
│  single_incident · multi_incident · mass_casualty ·     │
│  shift_surge                                            │
└─────────────────────────────────────────────────────────┘
```
---
## Action Space
Actions are structured Pydantic models; no free-text parsing is required.
**`src.models.Action`**
| Field | Type | Description |
|---|---|---|
| `action_type` | `DispatchAction` | One of: `DISPATCH`, `CANCEL`, `REASSIGN`, `STAGE`, `MUTUAL_AID`, `UPGRADE`, `DOWNGRADE` |
| `unit_id` | `str` | Unit identifier, e.g. `MED-1`, `ENG-2` |
| `incident_id` | `str` | Incident identifier, e.g. `INC-001` |
| `notes` | `str \| None` | Optional phraseology text for protocol scoring bonus |
| `priority_override` | `IncidentSeverity \| None` | Required for `UPGRADE`/`DOWNGRADE` actions |
**Action Types**
| Action | Description | Protocol Rule |
|---|---|---|
| `DISPATCH` | Send an available unit to an incident | Unit must be `AVAILABLE`; incident must not be `RESOLVED` |
| `CANCEL` | Release a unit from its current assignment | Unit must be assigned to the specified incident |
| `REASSIGN` | Redirect an assigned unit to a different incident | Unit must be `DISPATCHED`, `ON_SCENE`, or `TRANSPORTING` |
| `STAGE` | Pre-position a unit near an incident without committing | Unit must be `AVAILABLE`; incident must be `PENDING` |
| `MUTUAL_AID` | Request external unit of a given type | Only allowed when all local units of that type are busy |
| `UPGRADE` | Increase incident severity | New severity must be strictly higher than current |
| `DOWNGRADE` | Decrease incident severity | New severity must be strictly lower than current |
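
The protocol rules in the table can be read as simple predicates over unit and incident status. The following is a minimal sketch under stated assumptions, not the project's `DispatchProtocolValidator`: the function name `check_action`, the plain-string statuses, and the issue messages are all hypothetical, and only `DISPATCH`, `STAGE`, and `REASSIGN` are covered.

```python
# Hypothetical sketch of three protocol rules from the table above.
# Names and issue strings are illustrative, not the project's real API.


def check_action(action_type: str, unit_status: str, incident_status: str) -> list[str]:
    """Return a list of rule violations; an empty list means the action is legal."""
    issues: list[str] = []
    if action_type == "DISPATCH":
        if unit_status != "AVAILABLE":
            issues.append("unit must be AVAILABLE")
        if incident_status == "RESOLVED":
            issues.append("incident must not be RESOLVED")
    elif action_type == "STAGE":
        if unit_status != "AVAILABLE":
            issues.append("unit must be AVAILABLE")
        if incident_status != "PENDING":
            issues.append("incident must be PENDING")
    elif action_type == "REASSIGN":
        if unit_status not in {"DISPATCHED", "ON_SCENE", "TRANSPORTING"}:
            issues.append("unit must be DISPATCHED, ON_SCENE, or TRANSPORTING")
    else:
        issues.append(f"rule not covered by this sketch: {action_type}")
    return issues


print(check_action("DISPATCH", "AVAILABLE", "PENDING"))  # legal: no issues
print(check_action("STAGE", "AVAILABLE", "RESPONDING"))  # one issue reported
```

Returning a list of issues rather than a boolean mirrors the `issues` field of the observation, which reports validator warnings alongside `protocol_ok`.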
#### Dispatch Phraseology (bonus scoring)
The `notes` field is scored for realistic radio communication language. Agents that use proper dispatch phraseology receive up to 8% bonus on their protocol score.
| Action | Example notes value |
|---|---|
| Dispatch MEDIC to cardiac | `"Medic 1 en route to cardiac arrest, Code 3, ETA 4 minutes"` |
| Dispatch ENGINE to fire | `"Engine 2 responding to structure fire, Code 3, all units advised"` |
| Mutual aid request | `"Requesting mutual aid, all local MEDICs committed, Priority 1 cardiac at grid 45-72"` |
| Stage unit | `"Engine 1 staging at District 3 perimeter, awaiting scene clear"` |
---
## Observation Space
**`src.models.Observation`**
| Field | Type | Description |
|---|---|---|
| `result` | `str` | Human-readable result of the last action |
| `score` | `float` | Episode score in `[0.0, 1.0]` (task-level grade) |
| `protocol_ok` | `bool` | Whether the action passed protocol validation |
| `issues` | `list[str]` | Warnings or error codes from the validator |
| `reward_breakdown` | `dict[str, float] \| None` | Per-component reward scores for dashboard display |
**Full State (`src.models.State`)**
| Field | Type | Description |
|---|---|---|
| `units` | `dict[str, UnitState]` | All units with type, status, location, ETA |
| `incidents` | `dict[str, IncidentState]` | All incidents with type, severity, status, assigned units |
| `episode_id` | `str` | Unique episode identifier |
| `step_count` | `int` | Current step number |
| `task_id` | `str` | Active task identifier |
| `city_time` | `float` | Simulated city clock in seconds (30s per step) |
| `metadata` | `dict` | Schema info, districts, seeds, wave configs, bookkeeping |
**Unit Status Transitions**
```
AVAILABLE → DISPATCHED → ON_SCENE → AVAILABLE
↓
OUT_OF_SERVICE (shift_surge only)
```
**Incident Status Transitions**
```
PENDING → RESPONDING → ON_SCENE → RESOLVED
↓ ↓
ESCALATED ESCALATED (survival clock expires)
```
---
## Reward Function
The step-level reward is a weighted combination of five components:
| Component | Weight | Description |
|---|---|---|
| `response_time` | **30%** | How quickly dispatched units reach incidents relative to severity benchmarks (P1: 240s, P2: 480s, P3: 900s) |
| `triage` | **25%** | Whether the dispatched unit type matches incident requirements (e.g., MEDIC for CARDIAC_ARREST) |
| `survival` | **25%** | Fraction of Priority-1 incidents resolved before the survival clock expires |
| `coverage` | **12%** | Geographic distribution of available units across city districts |
| `protocol` | **8%** | Action legality + optional phraseology/readback quality via `Action.notes` |
> **⚠️ Safety Gate:** If any Priority-1 incident (cardiac arrest, shooting, building collapse) results in zero survival score, the entire episode reward is hard-capped at **0.2** regardless of other performance. This forces agents to treat life-threatening incidents as non-negotiable, exactly as real dispatch protocol requires.
**Non-DISPATCH actions** receive neutral `0.5` for `response_time` and `triage`, allowing agents to maintain coverage without penalty.
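
To make the arithmetic concrete, here is a minimal sketch of the weighted combination and the safety gate. The weights and the 0.2 cap come from the table and note above; the function names, the dict layout, and the choice to apply the cap to an already-computed episode reward are assumptions, not the actual `RewardCalculator` code.

```python
# Weights are taken from the table above; everything else is an assumption.
WEIGHTS = {
    "response_time": 0.30,
    "triage": 0.25,
    "survival": 0.25,
    "coverage": 0.12,
    "protocol": 0.08,
}
SAFETY_CAP = 0.2  # cap applied when a Priority-1 incident has zero survival score


def weighted_reward(components: dict[str, float]) -> float:
    """Combine per-component scores in [0, 1] into one weighted reward."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


def apply_safety_gate(episode_reward: float, p1_zero_survival: bool) -> float:
    """Cap the reward at SAFETY_CAP if any P1 incident scored zero survival."""
    return min(episode_reward, SAFETY_CAP) if p1_zero_survival else episode_reward


# A STAGE action: neutral 0.5 for response_time and triage, as described above.
stage_step = {"response_time": 0.5, "triage": 0.5, "survival": 1.0,
              "coverage": 0.8, "protocol": 1.0}
print(round(weighted_reward(stage_step), 3))          # 0.701
print(apply_safety_gate(0.9, p1_zero_survival=True))  # 0.2
```

Because the weights sum to 1.0, a step with every component at 1.0 yields a reward of 1.0, and the gate can only lower a score, never raise it.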
---
## Tasks
### Task Difficulty Overview
| Task | Difficulty | Max Steps | Key Challenge |
|---|---|---|---|
| `single_incident` | 🟢 Easy | 20 | Dispatch the right unit type quickly |
| `multi_incident` | 🟡 Medium | 40 | Triage 3 simultaneous incidents, protect P1s |
| `mass_casualty` | 🔴 Hard | 60 | Manage wave-based surge with limited resources |
| `shift_surge` | 🔴 Hard | 60 | Adapt as units fail and incidents stream continuously |
---
### 🟢 Task 1: `single_incident` - Basic Dispatch (Easy)
**Scenario**: One active incident (`CARDIAC_ARREST`, Priority-1) in a small city. A MEDIC, ENGINE, and PATROL are all available.
**Objective**: Dispatch the correct unit type (MEDIC) to the incident as fast as possible.
**Grader Logic**:
```
score = 0.0
if incident RESOLVED: score += 0.50
if MEDIC dispatched correctly: score += 0.30
if resolved within 10 steps: score += 0.20
```
**Why it's easy**: One incident, one correct action, small state space.
**What a good agent does**: Immediately dispatches `MED-1 → INC-001`.
**Scoring:** 50% resolution + 30% correct unit type used + 20% response speed.
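
The grader pseudocode above translates almost directly into Python. This is a sketch only: the function name and parameters are hypothetical, "within 10 steps" is read as `steps_to_resolve <= 10`, and the speed bonus is assumed to require that the incident was actually resolved.

```python
from typing import Optional


def grade_single_incident(resolved: bool, medic_dispatched: bool,
                          steps_to_resolve: Optional[int]) -> float:
    """Score the single_incident task from three episode facts."""
    score = 0.0
    if resolved:
        score += 0.50  # incident reached RESOLVED
    if medic_dispatched:
        score += 0.30  # correct unit type (MEDIC) was dispatched
    if resolved and steps_to_resolve is not None and steps_to_resolve <= 10:
        score += 0.20  # speed bonus (assumed to need a resolution)
    return round(score, 2)


print(grade_single_incident(True, True, 4))       # fast, correct resolution
print(grade_single_incident(True, False, 15))     # slow, wrong unit type
print(grade_single_incident(False, True, None))   # right unit, never resolved
```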
---
### 🟡 Task 2: `multi_incident` - Simultaneous Triage (Medium)
**Scenario**: Three concurrent incidents at episode start β a structure fire (P2), a cardiac arrest (P1), and a shooting (P1) β with 6 available units.
**Objective**: Respond to all incidents with the right unit types, prioritizing P1s.
**Grader Logic**:
```
score = 0.5 × p1_resolution_rate
+ 0.3 × overall_resolution_rate
- 0.2 × escalation_penalty
```
**Why it's medium**: Multiple incidents compete for units; wrong type dispatch wastes coverage; P1s must be addressed before P2.
**What a good agent does**: Immediately dispatches MEDIC to cardiac arrest and patrol to shooting, then handles the fire with ENGINE/LADDER.
**Scoring:** 50% P1 resolution + 30% overall resolution - 20% escalation penalty.
---
### 🔴 Task 3: `mass_casualty` - Wave-Based Surge (Hard)
**Scenario**: One critical incident (`BUILDING_COLLAPSE`, P1) at step 0. New waves arrive at steps 5 (structure fire) and 12 (two simultaneous cardiac arrests).
**Objective**: Maximize P1 survival across all waves despite resource conflicts.
**Grader Logic**:
```
score = 0.6 × p1_survival_rate
+ 0.3 × mean_step_reward
- failure_penalty
```
**Why it's hard**: Resources are exhausted when waves arrive. Agents must decide whether to reassign mid-scene or request mutual aid (at a 120s ETA penalty). Mutual aid is only legal when local units of the required type are fully committed.
**What a good agent does**: Dispatches immediately to initial collapse, stages additional units near expected wave arrival zones, requests mutual aid for later waves.
**Scoring:** 60% P1 survival + 30% mean step reward, minus a failure penalty if the building collapse receives no response.
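
The wave schedule and the mutual-aid rule described in this task can be sketched as plain data plus one predicate. The wave offsets and the 120s penalty come from the text above; the dict layout, the function names, and the reading of "fully committed" as "no unit of that type is `AVAILABLE`" are assumptions.

```python
# Wave offsets come from the scenario text; the dict layout is an assumption.
WAVES = {
    0: ["BUILDING_COLLAPSE"],
    5: ["STRUCTURE_FIRE"],
    12: ["CARDIAC_ARREST", "CARDIAC_ARREST"],
}
MUTUAL_AID_ETA_PENALTY_S = 120  # extra ETA for external units, per the text


def incidents_spawning_at(step: int) -> list[str]:
    """Incident types that appear at the given step (empty if no wave)."""
    return list(WAVES.get(step, []))


def mutual_aid_allowed(unit_type: str, local_units: dict[str, tuple[str, str]]) -> bool:
    """local_units maps unit_id -> (unit_type, status).

    Allowed only when no local unit of the requested type is AVAILABLE.
    With zero local units of that type this is vacuously True.
    """
    statuses = [status for utype, status in local_units.values() if utype == unit_type]
    return all(status != "AVAILABLE" for status in statuses)


fleet = {
    "MED-1": ("MEDIC", "ON_SCENE"),
    "MED-2": ("MEDIC", "DISPATCHED"),
    "ENG-1": ("ENGINE", "AVAILABLE"),
}
print(incidents_spawning_at(12))
print(mutual_aid_allowed("MEDIC", fleet), mutual_aid_allowed("ENGINE", fleet))
```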
---
### 🔴 Task 4: `shift_surge` - Long-Horizon Degradation (Hard)
**Scenario**: 5 units start available, but 3 go `OUT_OF_SERVICE` by step 5. Incidents arrive in waves every 8 steps throughout the 60-step episode.
**Objective**: Maintain city-wide throughput and P1 survival despite progressive resource degradation.
**Grader Logic**:
```
score = 0.35 × resolution_ratio
+ 0.25 × p1_survival
+ 0.15 × coverage
+ 0.15 × (1 - backlog_ratio)
+ 0.10 × mean_reward
- 0.25 × escalation_ratio
```
**Why it's hard**: No single optimal strategy β agents must continuously rebalance between throughput and coverage as available resources shrink and incident demand grows.
**Scoring:** 35% resolution + 25% P1 survival + 15% coverage + 15% backlog management + 10% step reward - 25% escalation penalty.
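
As a worked example of the formula above, the sketch below computes the `shift_surge` score from six ratios. The coefficients are the ones listed; the function name and the final clamp to `[0.0, 1.0]` are assumptions (the clamp simply keeps a heavy escalation penalty from producing a negative score).

```python
def grade_shift_surge(resolution_ratio: float, p1_survival: float, coverage: float,
                      backlog_ratio: float, mean_reward: float,
                      escalation_ratio: float) -> float:
    """Weighted shift_surge score; all inputs are expected to lie in [0, 1]."""
    raw = (
        0.35 * resolution_ratio
        + 0.25 * p1_survival
        + 0.15 * coverage
        + 0.15 * (1.0 - backlog_ratio)
        + 0.10 * mean_reward
        - 0.25 * escalation_ratio
    )
    return max(0.0, min(1.0, raw))  # clamp is an assumption of this sketch


# Half the incidents resolved, moderate coverage, some escalations:
# 0.175 + 0.125 + 0.09 + 0.09 + 0.03 - 0.05 = 0.46
print(round(grade_shift_surge(0.5, 0.5, 0.6, 0.4, 0.3, 0.2), 2))
```

The positive coefficients sum to 1.0, so a perfect episode with no backlog and no escalations scores 1.0.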
---
## Unit Types
| Unit | Code | Speed | Primary Use |
|---|---|---|---|
| Engine | `ENGINE` | 0.8 bl/s | Structure fires, hazmat support |
| Ladder | `LADDER` | 0.6 bl/s | Multi-story fires, rescues |
| Medic | `MEDIC` | 1.0 bl/s | Medical emergencies, trauma |
| Patrol | `PATROL` | 1.2 bl/s | Shootings, MVAs, crowd control |
| Hazmat | `HAZMAT` | 0.5 bl/s | Chemical/biological spills |
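
The speeds in this table combine with Manhattan movement to give a travel-time estimate. The sketch below assumes that `bl/s` means grid blocks per second, that positions are integer grid coordinates, and that one step is 30s of city time (as stated for `city_time`); the helper names are hypothetical and are not taken from `src/physics.py`.

```python
import math

# Speeds from the table above, assumed to be in grid blocks per second.
SPEED_BLOCKS_PER_SEC = {
    "ENGINE": 0.8, "LADDER": 0.6, "MEDIC": 1.0, "PATROL": 1.2, "HAZMAT": 0.5,
}
STEP_SECONDS = 30  # simulated seconds per environment step


def manhattan_blocks(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Manhattan (taxicab) distance between two grid positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def eta_seconds(unit_type: str, unit_pos: tuple[int, int],
                incident_pos: tuple[int, int]) -> float:
    """Travel time in seconds at the unit type's constant speed."""
    return manhattan_blocks(unit_pos, incident_pos) / SPEED_BLOCKS_PER_SEC[unit_type]


def eta_steps(unit_type: str, unit_pos: tuple[int, int],
              incident_pos: tuple[int, int]) -> int:
    """Travel time rounded up to whole environment steps."""
    return math.ceil(eta_seconds(unit_type, unit_pos, incident_pos) / STEP_SECONDS)


# 97 blocks apart: a MEDIC needs 97s (4 steps), an ENGINE about 121s (5 steps).
print(eta_steps("MEDIC", (10, 10), (45, 72)), eta_steps("ENGINE", (10, 10), (45, 72)))
```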
## Incident Types
| Incident | Recommended Units | Default Severity |
|---|---|---|
| `CARDIAC_ARREST` | MEDIC | P1 |
| `STRUCTURE_FIRE` | ENGINE × 2, LADDER | P2 |
| `SHOOTING` | MEDIC, PATROL × 2 | P1 |
| `MULTI_VEHICLE_ACCIDENT` | MEDIC, PATROL | P2 |
| `BUILDING_COLLAPSE` | ENGINE, LADDER, MEDIC × 2 | P1 |
| `HAZMAT_SPILL` | HAZMAT, ENGINE | P2 |
| `OVERDOSE` | MEDIC | P2 |
| `MISSING_PERSON` | PATROL | P3 |
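
One way to use this table is as a lookup that reports which recommended units an incident still lacks. The sketch below encodes the table with `collections.Counter`; the constant and function names are hypothetical, and the actual `triage` reward component may be computed differently.

```python
from collections import Counter

# Recommended unit counts per incident type, copied from the table above.
RECOMMENDED_UNITS = {
    "CARDIAC_ARREST": Counter(MEDIC=1),
    "STRUCTURE_FIRE": Counter(ENGINE=2, LADDER=1),
    "SHOOTING": Counter(MEDIC=1, PATROL=2),
    "MULTI_VEHICLE_ACCIDENT": Counter(MEDIC=1, PATROL=1),
    "BUILDING_COLLAPSE": Counter(ENGINE=1, LADDER=1, MEDIC=2),
    "HAZMAT_SPILL": Counter(HAZMAT=1, ENGINE=1),
    "OVERDOSE": Counter(MEDIC=1),
    "MISSING_PERSON": Counter(PATROL=1),
}


def missing_units(incident_type: str, assigned_types: list[str]) -> Counter:
    """Recommended units not yet covered by the assigned unit types.

    Counter subtraction drops zero and negative counts, so extra units
    of an unneeded type are simply ignored.
    """
    return RECOMMENDED_UNITS[incident_type] - Counter(assigned_types)


print(missing_units("SHOOTING", ["MEDIC", "PATROL"]))  # one PATROL still needed
print(missing_units("CARDIAC_ARREST", ["MEDIC"]))      # empty: fully covered
```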
---
## OpenEnv Interface
```python
import asyncio

from src.openenv_environment import OpenEnvEnvironment
from src.models import Action, DispatchAction


async def main():
    env = OpenEnvEnvironment(task_id="multi_incident", seed=42)

    # Reset to initial state
    obs = await env.reset()
    print(obs.result)  # "dispatch center online"

    # Get legal actions (protocol-validated)
    legal = env.legal_actions()

    # Take a step
    action = legal[0]
    obs, reward, done = await env.step(action)
    print(f"reward={reward:.3f}, done={done}, protocol_ok={obs.protocol_ok}")

    # Inspect full state
    state = env.state()
    print(f"step={state.step_count}, city_time={state.city_time}s")


asyncio.run(main())
```
---
## API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check; returns `{"status": "healthy"}` |
| `/reset` | POST | Reset environment; body: `{"task_id": "...", "seed": 42}` (both optional) |
| `/step` | POST | Execute an action; body: `{"action": {...}}` |
| `/state` | GET | Current full environment state |
| `/tasks` | GET | List all available tasks with metadata |
| `/dashboard/state` | GET | Extended state for live HTML dashboard |
| `/schema` | GET | JSON schemas for Action, Observation, State |
| `/metadata` | GET | Environment name, version, description |
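
The request bodies in the table can be built with the standard library alone. The sketch below only constructs the requests and does not send them, so it runs without a server. The base URL assumes a local server on port 7860, and the assumption that `action_type` serializes as the plain string `"DISPATCH"` is not confirmed by this README; check `/schema` for the authoritative shape.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # assumed local server address


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the given endpoint."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


reset_req = build_post("/reset", {"task_id": "multi_incident", "seed": 42})
step_req = build_post("/step", {
    "action": {
        "action_type": "DISPATCH",  # assumed string form of DispatchAction
        "unit_id": "MED-1",
        "incident_id": "INC-001",
    }
})
print(reset_req.full_url, reset_req.get_method())

# To send against a running server:
# with urllib.request.urlopen(reset_req) as resp:
#     print(json.load(resp))
```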
---
## Quick Start
```bash
# Install dependencies
pip install -r requirements.txt
# Run the demo (non-interactive, no LLM required)
python demo.py
# Start the API server
python -m src.server.app
# Run random agent baseline (no API key required)
USE_RANDOM=true python inference.py
# Run LLM agent
API_BASE_URL=https://router.huggingface.co/v1 \
MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct \
HF_TOKEN=your_token \
python inference.py
# Run full test suite
pytest tests/ -v
```
---
## Docker
### Build & Run
```bash
# Build image
docker build -t citywide-dispatch-supervisor .
# Run on port 7860 (required for HF Spaces)
docker run -p 7860:7860 citywide-dispatch-supervisor
# Health check
curl http://localhost:7860/health
# Reset to a specific task
curl -X POST http://localhost:7860/reset \
-H "Content-Type: application/json" \
-d '{"task_id": "multi_incident", "seed": 42}'
```
---
## Hugging Face Spaces Deployment
This repository is deployed as a Docker-based HF Space.
1. Create a new HF Space and select **Docker**
2. Push this repository to the Space
3. The server reads `PORT` from the environment (HF sets `PORT=7860`)
4. Once running, the following endpoints are publicly available:
- `GET /health`
- `POST /reset`
- `POST /step`
- `GET /state`
Validate your deployment with the prevalidation script:
```bash
bash samplematerial/prevalidation.sh https://your-space.hf.space .
```
---
## Environment Variables
| Variable | Description | Default |
|---|---|---|
| `API_BASE_URL` | LLM API endpoint | `https://router.huggingface.co/v1` |
| `MODEL_NAME` | Model identifier | `meta-llama/Llama-3.1-8B-Instruct` |
| `HF_TOKEN` | Hugging Face API key | (none) |
| `USE_RANDOM` | Set `true` for deterministic random baseline | `false` |
| `PORT` | Server port | `7860` |
---
## Baseline Scores
Scores normalized to `[0.0, 1.0]` using `sum(rewards) / max_steps`.
Run with `USE_RANDOM=true python inference.py` (seed=42, fully deterministic).
| Task | Difficulty | Max Steps | Random Agent Score |
|---|---|---|---|
| `single_incident` | Easy | 20 | 0.2000 |
| `multi_incident` | Medium | 40 | 0.3117 |
| `mass_casualty` | Hard | 60 | 0.4645 |
| `shift_surge` | Hard | 60 | 0.3183 |
> **Note:** Earlier README versions showed higher scores (~0.30 to 0.74) from a different scoring path (`observation.score`). These figures use the canonical competition normalization: `sum(step_rewards) / max_steps`, clamped to `[0.0, 1.0]`.
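
The normalization described in the note is small enough to show in full. This is a sketch with a hypothetical function name; only the formula `sum(step_rewards) / max_steps` and the clamp come from the text above.

```python
def normalize_episode_score(step_rewards: list[float], max_steps: int) -> float:
    """Sum of step rewards divided by the task's max_steps, clamped to [0, 1]."""
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    return max(0.0, min(1.0, sum(step_rewards) / max_steps))


# Five steps at reward 0.8 in a 20-step task: 4.0 / 20 = 0.2
print(round(normalize_episode_score([0.8] * 5, max_steps=20), 4))
```

One consequence of dividing by `max_steps` rather than by the number of steps actually taken is that an episode ending early contributes fewer rewards to the numerator, so its normalized score is lower than its mean step reward.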
### What the scores mean
A random agent scoring **0.20 on the easiest task** confirms the environment is not trivially solvable: there is no reward for random dispatching. The gradient from 0.20 to 0.46 across tasks reflects genuine increasing complexity, not just more steps.
A well-prompted frontier LLM (GPT-4o, Llama-3.1-70B) is expected to score **0.55 to 0.75 on single_incident** and **0.30 to 0.45 on shift_surge**, demonstrating the environment meaningfully differentiates agent capability.
LLM agents (`meta-llama/Llama-3.1-8B-Instruct` via `https://router.huggingface.co/v1`) are expected to score meaningfully higher on easy and medium tasks by correctly prioritizing P1 incidents and matching unit types.
Run the baseline matrix (random + LLM reruns) and emit a JSON report:
```bash
API_BASE_URL=https://router.huggingface.co/v1 \
MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct \
HF_TOKEN=your_token \
python scripts/run_baseline_matrix.py --random-runs 1 --llm-runs 3 --output-json baseline_report.json
```
Windows PowerShell shortcut:
```powershell
$env:HF_TOKEN="your_token"
powershell -ExecutionPolicy Bypass -File scripts/run_nemotron_baseline.ps1 -RandomRuns 1 -LlmRuns 3
```
---
## Project Structure
```
.
├── src/
│   ├── models.py               # Pydantic typed contracts (Action, Observation, State)
│   ├── protocol.py             # Dispatch protocol validator
│   ├── physics.py              # City-grid movement / ETA helpers
│   ├── city_schema.py          # City topology + unit configuration loader
│   ├── state_machine.py        # Core dispatch state machine
│   ├── rewards.py              # Reward engine + episode graders
│   ├── phraseology.py          # Dispatch phraseology renderer/judge
│   ├── api.py                  # REST API client wrapper
│   ├── grading.py              # Centralized episode grading router
│   ├── benchmark.py            # Benchmark runner (list/run all tasks)
│   ├── openenv_environment.py  # OpenEnv-compatible environment wrapper
│   ├── tasks/
│   │   ├── registry.py         # Task registry + deterministic scenario fixtures
│   │   ├── single_incident.py  # Easy task + grader
│   │   ├── multi_incident.py   # Medium task + grader
│   │   ├── mass_casualty.py    # Hard task + grader
│   │   └── shift_surge.py      # Hard task + grader
│   ├── server/
│   │   ├── app.py              # FastAPI server (reset/step/state endpoints)
│   │   ├── requirements.txt
│   │   └── Dockerfile
│   └── visualizer/
│       └── viewer.py           # Read-only 2D Matplotlib visualizer
├── data/
│   ├── metro_city.json         # Large city schema (default)
│   └── city_small.json         # Small city schema (testing)
├── tests/                      # TDD test suite (~20 test modules)
├── samplematerial/
│   └── prevalidation.sh        # HF Space + Docker validation script
├── demo.py                     # Non-interactive demo (no LLM required)
├── inference.py                # Competition inference script
├── live_dashboard.html         # Browser-based live dashboard
├── validate_local.py           # Local pre-submission validation
├── openenv.yaml                # OpenEnv specification
├── pyproject.toml              # uv project config
├── requirements.txt            # pip dependencies
└── Dockerfile                  # Root Docker build
```
---
## Live Dashboard
After starting the server and calling `/reset`, open `live_dashboard.html` in a browser:
```bash
# Terminal 1: start server
python -m src.server.app
# Terminal 2: reset to a task
curl -X POST http://localhost:7860/reset \
-H "Content-Type: application/json" \
-d '{"task_id": "multi_incident"}'
# Browser: open live_dashboard.html
```
The dashboard polls `/dashboard/state` every 500ms and renders:
- Unit cards (status, ETA, assignment, location)
- Incident cards (type, severity, status, assigned units)
- City map (2D grid with unit and incident markers)
- Per-step reward component bars
---
## 2D Visualizer (Programmatic)
```python
import asyncio

from src.openenv_environment import OpenEnvEnvironment
from src.visualizer.viewer import Viewer2D


async def main():
    env = OpenEnvEnvironment(task_id="multi_incident", seed=42)
    await env.reset()
    Viewer2D().render_to_file("frame.png", env.state())
    env.close()


asyncio.run(main())
```
---
## Determinism
All scenarios are deterministic under a fixed seed:
```python
env1 = OpenEnvEnvironment(task_id="shift_surge", seed=42)
env2 = OpenEnvEnvironment(task_id="shift_surge", seed=42)
# env1 and env2 produce identical episodes
```
Incident positions include small seeded perturbations for realism; the overall episode structure (waves, unit positions, incident types) is fully reproducible.
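
The idea of a seeded perturbation can be illustrated with a dedicated `random.Random` instance per incident. This is a sketch, not the registry's actual code: the function name, the seed derivation, the offset range, and the 0 to 99 coordinate bounds (inferred from the 100×100 grid) are all assumptions.

```python
import random

GRID_MAX = 99  # assumed upper coordinate bound for a 100×100 grid


def perturbed_position(base: tuple[int, int], seed: int, incident_index: int,
                       max_offset: int = 2) -> tuple[int, int]:
    """Shift a base position by a small offset that depends only on the inputs."""
    # A private generator avoids touching the global random state, so the
    # same (seed, incident_index) pair always yields the same offset.
    rng = random.Random(seed * 10_007 + incident_index)
    dx = rng.randint(-max_offset, max_offset)
    dy = rng.randint(-max_offset, max_offset)
    x = min(GRID_MAX, max(0, base[0] + dx))
    y = min(GRID_MAX, max(0, base[1] + dy))
    return (x, y)


first = perturbed_position((50, 50), seed=42, incident_index=0)
second = perturbed_position((50, 50), seed=42, incident_index=0)
print(first == second)  # True: identical inputs give identical positions
```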
---
## Running Tests
```bash
# Full test suite
pytest tests/ -v
# Individual modules
pytest tests/test_state_machine.py -v
pytest tests/test_rewards.py -v
pytest tests/test_openenv_integration.py -v
pytest tests/test_inference.py -v
```
---
## Pre-Submission Validation
```bash
# Full local validation (tests + inference + docker + benchmark scores)
python validate_local.py
# OpenEnv spec validation
openenv validate
# HF Space validation (requires deployed space)
bash samplematerial/prevalidation.sh https://your-space.hf.space .
# Windows (explicit Git Bash)
"C:/Program Files/Git/bin/bash.exe" samplematerial/prevalidation.sh https://your-space.hf.space .
```
---
## License
MIT License