Spaces:
Sleeping
Sleeping
| # Architecture β 911 Dispatch Supervisor | |
| ## Layer Overview | |
| OpenEnvEnvironment β public API (reset/step/state/legal_actions) | |
| β | |
| DispatchStateMachine β simulation engine | |
| βββ DispatchProtocolValidator β action legality (15+ rules) | |
| βββ RewardCalculator β 5-component weighted reward | |
| βββ DispatchScenarioFactory β deterministic task fixtures | |
| β | |
| Task-Specific Graders β episode-level scoring | |
| ## Key Design Decisions | |
| ### Why Manhattan Distance Physics | |
| Real city blocks use Manhattan (rectilinear) distance for navigation. | |
| Euclidean distance would underestimate travel time by ~27% on average, | |
| making ETAs unrealistically optimistic. Manhattan physics produce ETAs | |
| that match real CAD system calculations. | |
| ### Why Legal Actions Are Pre-filtered | |
| Rather than letting agents propose arbitrary actions and penalizing illegal | |
| ones, the environment exposes only currently-valid actions via `legal_actions()`. | |
| This eliminates wasted LLM budget on invalid action generation and focuses | |
| evaluation on dispatch decision quality, not action syntax compliance. | |
| ### Why the Safety Gate Uses 0.2 Not 0.0 | |
| A hard zero for any P1 failure would make the reward surface completely flat | |
| for bad agents β no gradient to learn from. Capping at 0.2 preserves partial | |
| signal (coverage, response time on other incidents) while making P1 failure | |
| unambiguously catastrophic. Real dispatch accountability works the same way: | |
| an incident review happens, but other good work is still acknowledged. | |
| ### Why Phraseology Is Scored | |
| Real dispatchers are evaluated on radio communication clarity. An agent that | |
| dispatches the right unit but says nothing (or says the wrong thing) is less | |
| useful as a CAD copilot than one that also generates the correct radio traffic. | |
| Phraseology scoring creates incentive for agents to learn domain language, not | |
| just resource allocation. | |
| ### Why Waves Spawn at Fixed Steps Not Random Times | |
| Reproducibility is a first-class requirement. Fixed step offsets guarantee | |
| identical episode structure across all runs, making score comparisons valid. | |
| The challenge comes from the agent not knowing wave timing in advance β it | |
| must react, not plan. | |
| ## State Machine Transitions | |
| Unit: AVAILABLE β DISPATCHED β ON_SCENE β AVAILABLE | |
| βοΈ STAGED βοΈ | |
| Incident: PENDING β RESPONDING β ON_SCENE β RESOLVED | |
| βοΈ ESCALATED (survival clock expired) | |
| ## File Map | |
| | File | Responsibility | | |
| |---|---| | |
| | `src/models.py` | All Pydantic data contracts | | |
| | `src/state_machine.py` | Core simulation engine | | |
| | `src/protocol.py` | Action legality validation | | |
| | `src/rewards.py` | Reward calculation | | |
| | `src/physics.py` | Manhattan distance, ETA, coverage | | |
| | `src/phraseology.py` | Radio language scoring | | |
| | `src/city_schema.py` | City topology loader | | |
| | `src/tasks/registry.py` | Task definitions and fixtures | | |
| | `src/openenv_environment.py` | OpenEnv API wrapper | | |
| | `server/app.py` | FastAPI server | | |