911 / docs /architecture.md
Abhinav31122006
feat: exploit analysis, architecture docs, observation depth, citation
0b2675d
# Architecture β€” 911 Dispatch Supervisor
## Layer Overview
OpenEnvEnvironment ← public API (reset/step/state/legal_actions)
β”‚
DispatchStateMachine ← simulation engine
β”œβ”€β”€ DispatchProtocolValidator ← action legality (15+ rules)
β”œβ”€β”€ RewardCalculator ← 5-component weighted reward
└── DispatchScenarioFactory ← deterministic task fixtures
β”‚
Task-Specific Graders ← episode-level scoring
## Key Design Decisions
### Why Manhattan Distance Physics
Real city blocks use Manhattan (rectilinear) distance for navigation.
Euclidean distance would underestimate travel time by ~27% on average,
making ETAs unrealistically optimistic. Manhattan physics produce ETAs
that match real CAD system calculations.
### Why Legal Actions Are Pre-filtered
Rather than letting agents propose arbitrary actions and penalizing illegal
ones, the environment exposes only currently-valid actions via `legal_actions()`.
This eliminates wasted LLM budget on invalid action generation and focuses
evaluation on dispatch decision quality, not action syntax compliance.
### Why the Safety Gate Uses 0.2 Not 0.0
A hard zero for any P1 failure would make the reward surface completely flat
for bad agents β€” no gradient to learn from. Capping at 0.2 preserves partial
signal (coverage, response time on other incidents) while making P1 failure
unambiguously catastrophic. Real dispatch accountability works the same way:
an incident review happens, but other good work is still acknowledged.
### Why Phraseology Is Scored
Real dispatchers are evaluated on radio communication clarity. An agent that
dispatches the right unit but says nothing (or says the wrong thing) is less
useful as a CAD copilot than one that also generates the correct radio traffic.
Phraseology scoring creates incentive for agents to learn domain language, not
just resource allocation.
### Why Waves Spawn at Fixed Steps Not Random Times
Reproducibility is a first-class requirement. Fixed step offsets guarantee
identical episode structure across all runs, making score comparisons valid.
The challenge comes from the agent not knowing wave timing in advance β€” it
must react, not plan.
## State Machine Transitions
Unit: AVAILABLE β†’ DISPATCHED β†’ ON_SCENE β†’ AVAILABLE
β†˜οΈ STAGED ↗️
Incident: PENDING β†’ RESPONDING β†’ ON_SCENE β†’ RESOLVED
β†˜οΈ ESCALATED (survival clock expired)
## File Map
| File | Responsibility |
|---|---|
| `src/models.py` | All Pydantic data contracts |
| `src/state_machine.py` | Core simulation engine |
| `src/protocol.py` | Action legality validation |
| `src/rewards.py` | Reward calculation |
| `src/physics.py` | Manhattan distance, ETA, coverage |
| `src/phraseology.py` | Radio language scoring |
| `src/city_schema.py` | City topology loader |
| `src/tasks/registry.py` | Task definitions and fixtures |
| `src/openenv_environment.py` | OpenEnv API wrapper |
| `server/app.py` | FastAPI server |