Abhinav31122006

feat: exploit analysis, architecture docs, observation depth, citation

0b2675d about 1 month ago

Architecture — 911 Dispatch Supervisor

Layer Overview

    OpenEnvEnvironment                  ← public API (reset/step/state/legal_actions)
            │
    DispatchStateMachine                ← simulation engine
      ├── DispatchProtocolValidator     ← action legality (15+ rules)
      ├── RewardCalculator              ← 5-component weighted reward
      └── DispatchScenarioFactory       ← deterministic task fixtures
            │
    Task-Specific Graders               ← episode-level scoring

Key Design Decisions

Why Manhattan Distance Physics

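City streets form a grid, so vehicles travel along rectilinear (Manhattan) paths rather than straight lines. Averaged over all directions of travel, Manhattan distance is about 27% longer than Euclidean distance (a factor of 4/π ≈ 1.27), so Euclidean physics would systematically underestimate travel time and make ETAs unrealistically optimistic. Manhattan physics produce ETAs that are closer to real CAD system calculations.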
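The gap between the two metrics can be checked with a minimal sketch. The function names and the `blocks_per_minute` speed below are illustrative assumptions and are not taken from `src/physics.py`.

```python
import math

def manhattan_distance(a, b):
    """Rectilinear distance between two (x, y) grid points, in blocks."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def euclidean_distance(a, b):
    """Straight-line distance between two (x, y) points, in blocks."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def eta_minutes(a, b, blocks_per_minute=2.0):
    """ETA under Manhattan physics; the speed is an assumed placeholder."""
    return manhattan_distance(a, b) / blocks_per_minute

def mean_manhattan_over_euclidean(samples=3600):
    """Average Manhattan/Euclidean ratio over evenly spaced headings.

    For a unit vector at angle theta the ratio is |cos| + |sin|,
    whose mean over a full circle is 4/pi.
    """
    total = 0.0
    for i in range(samples):
        theta = 2 * math.pi * (i + 0.5) / samples
        total += abs(math.cos(theta)) + abs(math.sin(theta))
    return total / samples
```

For a unit at `(0, 0)` and an incident at `(3, 4)`, the Manhattan distance is 7 blocks against a Euclidean distance of 5, and the averaged ratio comes out close to 4/π.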

Why Legal Actions Are Pre-filtered

Rather than letting agents propose arbitrary actions and penalizing illegal ones, the environment exposes only currently-valid actions via legal_actions(). This eliminates wasted LLM budget on invalid action generation and focuses evaluation on dispatch decision quality, not action syntax compliance.
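The pre-filtering idea can be illustrated with a toy environment. `ToyEnv`, `Action`, and the action kinds below are hypothetical stand-ins written for this sketch. They do not reproduce the real `OpenEnvEnvironment` or its action schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    kind: str              # "wait" or "dispatch" (assumed action kinds)
    unit_id: str = ""
    incident_id: str = ""

class ToyEnv:
    """Hypothetical stand-in that only demonstrates legal-action filtering."""

    def __init__(self):
        self.available_units = {"E1", "M2"}
        self.pending_incidents = {"INC-1"}

    def legal_actions(self):
        # Enumerate only actions that are valid in the current state.
        actions = [Action("wait")]
        for unit in sorted(self.available_units):
            for incident in sorted(self.pending_incidents):
                actions.append(Action("dispatch", unit, incident))
        return actions

    def step(self, action):
        # The agent is expected to pick from legal_actions(); anything else is rejected.
        if action not in self.legal_actions():
            raise ValueError(f"illegal action: {action}")
        if action.kind == "dispatch":
            self.available_units.discard(action.unit_id)
            self.pending_incidents.discard(action.incident_id)

def act(env, policy):
    """Let a policy choose among the currently legal actions, then apply it."""
    chosen = policy(env.legal_actions())
    env.step(chosen)
    return chosen
```

Because the policy only ever sees valid options, a choice such as `policy = lambda legal: legal[-1]` can never produce an illegal dispatch, and evaluation measures which option was picked rather than whether it parsed.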

Why the Safety Gate Uses 0.2 Not 0.0

A hard zero for any P1 failure would make the reward surface completely flat for bad agents — no gradient to learn from. Capping at 0.2 preserves partial signal (coverage, response time on other incidents) while making P1 failure unambiguously catastrophic. Real dispatch accountability works the same way: an incident review happens, but other good work is still acknowledged.
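A minimal sketch of such a gate follows. The 0.2 cap comes from the text above. The component names beyond coverage, response time, and phraseology, and the equal weights, are placeholders assumed for illustration. The actual weights in `src/rewards.py` may differ.

```python
P1_FAILURE_CAP = 0.2  # cap described in the text

# Assumed weights: the source only states "5-component weighted reward".
# "other_a" and "other_b" are placeholder names for the unnamed components.
WEIGHTS = {
    "response_time": 0.2,
    "coverage": 0.2,
    "phraseology": 0.2,
    "other_a": 0.2,
    "other_b": 0.2,
}

def weighted_reward(components):
    """Weighted sum of component scores, each expected in [0, 1]."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)

def gated_reward(components, p1_failed):
    """Apply the safety gate: cap the reward at 0.2 after any P1 failure."""
    raw = weighted_reward(components)
    return min(raw, P1_FAILURE_CAP) if p1_failed else raw
```

Under this sketch an otherwise perfect episode with a P1 failure scores 0.2, while weaker agents that score below the cap still receive distinct rewards, so some gradient remains.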

Why Phraseology Is Scored

Real dispatchers are evaluated on radio communication clarity. An agent that dispatches the right unit but says nothing (or says the wrong thing) is less useful as a CAD copilot than one that also generates the correct radio traffic. Phraseology scoring creates incentive for agents to learn domain language, not just resource allocation.

Why Waves Spawn at Fixed Steps Not Random Times

Reproducibility is a first-class requirement. Fixed step offsets guarantee identical episode structure across all runs, making score comparisons valid. The challenge comes from the agent not knowing wave timing in advance — it must react, not plan.
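Fixed-step spawning can be sketched as a lookup from step offset to incidents. The schedule contents and function names below are invented for illustration and do not come from `src/tasks/registry.py`.

```python
# Hypothetical schedule: step offset -> incident ids spawned at that step.
WAVE_SCHEDULE = {
    0: ["INC-1", "INC-2"],
    5: ["INC-3"],
    12: ["INC-4", "INC-5"],
}

def incidents_spawned_at(step, schedule=WAVE_SCHEDULE):
    """Return the incidents that appear at this step (empty if none)."""
    return list(schedule.get(step, []))

def spawn_timeline(num_steps, schedule=WAVE_SCHEDULE):
    """Replay the schedule and record (step, incident_id) in spawn order."""
    timeline = []
    for step in range(num_steps):
        for incident_id in incidents_spawned_at(step, schedule):
            timeline.append((step, incident_id))
    return timeline
```

No random number generator is involved, so two runs of `spawn_timeline(15)` produce the same timeline. The agent only observes incidents as they appear and never sees the schedule itself.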

State Machine Transitions

    Unit:     AVAILABLE → DISPATCHED → ON_SCENE → AVAILABLE
                        ↘ STAGED ↗

    Incident: PENDING → RESPONDING → ON_SCENE → RESOLVED
                      ↘ ESCALATED (survival clock expired)
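The transitions above can be encoded as lookup tables. This is a sketch, not the contents of `src/state_machine.py`. The linear chains follow the diagram, but the exact states that `STAGED` and `ESCALATED` branch from are not recoverable from it, so those edges are assumptions and are marked as such in the comments.

```python
from enum import Enum

class UnitStatus(Enum):
    AVAILABLE = "AVAILABLE"
    DISPATCHED = "DISPATCHED"
    STAGED = "STAGED"
    ON_SCENE = "ON_SCENE"

class IncidentStatus(Enum):
    PENDING = "PENDING"
    RESPONDING = "RESPONDING"
    ON_SCENE = "ON_SCENE"
    RESOLVED = "RESOLVED"
    ESCALATED = "ESCALATED"

UNIT_TRANSITIONS = {
    # ASSUMPTION: STAGED is a detour between AVAILABLE and DISPATCHED.
    UnitStatus.AVAILABLE: {UnitStatus.DISPATCHED, UnitStatus.STAGED},
    UnitStatus.STAGED: {UnitStatus.DISPATCHED},
    UnitStatus.DISPATCHED: {UnitStatus.ON_SCENE},
    UnitStatus.ON_SCENE: {UnitStatus.AVAILABLE},
}

INCIDENT_TRANSITIONS = {
    # ASSUMPTION: an incident can escalate while PENDING or RESPONDING.
    IncidentStatus.PENDING: {IncidentStatus.RESPONDING, IncidentStatus.ESCALATED},
    IncidentStatus.RESPONDING: {IncidentStatus.ON_SCENE, IncidentStatus.ESCALATED},
    IncidentStatus.ON_SCENE: {IncidentStatus.RESOLVED},
    IncidentStatus.RESOLVED: set(),   # terminal
    IncidentStatus.ESCALATED: set(),  # terminal in this sketch
}

def can_transition(table, current, target):
    """Return True if the table allows moving from current to target."""
    return target in table.get(current, set())
```

Keeping the edges in data rather than in branching code makes it cheap to correct an assumed edge once the real transition rules are confirmed.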

File Map

| File | Responsibility |
| --- | --- |
| `src/models.py` | All Pydantic data contracts |
| `src/state_machine.py` | Core simulation engine |
| `src/protocol.py` | Action legality validation |
| `src/rewards.py` | Reward calculation |
| `src/physics.py` | Manhattan distance, ETA, coverage |
| `src/phraseology.py` | Radio language scoring |
| `src/city_schema.py` | City topology loader |
| `src/tasks/registry.py` | Task definitions and fixtures |
| `src/openenv_environment.py` | OpenEnv API wrapper |
| `server/app.py` | FastAPI server |