Exploit Analysis — 911 Dispatch Supervisor
Known Attack Vectors Considered & Closed
1. Reward Farming via Repeated Dispatch-Cancel Cycles
Vector: The agent dispatches a unit, cancels immediately, and re-dispatches, collecting a partial response_time reward on each cycle without ever resolving an incident.
Mitigation: A cancel action returns the unit to AVAILABLE but does NOT reset the incident's survival clock. The clock keeps counting down regardless of agent action, so every cancel-dispatch cycle spends time while the incident moves closer to escalation. Sustained farming therefore triggers the Safety Gate, which collapses the score to ≤0.2.
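The cancel behavior described above can be illustrated with a minimal sketch. The class and function names here (`Incident`, `Unit`, `dispatch`, `cancel`, `tick`) and the integer step-based clock are assumptions made for illustration, not the environment's actual API; only the AVAILABLE status and the "cancel does not reset the clock" rule come from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class UnitStatus(Enum):
    AVAILABLE = "available"
    DISPATCHED = "dispatched"


@dataclass
class Unit:
    status: UnitStatus = UnitStatus.AVAILABLE


@dataclass
class Incident:
    # Hypothetical representation: steps left before the incident escalates.
    survival_remaining: int
    assigned_unit: Optional[Unit] = None


def dispatch(unit: Unit, incident: Incident) -> None:
    """Assign the unit to the incident."""
    unit.status = UnitStatus.DISPATCHED
    incident.assigned_unit = unit


def cancel(unit: Unit, incident: Incident) -> None:
    """Free the unit. The survival clock is deliberately left untouched."""
    unit.status = UnitStatus.AVAILABLE
    incident.assigned_unit = None


def tick(incident: Incident) -> None:
    """Advance the environment one step; the clock runs regardless of actions."""
    incident.survival_remaining = max(0, incident.survival_remaining - 1)


# A farming loop: three dispatch-cancel cycles exhaust a 3-step clock.
incident = Incident(survival_remaining=3)
unit = Unit()
for _ in range(3):
    dispatch(unit, incident)
    cancel(unit, incident)
    tick(incident)
```

Under these assumptions, the farming loop ends with the unit idle and the incident's clock at zero, which is the condition the text says leads to escalation.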
2. Safety Gate Bypass via P2-Only Dispatching
Vector: The agent ignores all P1 incidents and resolves only P2/P3 incidents, accumulating triage and response_time rewards without triggering the Safety Gate.
Mitigation: The Safety Gate activates if ANY P1 incident existed during the episode and its survival score is 0.0. The agent cannot prevent P1 incidents from existing, because the scenario fixture spawns them deterministically. An ignored P1 incident therefore ends with a survival score of 0.0 and trips the gate.
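A minimal sketch of the gate logic, assuming the gate acts as a cap on the final score. The function name `apply_safety_gate`, the constant `SAFETY_GATE_CAP`, and the choice of a hard cap at 0.2 are assumptions for illustration; the text only states that a triggered gate leaves the score at ≤0.2.

```python
from typing import Iterable

# Assumed cap, based on the "≤0.2" figure in this analysis.
SAFETY_GATE_CAP = 0.2


def apply_safety_gate(raw_score: float, p1_survival_scores: Iterable[float]) -> float:
    """Cap the score if any P1 incident ended with a survival score of 0.0.

    `p1_survival_scores` holds one score per P1 incident that existed
    during the episode (hypothetical input shape).
    """
    if any(score == 0.0 for score in p1_survival_scores):
        return min(raw_score, SAFETY_GATE_CAP)
    return raw_score


# P2/P3 rewards pushed the raw score high, but one P1 incident was ignored.
gated = apply_safety_gate(0.9, [1.0, 0.0])
ungated = apply_safety_gate(0.9, [0.7, 1.0])
```

In this sketch, reward accumulated from lower-priority incidents cannot lift the score above the cap once a single P1 incident has failed.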
3. Coverage Score Farming via Staging
Vector: The agent repeatedly stages all units in one district to maximize that district's coverage score while ignoring active incidents.
Mitigation: The coverage score is computed across ALL districts simultaneously, so concentrating units in one district reduces coverage elsewhere. Staged units also cannot respond to an incident without an explicit dispatch action, so incident survival clocks expire and trigger escalation penalties.
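To show why concentrating units does not pay off, here is a minimal sketch that assumes coverage is the fraction of districts with at least one staged unit. That formula, the function name `coverage_score`, and the district names are hypothetical; the text only says that coverage is computed across all districts at once.

```python
from typing import Sequence


def coverage_score(unit_districts: Sequence[str], all_districts: Sequence[str]) -> float:
    """Fraction of districts that have at least one unit staged in them."""
    if not all_districts:
        return 0.0
    covered = set(unit_districts) & set(all_districts)
    return len(covered) / len(all_districts)


districts = ["north", "south", "east", "west"]

# Four units stacked in one district versus four units spread out.
stacked = coverage_score(["north", "north", "north", "north"], districts)
spread = coverage_score(["north", "south", "east", "west"], districts)
```

With this definition, stacking every unit in one district covers only one district in four, while spreading the same units covers all of them.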
4. Phraseology Score Inflation via Notes Stuffing
Vector: The agent fills the notes field with every possible dispatch phrase to maximize token overlap with the canonical phrases.
Mitigation: PhraseologyJudge normalizes token overlap by notes length. Stuffing the notes with irrelevant text lowers precision and keeps the score low. Only notes that match the specific action type and incident type score highly.
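The length normalization can be sketched as a precision-style score. This is not PhraseologyJudge's actual implementation: the function name `phraseology_score`, whitespace tokenization, and the example phrases are assumptions, and the canonical phrase is presumed to have been selected already for the action type and incident type.

```python
def phraseology_score(notes: str, canonical_phrase: str) -> float:
    """Distinct canonical tokens found in the notes, divided by notes length.

    Dividing by the total number of note tokens means every irrelevant
    or repeated token lowers the score.
    """
    note_tokens = notes.lower().split()
    if not note_tokens:
        return 0.0
    canonical_tokens = set(canonical_phrase.lower().split())
    overlap = set(note_tokens) & canonical_tokens
    return len(overlap) / len(note_tokens)


canonical = "engine 4 responding"

concise = phraseology_score("Engine 4 responding", canonical)
stuffed = phraseology_score(
    "engine 4 responding medic 2 en route patrol 7 on scene copy", canonical
)
```

Here the concise note matches all three canonical tokens in three tokens of text, while the stuffed note matches the same three tokens in twelve, so its score drops to a quarter of the concise one.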
5. Determinism Exploitation
Vector: The agent memorizes the exact incident sequence (seed=42) and hardcodes optimal actions instead of learning dispatch reasoning.
Mitigation: The fixed seed is intentional, for reproducibility. However, the wave spawn system introduces timing-dependent incident locations with small perturbations, meaning hardcoded action sequences fail when unit positions vary. The environment is designed for evaluation, not training-time generalization.
Conclusion
No reward-farming exploit was found that lets an agent score >0.6 without genuinely resolving Priority-1 incidents with the correct unit types within their survival clock windows.