
Abhinav31122006

feat: exploit analysis, architecture docs, observation depth, citation

0b2675d about 1 month ago


	# Exploit Analysis — 911 Dispatch Supervisor

	## Known Attack Vectors Considered & Closed

	### 1. Reward Farming via Repeated Dispatch-Cancel Cycles
	Vector: Agent dispatches a unit, immediately cancels, re-dispatches to collect
	partial response_time reward on each cycle without ever resolving incidents.
	Mitigation: Cancel actions return the unit to AVAILABLE but do NOT reset the
	incident survival clock. The clock keeps counting down regardless of agent
	action, so every dispatch-cancel cycle spends time that brings the incident
	closer to escalation. Sustained farming therefore lets incidents escalate and
	triggers the Safety Gate, collapsing the score to ≤0.2.
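
The mitigation above can be illustrated with a minimal sketch. All names here (`Incident`, `Unit`, `tick`, `farm_cycles`, and the step-based clock) are hypothetical and not taken from the project's code; the sketch only assumes that the clock advances once per step and that a cancel never touches it.

```python
from dataclasses import dataclass
from enum import Enum


class UnitStatus(Enum):
    AVAILABLE = "available"
    DISPATCHED = "dispatched"


@dataclass
class Incident:
    survival_steps_left: int  # hypothetical countdown, measured in simulation steps


@dataclass
class Unit:
    status: UnitStatus = UnitStatus.AVAILABLE


def dispatch(unit: Unit) -> None:
    unit.status = UnitStatus.DISPATCHED


def cancel(unit: Unit) -> None:
    # Cancel frees the unit but deliberately leaves the incident clock untouched.
    unit.status = UnitStatus.AVAILABLE


def tick(incident: Incident) -> None:
    # The clock advances every step, whatever the agent did on that step.
    incident.survival_steps_left = max(0, incident.survival_steps_left - 1)


def farm_cycles(incident: Incident, unit: Unit, cycles: int) -> int:
    """Run dispatch-cancel cycles (two steps each) and return the steps remaining."""
    for _ in range(cycles):
        dispatch(unit)
        tick(incident)
        cancel(unit)
        tick(incident)
    return incident.survival_steps_left
```

Under these assumptions, three farming cycles on an incident with ten steps left leave it with four, and the unit ends up AVAILABLE with nothing resolved.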

	### 2. Safety Gate Bypass via P2-Only Dispatching
	Vector: Agent ignores all P1 incidents and only resolves P2/P3 incidents to
	accumulate triage and response_time rewards without triggering the Safety Gate.
	Mitigation: The Safety Gate activates if any P1 incident that existed during the
	episode has a survival score of 0.0. The agent cannot prevent P1 incidents from
	existing, because the scenario fixture spawns them deterministically, so
	ignoring them lets their survival clocks expire and trips the gate.
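
A sketch of the gate condition described above, assuming the gate caps the final score at the 0.2 ceiling mentioned in section 1. `IncidentOutcome`, `safety_gate_triggered`, and `gated_score` are illustrative names, not the environment's actual API.

```python
from dataclasses import dataclass
from typing import Iterable, List

SAFETY_GATE_CAP = 0.2  # assumed ceiling, taken from the "≤0.2" figure in section 1


@dataclass
class IncidentOutcome:
    priority: int          # 1 = P1, 2 = P2, 3 = P3
    survival_score: float  # 0.0 means the survival clock expired


def safety_gate_triggered(outcomes: Iterable[IncidentOutcome]) -> bool:
    # One failed P1 is enough; P2/P3 outcomes never trip the gate.
    return any(o.priority == 1 and o.survival_score == 0.0 for o in outcomes)


def gated_score(raw_score: float, outcomes: List[IncidentOutcome]) -> float:
    # Rewards earned elsewhere cannot lift the score above the cap once gated.
    if safety_gate_triggered(outcomes):
        return min(raw_score, SAFETY_GATE_CAP)
    return raw_score
```

With this shape, an agent that collects a raw score of 0.9 from P2/P3 incidents while one P1 incident expires still ends at 0.2.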

	### 3. Coverage Score Farming via Staging
	Vector: Agent repeatedly stages all units in one district to maximize coverage
	score for that district while ignoring active incidents.
	Mitigation: Coverage score is computed across ALL districts simultaneously.
	Concentrating units in one district reduces coverage elsewhere, and staged units
	cannot respond to incidents without an explicit dispatch action, allowing incident
	survival clocks to expire and triggering escalation penalties.
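
To show why concentrating units does not pay off, here is a minimal sketch that assumes coverage is the fraction of districts holding at least one staged unit. The real formula is not given in this document, so `coverage_score` and this definition are assumptions.

```python
from typing import Iterable, Sequence


def coverage_score(unit_districts: Iterable[str], all_districts: Sequence[str]) -> float:
    """Fraction of districts that contain at least one staged unit (assumed metric)."""
    if not all_districts:
        return 0.0
    # Extra units in an already covered district add nothing to the score.
    covered = set(unit_districts) & set(all_districts)
    return len(covered) / len(all_districts)
```

For four districts, staging all four units in one district gives 0.25, while spreading them one per district gives 1.0.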

	### 4. Phraseology Score Inflation via Notes Stuffing
	Vector: Agent fills the notes field with every possible dispatch phrase to
	maximize token overlap with canonical phrases.
	Mitigation: PhraseologyJudge uses token overlap normalized by notes length.
	Stuffing long notes with irrelevant text reduces precision, keeping the score low.
	Only notes that match the specific action type and incident type score highly.
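
The length normalization can be sketched as a precision-style ratio. This is an illustrative stand-in, not the actual PhraseologyJudge implementation: the function name, whitespace tokenization, and the use of unique matched tokens are all assumptions.

```python
def phraseology_precision(notes: str, canonical_phrase: str) -> float:
    """Unique canonical tokens found in the notes, divided by total notes length."""
    note_tokens = notes.lower().split()
    if not note_tokens:
        return 0.0
    canonical = set(canonical_phrase.lower().split())
    # Counting unique matches means repeating a canonical token earns nothing extra,
    # while every added token still grows the denominator.
    matched = set(note_tokens) & canonical
    return len(matched) / len(note_tokens)
```

A concise note that matches the canonical phrase scores 1.0. Appending unrelated phrases or repeating matching tokens only lowers the ratio.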

	### 5. Determinism Exploitation
	Vector: Agent memorizes the exact incident sequence (seed=42) and hardcodes
	optimal actions rather than learning dispatch reasoning.
	Mitigation: The fixed seed is intentional, for reproducibility. However, the wave
	spawn system applies small timing-dependent perturbations to incident locations,
	so hardcoded action sequences fail when unit positions vary. The environment
	is designed for evaluation, not training-time generalization.
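
One way a spawn system can stay reproducible while still depending on timing is to derive the random stream from both the seed and the spawn step. The sketch below is a guess at that pattern, not the project's code: `spawn_location`, the `jitter` magnitude, and the seed-mixing constant are hypothetical.

```python
import random
from typing import Tuple


def spawn_location(
    base_xy: Tuple[float, float],
    spawn_step: int,
    seed: int = 42,
    jitter: float = 0.05,  # hypothetical perturbation radius
) -> Tuple[float, float]:
    # Mixing the spawn step into the seed keeps a given (seed, step) pair
    # reproducible while making the location depend on when the wave spawns.
    rng = random.Random(seed * 1_000_003 + spawn_step)
    x, y = base_xy
    return (x + rng.uniform(-jitter, jitter), y + rng.uniform(-jitter, jitter))
```

Two runs that spawn a wave at the same step see the same location, while a run whose timing shifts by even one step sees a slightly different one, which is enough to invalidate a memorized action sequence.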

	## Conclusion
	No reward-farming exploit was found that allows an agent to score >0.6 without
	genuinely resolving Priority-1 incidents with correct unit types within their
	survival-clock windows.