Sentinel / tasks /todo.md
nihalaninihal's picture
Add randomized attacker, security metrics engine, and updated Gradio dashboard
69a7e43

SentinelOps Arena — Winning Hackathon Implementation Plan

Gap Analysis (from codebase audit)

Gap Description Priority
Scripted attacker HeuristicAttacker fires at fixed ticks (7/14/20/25) — not adaptive CRITICAL
No key metrics ASR, Benign Task Success, FPR, MTTD not computed CRITICAL
No metrics in Gradio Dashboard shows scores but not security-specific metrics HIGH
About tab outdated Doesn't reflect the full narrative MEDIUM

Implementation Tasks

Task 1: Randomized Adaptive Attacker

  • Replace HeuristicAttacker.ATTACK_SCHEDULE with budget-based random strategy
  • Random attack type selection weighted by past success
  • Random timing (not fixed ticks)
  • Random target system selection
  • Varying social engineering messages (not just one template)
  • Keep budget constraint (10.0, cost 0.3 per attack)

Task 2: Key Metrics Engine

  • Create sentinelops_arena/metrics.py
  • Compute from episode log:
    • Attack Success Rate (ASR) = attacks that caused worker failure / total attacks
    • Benign Task Success = successful tasks / total tasks attempted
    • False Positive Rate (FPR) = false flags / total oversight flags
    • Mean Time to Detect (MTTD) = avg ticks between attack and first detection

Task 3: Metrics in Gradio Dashboard

  • Add metrics panel to Run Episode tab
  • Add metrics to Before/After comparison tab
  • Styled HTML cards matching the cybersecurity theme

Task 4: Update About Tab

  • Full narrative matching the vision document
  • Key metrics definitions
  • Self-play explanation

Verification

  • python -c "from sentinelops_arena.demo import run_episode; run_episode()" works
  • python -c "from sentinelops_arena.metrics import compute_episode_metrics; print('OK')" works
  • Gradio app launches without errors
  • Randomized attacker produces different attack patterns across seeds
    • Seed 42: 10 attacks at ticks [1,2,4,11,13,17,18,19,21,27]
    • Seed 99: 10 attacks at ticks [1,2,5,12,20,23,25,27,28,29]
    • Seed 7: 12 attacks at ticks [1,2,3,4,5,7,9,12,14,20,25,28]
  • Metrics compute correctly (ASR, Benign Success, FPR, MTTD)
  • Trained worker outperforms untrained (30.0 vs 25.0 worker score)