spec_version: 1 name: network_forensics type: space runtime: fastapi app: server.app:app port: 8000 description: > An OpenEnv benchmark for autonomous network threat investigation. Agents inspect PCAP traffic, flag malicious packets, group attack sessions, classify attack patterns, identify the initial compromise, and submit an incident report evaluated by both deterministic grading and LLM-as-a-Judge scoring. tags: - openenv - rl-environment - network-security - cybersecurity - forensics - llm-judge - pytorch - meta tasks: - id: easy description: > DDoS-heavy traffic mixed with benign flows. Goal: recover the dominant malicious campaign. difficulty: easy max_steps: 40 - id: medium description: > Mixed web attacks: brute force, XSS, and SQL injection. Goal: separate concurrent attack campaigns and tag them correctly. difficulty: medium max_steps: 70 - id: hard description: > High-noise DoS traffic with Hulk, GoldenEye, Slowloris, SlowHTTPTest, and a rare Heartbleed trace. Goal: recover multiple sessions, avoid false positives, and identify the root cause accurately. difficulty: hard max_steps: 100 evaluation: method: hybrid components: - type: programmatic weight: 0.85 formula: "0.25 * precision + 0.35 * recall + 0.25 * logic_score" - type: llm_judge weight: 0.15 description: > Scores the agent's free-text incident summary on accuracy, completeness, clarity, and analytical insight. fallback: keyword_heuristic action_space: - inspect_packet - flag_as_suspicious - group_into_session - tag_pattern - identify_entry_point - submit_report observation_space: includes: - visible_packets - flagged_packet_ids - grouped_sessions - tagged_patterns - claimed_entry_point - connection_graph_summary - current_score_estimate mcp: enabled: true endpoint: /mcp description: > MCP (Model Context Protocol) endpoint for production inference. Any MCP-compatible agent can connect via HTTP POST or WebSocket to investigate network traffic using the tools below. tools: - name: reset_env description: Start a new investigation episode with a chosen difficulty - name: get_status description: Get current investigation progress, score, and session summary - name: inspect_packet description: Reveal the full payload of a packet for deep analysis - name: flag_as_suspicious description: Flag a packet as malicious traffic - name: group_into_session description: Group related packets into a named attack session - name: tag_pattern description: Tag a session with an attack family classification - name: identify_entry_point description: Identify the initial compromise packet - name: submit_report description: Submit final incident report for LLM-as-Judge scoring