	---
	title: CyberSOC Environment
	emoji: 🛡️
	colorFrom: blue
	colorTo: indigo
	sdk: docker
	app_port: 7860
	---

	# CyberSOC: Agentic Incident Response Environment

	CyberSOC is a reinforcement learning environment built on the OpenEnv framework. It is designed to evaluate, benchmark, and train autonomous AI agents on enterprise Incident Response (IR).

	## Architecture & Environment Dynamics

	Unlike static puzzle environments, CyberSOC generates a dynamic network topology graph that models up to 500 enterprise nodes. It covers the full Security Operations Center (SOC) pipeline and uses programmatic randomization to keep agents from overfitting to a fixed layout.

	### Network Topology Generation
	CyberSOC uses a seed-based procedural generation engine. Each time an agent requests a new episode, the environment builds a distinct network infrastructure, randomizing:
	- Subnet mapping and host allocations.
	- Intrusion vectors and persistent active threats (spanning 12 categories, including Supply Chain, APT campaigns, and Ransomware).
	- Forensic footprints across the network, including anomalous memory artifacts, generated rogue process names, and connections to malicious IPv4 addresses and domains.
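
As an illustration of seed-based generation, the sketch below derives a small network and one planted threat from a single seed. It is not the environment's actual generator: `generate_episode`, the host and subnet naming, the field names, and the three category labels are all hypothetical, chosen only to show that one seed reproduces one layout.

```python
import random

# Hypothetical labels: the README names only these three of its 12 categories.
THREAT_CATEGORIES = ["supply_chain", "apt_campaign", "ransomware"]


def generate_episode(seed: int, num_hosts: int = 8, num_subnets: int = 2) -> dict:
    """Build a small randomized network from a seed (illustrative layout only)."""
    rng = random.Random(seed)  # private RNG, so global random state is untouched

    # Subnet mapping and host allocation.
    hosts = {}
    for i in range(num_hosts):
        subnet = rng.randrange(num_subnets)
        hosts[f"host-{i:03d}"] = {
            "subnet": f"10.0.{subnet}.0/24",
            "compromised": False,
        }

    # Pick the intrusion category and the hosts that carry the threat.
    threat = rng.choice(THREAT_CATEGORIES)
    infected = rng.sample(sorted(hosts), k=2)

    # Plant forensic footprints: a rogue process name and a malicious IPv4 address.
    suffix = "".join(rng.choice("abcdef0123456789") for _ in range(6))
    process = f"svc_{suffix}.exe"
    c2_ip = f"203.0.113.{rng.randrange(1, 255)}"  # TEST-NET-3 documentation range
    for name in infected:
        hosts[name].update(compromised=True, rogue_process=process, c2_ip=c2_ip)

    return {"seed": seed, "threat": threat, "hosts": hosts}


episode = generate_episode(seed=42)
```

Calling `generate_episode` twice with the same seed returns identical dictionaries, which is the property that lets a grader reconstruct the ground truth for any episode.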

	### State and Action Space
	Agents interact with CyberSOC through explicit, structured actions defined as Pydantic models. This connects the free-form reasoning of a Large Language Model to deterministic execution in the Python backend.

	Available agent actions:
	1. `query_host`: Map the active architecture and retrieve high-level endpoint configuration.
	2. `run_forensics`: Extract deep system artifacts (processes, memory dumps, network logs) for specific nodes.
	3. `kill_process`: Terminate a detected malicious process.
	4. `block_ioc`: Blacklist Indicators of Compromise (domains, IPs, hashes) network-wide.
	5. `isolate_segment`: Quarantine an entire subnet as a last-resort fail-safe.
	6. `submit_containment_plan`: Submit the final containment report and end the episode.
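
The sketch below shows how a model's JSON output might be turned into a validated action before it reaches the backend. It uses a standard-library dataclass as a stand-in for the Pydantic models the environment uses, so that it runs without extra dependencies. The `target` field, the `parse_action` helper, and the rule about which actions need a target are assumptions, not the environment's actual schema.

```python
import json
from dataclasses import dataclass
from typing import Optional

# The six action names listed above.
VALID_ACTIONS = {
    "query_host", "run_forensics", "kill_process",
    "block_ioc", "isolate_segment", "submit_containment_plan",
}

# Assumption: these actions operate on one named node, process, IOC, or subnet.
TARGETED_ACTIONS = {"run_forensics", "kill_process", "block_ioc", "isolate_segment"}


@dataclass(frozen=True)
class AgentAction:
    action: str
    target: Optional[str] = None  # hypothetical field name

    def __post_init__(self):
        # Reject anything outside the documented action set.
        if self.action not in VALID_ACTIONS:
            raise ValueError(f"unknown action: {self.action!r}")
        if self.action in TARGETED_ACTIONS and not self.target:
            raise ValueError(f"{self.action} requires a target")


def parse_action(raw: str) -> AgentAction:
    """Turn a JSON string emitted by the model into a validated action."""
    payload = json.loads(raw)  # malformed JSON raises a ValueError subclass
    return AgentAction(action=payload.get("action", ""), target=payload.get("target"))


action = parse_action('{"action": "kill_process", "target": "host-003"}')
```

Rejecting malformed or unknown actions at this boundary means the backend only ever executes one of the six documented operations.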

	### Dense Grading and Business Constraints
	The grader uses dense reward shaping. Where typical evaluations score only threat containment, CyberSOC also enforces business continuity constraints:
	- Rewards: Agents receive immediate micro-rewards (e.g., +0.25) for locating and correctly terminating malicious processes mid-investigation.
	- Penalties: Agents are penalized for increasing "Business Downtime." Rash decisions, such as quarantining clean subnets or wasting compute cycles on benign hosts, incur negative rewards (-0.02 to -0.20 per action).
	- Final Grading Engine: A backend grader compares the submitted containment plan against the ground truth of the active seeded threat and outputs a deterministic score from 0.0 to 1.0.
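
The sketch below illustrates how per-step shaping and a final plan score could fit together. Only the numbers +0.25, -0.02, and -0.20 come from this README. The mapping of penalties to specific actions and the use of Jaccard overlap for the final score are assumptions made for illustration; the actual grader's formula is not documented here.

```python
KILL_REWARD = 0.25       # micro-reward quoted above
SMALL_PENALTY = -0.02    # low end of the quoted penalty range
LARGE_PENALTY = -0.20    # high end of the quoted penalty range


def step_reward(action: str, target_is_malicious: bool) -> float:
    """Shaped reward for one action (assumed mapping, not the real grader)."""
    if action == "kill_process" and target_is_malicious:
        return KILL_REWARD
    if action == "isolate_segment" and not target_is_malicious:
        return LARGE_PENALTY  # assumption: quarantining a clean subnet costs most
    if not target_is_malicious:
        return SMALL_PENALTY  # assumption: any other step spent on a benign target
    return 0.0


def final_score(plan: set, truth: set) -> float:
    """Jaccard overlap between the submitted plan and the ground truth."""
    if not plan and not truth:
        return 1.0
    return len(plan & truth) / len(plan | truth)


# Hypothetical identifiers for hosts named in a plan and in the ground truth.
score = final_score({"host-001", "host-004"}, {"host-001", "host-007"})
```

Because both functions are pure and depend only on the action and the seeded ground truth, the same plan always receives the same score.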

	## Evaluation Benchmark

	The included baseline evaluates autonomous agents on three curated incident scenarios of increasing difficulty: Easy, Medium, and Hard.

	### Execution
	The `inference.py` script runs the benchmarking loop and is compatible with the OpenAI client API. It writes strict, parseable log lines to stdout (`[START]`, `[STEP]`, `[END]`), which programmatic grading requires.

	```bash
	python inference.py
	```
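
A downstream grader might split the captured stdout by tag, as in the sketch below. Only the three tags come from this README; the payload after each tag is treated as opaque text because its format is not specified here, and the sample lines are made up for the example.

```python
import re

# Matches a line that starts with one of the three documented tags.
LOG_LINE = re.compile(r"^\[(START|STEP|END)\]\s*(.*)$")


def parse_run(stdout: str) -> dict:
    """Group tagged log lines from one run; untagged lines are ignored."""
    run = {"start": None, "steps": [], "end": None}
    for line in stdout.splitlines():
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # not a tagged line, e.g. a library warning
        tag, payload = match.groups()
        if tag == "START":
            run["start"] = payload
        elif tag == "STEP":
            run["steps"].append(payload)
        else:
            run["end"] = payload
    return run


# Made-up sample output; the real payload format may differ.
sample = "[START] task=easy\n[STEP] 1 query_host\nsome warning\n[STEP] 2 kill_process\n[END] done"
run = parse_run(sample)
```

The input string could come from `subprocess.run([sys.executable, "inference.py"], capture_output=True, text=True).stdout`.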

	### Configuration
	Before running the inference baseline, export the following environment variables:
	- `API_KEY`: Your LLM access token.
	- `API_BASE_URL`: The target LLM API endpoint.
	- `MODEL_NAME`: The model identifier used for inference.
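
A script that depends on these variables can check them up front and fail with a clear message. The sketch below is one way to do that; the `load_config` helper is hypothetical and is not taken from `inference.py`.

```python
import os
from typing import Mapping

# The three variables listed above.
REQUIRED_VARS = ("API_KEY", "API_BASE_URL", "MODEL_NAME")


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Return the required settings, or raise if any is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


# Example with placeholder values instead of the real process environment.
config = load_config({
    "API_KEY": "placeholder-token",
    "API_BASE_URL": "https://example.invalid/v1",
    "MODEL_NAME": "placeholder-model",
})
```

Passing the mapping as an argument keeps the check testable without touching the real environment.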