---
title: CyberSOC Environment
emoji: 🛡️
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
---
# CyberSOC: Agentic Incident Response Environment
CyberSOC is a production-grade reinforcement learning environment built on the OpenEnv framework. It is designed to evaluate, benchmark, and train autonomous AI agents on enterprise Incident Response (IR).
## Architecture & Environment Dynamics
Unlike static puzzle environments, CyberSOC generates a dynamic network topology graph of up to 500 enterprise nodes. It simulates an entire Security Operations Center (SOC) pipeline and randomizes each episode programmatically, which prevents agents from overfitting to a fixed scenario.
### Network Topology Generation
CyberSOC uses a seed-based procedural generation engine. Each time an agent requests a new episode, the environment builds a distinct network infrastructure, randomizing:
- Subnet mapping and host allocations.
- Intrusion vectors and persistent active threats (spanning 12 categories including Supply Chain, APT campaigns, and Ransomware).
- Forensic footprints across the network, including anomalous memory artifacts, generated rogue process names, and connections to malicious IPv4 addresses and domains.
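To illustrate why a seed makes episodes both varied and reproducible, here is a minimal Python sketch of seed-based generation. It is not CyberSOC's engine: the `Host`/`Episode` classes, `generate_episode`, the `10.0.x.0/24` addressing, the roughly 5% compromise rate, and the process-naming scheme are all hypothetical. Only the 500-node cap and the three named threat categories come from this README.

```python
import random
from dataclasses import dataclass, field
from typing import List

# Hypothetical subset: the README lists 12 threat categories but names only these three.
THREAT_CATEGORIES = ["supply_chain", "apt_campaign", "ransomware"]


@dataclass
class Host:
    hostname: str
    subnet: str
    compromised: bool = False
    processes: List[str] = field(default_factory=list)


@dataclass
class Episode:
    seed: int
    threat_category: str
    hosts: List[Host]
    malicious_ips: List[str]


def generate_episode(seed: int, num_hosts: int = 50, num_subnets: int = 5) -> Episode:
    """Build a network reproducibly: the same seed always yields the same episode."""
    if not 1 <= num_hosts <= 500:  # the README caps topologies at 500 nodes
        raise ValueError("num_hosts must be between 1 and 500")
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    subnets = [f"10.0.{i}.0/24" for i in range(num_subnets)]
    hosts = [
        Host(f"host-{i:03d}", rng.choice(subnets), processes=["svchost.exe"])
        for i in range(num_hosts)
    ]
    # Compromise roughly 5% of hosts (at least one) and plant a rogue process on each.
    for host in rng.sample(hosts, k=max(1, num_hosts // 20)):
        host.compromised = True
        host.processes.append(f"svc{rng.randrange(16**6):06x}.exe")
    # Malicious IPv4 addresses, drawn from the same seeded RNG.
    malicious_ips = [
        ".".join(str(rng.randint(1, 254)) for _ in range(4)) for _ in range(3)
    ]
    return Episode(seed, rng.choice(THREAT_CATEGORIES), hosts, malicious_ips)
```

Calling `generate_episode(42)` twice returns equal `Episode` objects (dataclasses compare field by field), while a different seed yields a different layout. Using a private `random.Random(seed)` rather than the module-level functions keeps generation independent of any other randomness in the process.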
### State and Action Space
Agents interact with CyberSOC through explicit, structured actions defined as Pydantic models. This bridges the semantic reasoning of Large Language Models (LLMs) with deterministic execution in the Python backend.
Available agent actions:
1. `query_host`: Map the active network architecture and retrieve a host's high-level endpoint configuration.
2. `run_forensics`: Extract deep system artifacts (processes, memory dumps, network logs) for specific nodes.
3. `kill_process`: Actively terminate detected malicious execution.
4. `block_ioc`: Blacklist Indicators of Compromise (Domains, IPs, Hashes) network-wide.
5. `isolate_segment`: Quarantine an entire subnet as a last-resort fail-safe.
6. `submit_containment_plan`: Submit the final executive containment report and end the episode.
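The sketch below shows the idea of validating an LLM's structured output before it reaches the backend. CyberSOC uses Pydantic models; to stay dependency-free this approximation uses a standard-library `dataclass` with checks in `__post_init__` instead. The six action names come from the list above, but the `SOCAction` class, its `target` and `report` fields, and `parse_action` are hypothetical.

```python
from dataclasses import dataclass
from typing import Literal, get_args

# The six action names come from the README; the `target` and `report` fields are hypothetical.
ActionType = Literal[
    "query_host",
    "run_forensics",
    "kill_process",
    "block_ioc",
    "isolate_segment",
    "submit_containment_plan",
]


@dataclass(frozen=True)
class SOCAction:
    action_type: ActionType
    target: str = ""  # host, process, IOC, or subnet, depending on action_type
    report: str = ""  # only used by submit_containment_plan

    def __post_init__(self) -> None:
        # Reject malformed actions up front, as Pydantic would on model construction.
        if self.action_type not in get_args(ActionType):
            raise ValueError(f"unknown action_type: {self.action_type!r}")
        if self.action_type == "submit_containment_plan":
            if not self.report:
                raise ValueError("submit_containment_plan requires a report")
        elif not self.target:
            raise ValueError(f"{self.action_type} requires a target")


def parse_action(payload: dict) -> SOCAction:
    """Convert an LLM's JSON-decoded output into a validated action."""
    return SOCAction(**payload)
```

For example, `parse_action({"action_type": "query_host", "target": "host-001"})` succeeds, while an unknown action name or a `kill_process` without a target raises `ValueError`. Unexpected keys raise `TypeError` from the generated `__init__`, so malformed model output fails loudly instead of being silently ignored.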
### Dense Grading and Business Constraints
The grading system uses dense reward shaping. Standard evaluations score only threat containment; CyberSOC additionally enforces strict business-continuity constraints:
- **Positive Reinforcement**: Agents receive immediate micro-rewards (e.g., +0.25) for locating and correctly terminating malicious processes mid-investigation.
- **Negative Penalties**: Agents are strictly penalized for increasing "Business Downtime." Rash decisions, such as quarantining clean subnets or wasting compute cycles on benign hosts, incur negative rewards (-0.02 to -0.20 per action).
- **Final Grading Engine**: A deterministic backend grader compares the submitted containment plan against the ground truth of the seeded threat and outputs a score from 0.0 to 1.0.
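A minimal sketch of how dense step rewards and a bounded final score could fit together, assuming Python. The +0.25 reward and the -0.02 to -0.20 penalty range come from the list above; the event names, which event receives which penalty, and the use of an F1 overlap between submitted and ground-truth IOCs as the final score are hypothetical stand-ins, since this README does not specify the grader's formula.

```python
from typing import AbstractSet

# Hypothetical reward table: +0.25 and the -0.02..-0.20 range come from the README;
# the mapping of events to specific penalties is an assumption.
STEP_REWARDS = {
    "killed_malicious_process": 0.25,
    "queried_benign_host": -0.02,
    "forensics_on_benign_host": -0.05,
    "isolated_clean_subnet": -0.20,
}


def step_reward(event: str) -> float:
    """Dense per-action reward; events not in the table are neutral."""
    return STEP_REWARDS.get(event, 0.0)


def final_score(submitted_iocs: AbstractSet[str], true_iocs: AbstractSet[str]) -> float:
    """Deterministic score in [0.0, 1.0]: F1 overlap of submitted vs. ground-truth IOCs."""
    hits = len(submitted_iocs & true_iocs)
    if hits == 0:
        return 0.0  # also covers empty submissions, avoiding division by zero
    precision = hits / len(submitted_iocs)  # penalizes blocking clean indicators
    recall = hits / len(true_iocs)  # penalizes missed indicators
    return 2 * precision * recall / (precision + recall)
```

F1 is one way to reward completeness and precision together: submitting every indicator in the network maximizes recall but drags precision down, which mirrors the README's emphasis on not disrupting clean infrastructure.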
## Evaluation Benchmark
The included baseline evaluates autonomous agents on three curated incident scenarios of increasing difficulty: Easy, Medium, and Hard.
### Execution
The `inference.py` script runs an out-of-the-box benchmarking loop compatible with the standard OpenAI client API. It writes strictly formatted, parseable stdout log lines (`[START]`, `[STEP]`, `[END]`), which programmatic grading requires.
```bash
python inference.py
```
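The sketch below shows how a grader might separate the tagged log lines from other stdout output. Only the three tags come from this README; the `[TAG] key=value key=value` layout, `format_log`, and `parse_log` are hypothetical, so check the actual output of `inference.py` before relying on this shape.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed line shape: "[TAG] key=value key=value". Only the tags come from the README.
LINE_RE = re.compile(r"^\[(START|STEP|END)\]((?:\s+\w+=\S+)*)\s*$")


def format_log(tag: str, **fields: object) -> str:
    """Render one log line; field values must not contain whitespace."""
    parts = " ".join(f"{key}={value}" for key, value in fields.items())
    return f"[{tag}] {parts}".rstrip()


def parse_log(line: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (tag, fields) for a well-formed line, or None for any other stdout noise."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    tag, rest = match.groups()
    fields = dict(pair.split("=", 1) for pair in rest.split())
    return tag, fields
```

For instance, `format_log("STEP", step=1, action="query_host", reward=-0.02)` yields `[STEP] step=1 action=query_host reward=-0.02`, and `parse_log` turns it back into `("STEP", {...})` with string values. Anchoring the regex at both ends is what lets debug prints coexist with the graded lines.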
### Configuration
Before running the inference baseline, export the following environment variables:
- `API_KEY`: Your LLM access token.
- `API_BASE_URL`: The target LLM API endpoint.
- `MODEL_NAME`: The model identifier used for inference.
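A small Python sketch of reading these variables and failing fast when any is missing. The variable names come from the list above; `InferenceConfig` and `load_config` are hypothetical helpers, and `inference.py` may read its configuration differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED_VARS = ("API_KEY", "API_BASE_URL", "MODEL_NAME")


@dataclass(frozen=True)
class InferenceConfig:
    api_key: str
    api_base_url: str
    model_name: str


def load_config(env: Optional[Mapping[str, str]] = None) -> InferenceConfig:
    """Read the three variables, naming every missing or empty one in a single error."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return InferenceConfig(env["API_KEY"], env["API_BASE_URL"], env["MODEL_NAME"])
```

With the official `openai` Python SDK, the resulting values would typically be passed as `OpenAI(base_url=cfg.api_base_url, api_key=cfg.api_key)`, with `cfg.model_name` supplied as the `model` argument of each request. Accepting an optional mapping keeps the function testable without touching the real process environment.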