Space: Ajay00747/CyberSOC

Ajayyy00 committed on Apr 3

Commit d1a6504 (1 parent: e170034)

Update README with in-depth environment details

Files changed (1)

README.md +33 -29

@@ -1,43 +1,47 @@
----
-title: CyberSOC Environment
-emoji: 🛡️
-colorFrom: blue
-colorTo: indigo
-sdk: docker
-app_port: 7860
----
-# CyberSOC: Agentic Incident Response Environment 🛡️
-CyberSOC is an OpenEnv reinforcement learning environment designed to evaluate AI agents on their ability to perform Incident Response across an enterprise network.
-## 🌟 Hackathon Benchmark
-This environment evaluates an agent against three core scenarios: **Easy**, **Medium**, and **Hard**.
-Agents must utilize structured actions to navigate a network and identify threats:
-- `query_host`: Map the active topology.
-- `run_forensics`: Scrape memory and process lists.
-- `kill_process` & `block_ioc`: Perform active containment.
-- `isolate_segment`: Implement extreme fail-safes.
-- `submit_containment_plan`: Formulate a final executive overview.
-### Dense Grading
-Unlike simple pass/fail benchmarks, CyberSOC evaluates agents on both security effectiveness and business continuity. Agents are rewarded for correctly identifying and terminating malicious processes, but heavily penalized for isolating sterile network segments ("Business Downtime").
-## 🚀 Evaluating the Agent
-This repository is fully packaged as a Docker container.
-### Running the Baseline
-The `inference.py` script provides the complete out-of-the-box benchmarking loop requested by the hackathon. You can run it directly:
 ```bash
 python inference.py
 ```
-### Agent Configuration
-To execute the agent successfully, provide your desired LLM configurations as standard environment variables:
-- `API_KEY` - Your LLM token
-- `API_BASE_URL` - The LLM endpoint (e.g. Hugging Face Router or OpenAI)
-- `MODEL_NAME` - The model identifier

+# CyberSOC: Agentic Incident Response Environment
+CyberSOC is a reinforcement learning environment built on the OpenEnv framework for evaluating, benchmarking, and training autonomous AI agents on enterprise Incident Response (IR).
+## Architecture & Environment Dynamics
+Unlike static puzzle-style benchmarks, this environment models a dynamic network topology graph of up to 500 enterprise nodes. It simulates an end-to-end Security Operations Center (SOC) pipeline and randomizes every episode programmatically, which makes it harder for agents to overfit to a fixed scenario.
+### Network Topology Generation
+CyberSOC uses a seed-based procedural generation engine. Each time an agent requests a new episode, the environment generates a distinct network infrastructure, randomizing:
+- Subnet mapping and host allocations.
+- Intrusion vectors and persistent active threats, spanning 12 categories including Supply Chain attacks, APT campaigns, and Ransomware.
+- Forensic footprints across the network, including anomalous memory artifacts, generated rogue process names, and malicious IPv4 and domain connections.
+### State and Action Space
+Agents interact with CyberSOC through explicit, structured actions defined as Pydantic models. This connects the free-form reasoning of large language models to deterministic execution in the Python backend.
+Available agent actions:
+1. `query_host`: Map the active architecture and retrieve high-level endpoint configuration.
+2. `run_forensics`: Extract deep system artifacts (processes, memory dumps, network logs) for specific nodes.
+3. `kill_process`: Actively terminate detected malicious execution.
+4. `block_ioc`: Block Indicators of Compromise (IOCs), such as domains, IPs, and file hashes, network-wide.
+5. `isolate_segment`: Apply a last-resort fail-safe by quarantining an entire subnet.
+6. `submit_containment_plan`: Submit the final executive report and end the episode.
+### Dense Grading and Business Constraints
+The grader applies dense reward shaping. Standard evaluations score threat containment alone; CyberSOC additionally enforces strict business continuity constraints:
+- **Positive Reinforcement**: Agents receive immediate micro-rewards (e.g., +0.25) for locating and correctly terminating malicious processes mid-investigation.
+- **Negative Penalties**: Agents are penalized for increasing "Business Downtime." Rash decisions, such as quarantining clean subnets or spending compute on benign hosts, incur negative rewards (-0.02 to -0.20 per action).
+- **Final Grading Engine**: A backend grader compares the submitted containment plan against the ground truth of the seeded threat and outputs a deterministic score between 0.0 and 1.0.
+## Evaluation Benchmark
+The included baseline evaluates autonomous agents on three curated incident scenarios of increasing difficulty: Easy, Medium, and Hard.
+### Execution
+The `inference.py` script runs an out-of-the-box benchmarking loop that works with any endpoint compatible with the standard OpenAI client. It writes stdout logs in a strict, parseable format (lines tagged `[START]`, `[STEP]`, and `[END]`) that programmatic grading requires.
 ```bash
 python inference.py
 ```
+### Configuration
+Before running the inference baseline, export the following environment variables:
+- `API_KEY`: Your LLM access token.
+- `API_BASE_URL`: The target LLM API endpoint.
+- `MODEL_NAME`: The model identifier used for inference.
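The seed-based procedural generation described in the updated README can be illustrated with a minimal sketch. Everything in it is hypothetical: the `generate_network` function, the `Host` fields, the `10.N.0.0/16` subnet layout, and the rogue process naming are assumptions, and only three of the README's twelve threat categories are named there, so only those three appear. The point it shows is that drawing every random choice from a private `random.Random(seed)` makes an episode reproducible: the same seed always yields the same network and threat.

```python
import ipaddress
import random
from dataclasses import dataclass, field

# Hypothetical subset: the README names these three of its 12 threat categories.
THREAT_CATEGORIES = ["Supply Chain", "APT campaign", "Ransomware"]


@dataclass
class Host:
    hostname: str
    ip: str
    subnet: str
    compromised: bool = False
    processes: list[str] = field(default_factory=list)


def generate_network(seed: int, num_hosts: int = 50, num_subnets: int = 4) -> tuple[list[Host], str]:
    """Hypothetical generator: the same seed always produces the same topology and threat."""
    rng = random.Random(seed)  # private RNG, so the global random state is never touched
    subnets = [ipaddress.ip_network(f"10.{i}.0.0/16") for i in range(num_subnets)]
    used = [0] * num_subnets  # next free host offset in each subnet
    hosts: list[Host] = []
    for i in range(num_hosts):
        s = rng.randrange(num_subnets)  # randomized subnet allocation
        used[s] += 1
        hosts.append(
            Host(
                hostname=f"host-{i:03d}",
                ip=str(subnets[s].network_address + used[s]),
                subnet=str(subnets[s]),
                processes=["svchost.exe", "explorer.exe"],
            )
        )
    threat = rng.choice(THREAT_CATEGORIES)
    # Compromise roughly 10% of hosts and plant a generated rogue process name on each.
    for host in rng.sample(hosts, k=max(1, num_hosts // 10)):
        host.compromised = True
        host.processes.append(f"upd{rng.randrange(1000, 10000)}.exe")
    return hosts, threat
```

Calling `generate_network(42)` twice returns equal results, while a different seed generally produces a different layout; this is what lets a grader replay an episode exactly.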
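The README says agent actions are structured Pydantic models that turn LLM output into deterministic backend calls. The sketch below shows the general idea with a stdlib `dataclass` and `Enum` instead of Pydantic so that it has no dependencies. The six action names come from the README; the JSON shape, the `Action.from_llm_output` helper, and the per-action argument names (`hostname`, `process_name`, `ioc`, `subnet`, `summary`) are hypothetical.

```python
import json
from dataclasses import dataclass
from enum import Enum


class ActionType(str, Enum):
    QUERY_HOST = "query_host"
    RUN_FORENSICS = "run_forensics"
    KILL_PROCESS = "kill_process"
    BLOCK_IOC = "block_ioc"
    ISOLATE_SEGMENT = "isolate_segment"
    SUBMIT_CONTAINMENT_PLAN = "submit_containment_plan"


# Hypothetical: the single argument each action type requires.
REQUIRED_ARGS = {
    ActionType.QUERY_HOST: "hostname",
    ActionType.RUN_FORENSICS: "hostname",
    ActionType.KILL_PROCESS: "process_name",
    ActionType.BLOCK_IOC: "ioc",
    ActionType.ISOLATE_SEGMENT: "subnet",
    ActionType.SUBMIT_CONTAINMENT_PLAN: "summary",
}


@dataclass(frozen=True)
class Action:
    action_type: ActionType
    argument: str

    @classmethod
    def from_llm_output(cls, raw: str) -> "Action":
        """Parse a JSON object emitted by the LLM; raise ValueError on anything malformed."""
        data = json.loads(raw)  # invalid JSON raises json.JSONDecodeError, a ValueError subclass
        if not isinstance(data, dict):
            raise ValueError("action must be a JSON object")
        try:
            action_type = ActionType(data["action_type"])
        except (KeyError, ValueError) as exc:
            raise ValueError(f"unknown or missing action_type: {data.get('action_type')!r}") from exc
        arg_name = REQUIRED_ARGS[action_type]
        value = data.get(arg_name)
        if not isinstance(value, str) or not value:
            raise ValueError(f"{action_type.value} requires a non-empty '{arg_name}' string")
        return cls(action_type=action_type, argument=value)
```

Rejecting malformed output before it reaches the simulator keeps the backend deterministic. With Pydantic the same checks would live in field types and validators.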
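The dense reward shaping under *Dense Grading and Business Constraints* can be sketched as a per-action reward function. Only the +0.25 micro-reward and the -0.02 to -0.20 penalty range come from the README. The exact mapping below, including the -0.10 penalty for killing a benign process and the `step_reward` and `episode_reward` names, is an assumption made for illustration.

```python
import math
from typing import Iterable, Tuple

# Hypothetical constants. Only +0.25 and the -0.02..-0.20 range appear in the README.
KILL_MALICIOUS_REWARD = 0.25
BENIGN_FORENSICS_PENALTY = -0.02
BENIGN_KILL_PENALTY = -0.10
CLEAN_ISOLATION_PENALTY = -0.20


def step_reward(action_type: str, target_compromised: bool) -> float:
    """Immediate reward for one action, given whether its target was actually malicious."""
    if action_type == "kill_process":
        return KILL_MALICIOUS_REWARD if target_compromised else BENIGN_KILL_PENALTY
    if action_type == "run_forensics":
        # Investigating a compromised host is free; a clean host wastes compute.
        return 0.0 if target_compromised else BENIGN_FORENSICS_PENALTY
    if action_type == "isolate_segment":
        # Quarantining a clean subnet causes business downtime.
        return 0.0 if target_compromised else CLEAN_ISOLATION_PENALTY
    return 0.0  # all other actions are neutral in this sketch


def episode_reward(steps: Iterable[Tuple[str, bool]]) -> float:
    """Sum of per-step rewards; math.fsum avoids accumulated floating-point error."""
    return math.fsum(step_reward(action, compromised) for action, compromised in steps)
```

Because the penalty for isolating a clean segment is larger than the reward for a correct kill, an agent cannot profit from quarantining subnets indiscriminately. That is the business-continuity pressure the README describes.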
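The README states only that the final grader compares the submitted containment plan with the seeded ground truth and returns a deterministic score from 0.0 to 1.0. It does not describe the scoring formula. The sketch below substitutes a common choice, the F1 score between the set of submitted IOCs and the true IOCs. This is a swapped-in technique, and `grade_plan` is a hypothetical name, not CyberSOC's grader.

```python
from typing import Iterable


def _normalize(items: Iterable[str]) -> set:
    # Case- and whitespace-insensitive matching, so " Evil.COM " equals "evil.com".
    return {item.strip().lower() for item in items}


def grade_plan(submitted: Iterable[str], ground_truth: Iterable[str]) -> float:
    """Hypothetical grader: F1 overlap between submitted and true IOCs, always in [0.0, 1.0]."""
    sub, truth = _normalize(submitted), _normalize(ground_truth)
    if not sub and not truth:
        return 1.0  # nothing to find and nothing claimed
    hits = len(sub & truth)
    if hits == 0:
        return 0.0
    precision = hits / len(sub)   # penalizes blocking benign indicators
    recall = hits / len(truth)    # penalizes missing real indicators
    return 2 * precision * recall / (precision + recall)
```

Set operations do not depend on iteration order, so the same plan and ground truth always produce the same score, matching the README's determinism requirement.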
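The three environment variables listed under *Configuration* can be validated up front so a missing value fails immediately instead of partway through a benchmark. The following is a minimal sketch; `InferenceConfig` and `load_config` are hypothetical helpers, not functions from `inference.py`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED_VARS = ("API_KEY", "API_BASE_URL", "MODEL_NAME")


@dataclass(frozen=True)
class InferenceConfig:
    api_key: str
    api_base_url: str
    model_name: str


def load_config(env: Optional[Mapping[str, str]] = None) -> InferenceConfig:
    """Read the README's three variables, reporting every missing or empty one at once."""
    env = os.environ if env is None else env  # an explicit mapping makes this testable
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {', '.join(missing)}")
    return InferenceConfig(
        api_key=env["API_KEY"],
        api_base_url=env["API_BASE_URL"],
        model_name=env["MODEL_NAME"],
    )
```

With the `openai` package installed, these values map onto an OpenAI-compatible client as `OpenAI(base_url=cfg.api_base_url, api_key=cfg.api_key)`, with `cfg.model_name` passed as the `model` argument of each request. The README does not show whether `inference.py` builds its client exactly this way.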