Ajayyy00 committed on
Commit d1a6504 · 1 parent: e170034

Update README with in-depth environment details

Files changed (1)
  1. README.md +33 -29
README.md CHANGED
@@ -1,43 +1,47 @@
- ---
- title: CyberSOC Environment
- emoji: 🛡️
- colorFrom: blue
- colorTo: indigo
- sdk: docker
- app_port: 7860
- ---
 
- # CyberSOC: Agentic Incident Response Environment 🛡️
 
- CyberSOC is an OpenEnv reinforcement learning environment designed to evaluate AI agents on their ability to perform Incident Response across an enterprise network.
 
- ## 🌟 Hackathon Benchmark
 
- This environment evaluates an agent against three core scenarios: **Easy**, **Medium**, and **Hard**.
 
- Agents must utilize structured actions to navigate a network and identify threats:
- - `query_host`: Map the active topology.
- - `run_forensics`: Scrape memory and process lists.
- - `kill_process` & `block_ioc`: Perform active containment.
- - `isolate_segment`: Implement extreme fail-safes.
- - `submit_containment_plan`: Formulate a final executive overview.
 
- ### Dense Grading
- Unlike simple pass/fail benchmarks, CyberSOC evaluates agents on both security effectiveness and business continuity. Agents are rewarded for correctly identifying and terminating malicious processes, but heavily penalized for isolating sterile network segments ("Business Downtime").
 
- ## 🚀 Evaluating the Agent
 
- This repository is fully packaged as a Docker container.
 
- ### Running the Baseline
- The `inference.py` script provides the complete out-of-the-box benchmarking loop requested by the hackathon. You can run it directly:
 
  ```bash
  python inference.py
  ```
 
- ### Agent Configuration
- To execute the agent successfully, provide your desired LLM configurations as standard environment variables:
- - `API_KEY` - Your LLM token
- - `API_BASE_URL` - The LLM endpoint (e.g. Hugging Face Router or OpenAI)
- - `MODEL_NAME` - The model identifier
+ # CyberSOC: Agentic Incident Response Environment
 
+ CyberSOC is a reinforcement learning environment built on the OpenEnv framework. It is designed to evaluate, benchmark, and train autonomous AI agents on enterprise Incident Response (IR).
 
+ ## Architecture & Environment Dynamics
 
+ Rather than posing a static puzzle, the environment generates a dynamic network topology graph of up to 500 enterprise nodes and models an end-to-end Security Operations Center (SOC) pipeline. Every episode is randomized programmatically, which discourages agents from overfitting to a fixed scenario.
 
+ ### Network Topology Generation
+ CyberSOC uses a seed-based procedural generation engine. Each time an agent requests a new episode, the environment generates a distinct network infrastructure, randomizing:
+ - Subnet mapping and host allocation.
+ - Intrusion vectors and persistent active threats, spanning 12 categories including Supply Chain, APT campaigns, and Ransomware.
+ - Forensic footprints across the network, including anomalous memory artifacts, generated rogue process names, and malicious IPv4 and domain connections.
 
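The randomization described above can be pictured with a small seeded generator. This is a minimal sketch, not CyberSOC's actual engine: `generate_episode`, `Host`, `Episode`, the `10.0.x.0/24` addressing, the process names, and the 10% compromise rate are all hypothetical, and only the three threat categories the README names are used. The property it illustrates is that a private `random.Random(seed)` makes every episode reproducible from its seed.

```python
import random
from dataclasses import dataclass
from typing import List

# Only three of the twelve categories are named in the README; the rest are
# unknown, so this hypothetical sketch draws from these three.
THREAT_CATEGORIES = ["Supply Chain", "APT", "Ransomware"]


@dataclass
class Host:
    hostname: str
    subnet: str
    processes: List[str]
    compromised: bool = False


@dataclass
class Episode:
    seed: int
    threat_category: str
    hosts: List[Host]
    malicious_ips: List[str]


def generate_episode(seed: int, num_hosts: int = 20, num_subnets: int = 4) -> Episode:
    """Hypothetical generator: the same seed always yields the same network."""
    rng = random.Random(seed)  # private RNG, unaffected by global random state
    subnets = [f"10.0.{i}.0/24" for i in range(num_subnets)]

    # Randomize host placement across subnets (each host gets its own list).
    hosts = [
        Host(f"host-{i:03d}", rng.choice(subnets), ["svchost.exe", "explorer.exe"])
        for i in range(num_hosts)
    ]

    # Randomize the intrusion vector and which hosts it has reached.
    category = rng.choice(THREAT_CATEGORIES)
    for host in rng.sample(hosts, k=max(1, num_hosts // 10)):
        host.compromised = True
        # Plant a generated rogue process name such as "svc_3fa2.exe".
        host.processes.append(f"svc_{rng.getrandbits(16):04x}.exe")

    # Randomize the malicious IPv4 addresses the threat communicates with.
    malicious_ips = [
        ".".join(str(rng.randint(1, 254)) for _ in range(4)) for _ in range(3)
    ]
    return Episode(seed, category, hosts, malicious_ips)
```

Calling `generate_episode(42)` twice returns equal `Episode` objects, while changing the seed reshuffles host placement, the threat category, and the planted artifacts.
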
+ ### State and Action Space
+ Agents interact with CyberSOC through explicit, structured actions defined as Pydantic models. This bridges the free-form reasoning of large language models and the deterministic execution of the Python backend.
 
+ Available agent actions:
+ 1. `query_host`: Map the active network architecture and retrieve high-level endpoint configuration.
+ 2. `run_forensics`: Extract deep system artifacts (processes, memory dumps, network logs) from a specific node.
+ 3. `kill_process`: Terminate a detected malicious process.
+ 4. `block_ioc`: Block Indicators of Compromise (domains, IPs, hashes) network-wide.
+ 5. `isolate_segment`: Quarantine an entire subnet as an extreme fail-safe.
+ 6. `submit_containment_plan`: Submit the final executive report and end the episode.
 
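To show what validating one of the six actions listed above might look like before it reaches the backend, here is a minimal, dependency-free sketch. The README states that CyberSOC uses Pydantic models; this example swaps in a stdlib `dataclass` plus explicit checks so it runs without extra packages. The `SOCAction` class, its `target` and `params` fields, the JSON shape, and the rule that every action except `submit_containment_plan` needs a target are all assumptions.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Literal, Optional, get_args

# The six action names come from the README; everything else is hypothetical.
ActionType = Literal[
    "query_host",
    "run_forensics",
    "kill_process",
    "block_ioc",
    "isolate_segment",
    "submit_containment_plan",
]


@dataclass
class SOCAction:
    action_type: ActionType
    target: Optional[str] = None  # host, subnet, or IOC, depending on the action
    params: Dict[str, Any] = field(default_factory=dict)


def parse_action(raw: str) -> SOCAction:
    """Turn an LLM's JSON reply into a validated action, or raise ValueError."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("action must be a JSON object")
    action_type = data.get("action_type")
    if action_type not in get_args(ActionType):
        raise ValueError(f"unknown action_type: {action_type!r}")
    target = data.get("target")
    # Assumed rule: only the final report may omit a target.
    if action_type != "submit_containment_plan" and not target:
        raise ValueError(f"{action_type} requires a target")
    params = data.get("params", {})
    if not isinstance(params, dict):
        raise ValueError("params must be a JSON object")
    return SOCAction(action_type, target, params)
```

In this sketch, a hallucinated action name or malformed reply surfaces as a `ValueError` that the agent loop can report back to the model, instead of reaching the backend.
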
+ ### Dense Grading and Business Constraints
+ The grader uses dense reward shaping. Standard evaluations score only threat containment; CyberSOC additionally enforces strict business continuity constraints:
+ - **Positive Reinforcement**: Agents receive immediate micro-rewards (e.g., +0.25) for locating and correctly terminating malicious processes mid-investigation.
+ - **Penalties**: Agents are penalized for causing "Business Downtime." Rash decisions, such as quarantining clean subnets or wasting compute cycles on benign hosts, incur negative rewards of -0.02 to -0.20 per action.
+ - **Final Grading Engine**: A backend grader compares the submitted containment plan against the ground truth of the seeded threat and outputs a deterministic score from 0.0 to 1.0.
 
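The reward scheme above can be sketched as two functions: a per-step shaping term and a final episode score. The +0.25 reward and the -0.02 and -0.20 penalty magnitudes come from the README, but assigning -0.02 to forensics on a benign host and -0.20 to isolating a clean subnet is an assumption, as are `StepOutcome`, `step_reward`, and `final_score`. The README does not say how the grader compares a plan to the ground truth, so the F1 overlap of submitted and true IOCs used here is only a stand-in that shares the stated properties: deterministic and bounded to 0.0 to 1.0.

```python
from dataclasses import dataclass
from typing import Set

# Magnitudes from the README; which action earns which penalty is assumed.
KILL_MALICIOUS_REWARD = 0.25
BENIGN_FORENSICS_PENALTY = -0.02  # effort spent investigating a clean host
CLEAN_ISOLATION_PENALTY = -0.20   # business downtime from quarantining a clean subnet


@dataclass
class StepOutcome:
    action_type: str
    target_is_malicious: bool  # ground truth, known only to the environment


def step_reward(outcome: StepOutcome) -> float:
    """Hypothetical dense shaping term returned after every action."""
    if outcome.action_type == "kill_process" and outcome.target_is_malicious:
        return KILL_MALICIOUS_REWARD
    if outcome.action_type == "isolate_segment" and not outcome.target_is_malicious:
        return CLEAN_ISOLATION_PENALTY
    if outcome.action_type == "run_forensics" and not outcome.target_is_malicious:
        return BENIGN_FORENSICS_PENALTY
    return 0.0  # all other cases are neutral in this sketch


def final_score(submitted: Set[str], ground_truth: Set[str]) -> float:
    """Stand-in grader: F1 overlap of submitted vs. true IOCs, in [0.0, 1.0]."""
    hits = len(submitted & ground_truth)
    if hits == 0:
        return 0.0
    precision = hits / len(submitted)
    recall = hits / len(ground_truth)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, an agent that kills one malicious process and then quarantines one clean subnet nets only about 0.05 in shaping reward, so the penalty almost cancels the progress.
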
+ ## Evaluation Benchmark
 
+ The included baseline evaluates autonomous agents against three curated incident scenarios: Easy, Medium, and Hard.
+
+ ### Execution
+ The `inference.py` script runs an out-of-the-box benchmarking loop that is compatible with the standard OpenAI client API. Its stdout follows a strict, parseable format of `[START]`, `[STEP]`, and `[END]` lines, which programmatic grading requires.
 
  ```bash
  python inference.py
  ```
 
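The README names the `[START]`, `[STEP]`, and `[END]` tags but not the fields that follow them. The sketch below assumes whitespace-separated `key=value` pairs purely for illustration; `format_log`, `parse_log`, and every field name are hypothetical, so the grader's actual specification should take precedence. It shows why a strict format matters: a grader can pick out tagged lines with one regular expression and ignore any other stdout.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed layout: a tag followed by zero or more key=value pairs.
# Values must not contain whitespace in this sketch.
LINE_RE = re.compile(r"^\[(START|STEP|END)\]((?:\s+\w+=\S+)*)\s*$")


def format_log(tag: str, **fields: object) -> str:
    """Render one tagged line, e.g. format_log("END", score=0.8)."""
    pairs = " ".join(f"{key}={value}" for key, value in fields.items())
    return f"[{tag}] {pairs}".rstrip()


def parse_log(line: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (tag, fields) for a tagged line, or None for any other output."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    tag, rest = match.groups()
    fields = dict(pair.split("=", 1) for pair in rest.split())
    return tag, fields
```

For example, `format_log("STEP", step=1, action="kill_process", reward=0.25)` produces `[STEP] step=1 action=kill_process reward=0.25`, which `parse_log` turns back into a tag and a dictionary of string fields.
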
+ ### Configuration
+ Before running the inference baseline, export the following environment variables:
+ - `API_KEY`: Your LLM access token.
+ - `API_BASE_URL`: The target LLM API endpoint.
+ - `MODEL_NAME`: The model identifier used for inference.
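
A fail-fast loader makes a missing variable obvious before any episode starts. This is a hypothetical sketch, not code from `inference.py`: `InferenceConfig` and `load_config` are illustrative names, and only the three variable names come from the README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Variable names from the README; everything else is hypothetical.
REQUIRED_VARS = ("API_KEY", "API_BASE_URL", "MODEL_NAME")


@dataclass(frozen=True)
class InferenceConfig:
    api_key: str
    api_base_url: str
    model_name: str


def load_config(env: Optional[Mapping[str, str]] = None) -> InferenceConfig:
    """Read the three variables, listing every missing or empty one at once."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return InferenceConfig(
        api_key=env["API_KEY"],
        api_base_url=env["API_BASE_URL"],
        model_name=env["MODEL_NAME"],
    )
```

The resulting values line up with the constructor of the official `openai` Python client, `OpenAI(api_key=cfg.api_key, base_url=cfg.api_base_url)`, with `cfg.model_name` passed as the `model` argument of each request; whether `inference.py` builds its client exactly this way is not stated in the README.
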