---
title: OpenSOC SOC Triage Env
emoji: 🛡️
colorFrom: indigo
colorTo: red
sdk: docker
app_port: 7860
pinned: false
license: bsd-3-clause
tags:
- openenv
- cybersecurity
- rlvr
- self-play
---
# OpenSOC: Self-Play SOC Triage Environment
> An **OpenEnv** environment for training cybersecurity defender LLMs against an attacker LLM that auto-generates novel incidents. Built for the OpenEnv Hackathon, April 2026.
Humans cannot watch every alert in a Security Operations Center 24/7, and that gap only widens as stronger generative models start writing exploits and phishing at industrial scale. **OpenSOC** is an environment where a defender LLM learns to triage attacks generated by another LLM in a self-play loop. The trick is **RLVR** (reinforcement learning with verifiable rewards): triage ground truth is computed by a deterministic schema-side verifier from the *structured* incident parameters, never from any text the attacker writes, so neither side can hack the reward.
## Try it
| Link | What it is |
| --- | --- |
| **HF Space**: [`shivam2k3-opensoc-env.hf.space`](https://huggingface.co/spaces/shivam2k3/opensoc-env) | Deployed env (Running). The OpenEnv judge can hit `/reset`, `/step`, `/state`, `/grade`. |
| **Live `/demo`**: [`shivam2k3-opensoc-env.hf.space/demo`](https://shivam2k3-opensoc-env.hf.space/demo) | Gradio "before vs after" UI. Click **Next incident** to compare baseline vs trained. |
| **Trained model**: [`shivam2k3/opensoc-defender-grpo`](https://huggingface.co/shivam2k3/opensoc-defender-grpo) | GRPO-trained Qwen2.5-3B-Instruct LoRA defender adapter. |
| **Training notebook**: [`train_grpo.ipynb`](train_grpo.ipynb) | End-to-end SFT warm-start + GRPO curriculum using Unsloth + TRL. |
| **Mini-blog**: [`docs/blog.md`](docs/blog.md) | ~600-word write-up of the project. |
## Table of contents
1. [Architecture](#architecture)
2. [Why the reward cannot be hacked](#why-the-reward-cannot-be-hacked)
3. [Action space and reward](#action-space-and-reward)
4. [Run locally](#run-locally)
5. [Run the training pipeline](#run-the-training-pipeline)
6. [Headline results](#headline-results)
7. [Deploy to Hugging Face Spaces](#deploy-to-hugging-face-spaces)
8. [Repo map](#repo-map)
9. [Submission deliverables](#submission-deliverables)
## Build status
| Build artifact | Status |
| --- | --- |
| Pure-Python env (`OpenSOCEnv`, FastAPI) | ✅ shipped |
| Verifier + plausibility checker | ✅ shipped, 17-test adversarial suite |
| Rubric (defender + attacker rewards) | ✅ shipped, anti-hack regression tests |
| 600-example SFT dataset (`data/sft_train.jsonl`) | ✅ shipped |
| 200-incident frozen hold-out (`data/holdout.jsonl`) | ✅ shipped |
| SFT warm-start adapter | ✅ trained → [`opensoc-defender-grpo-sft`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-sft) |
| GRPO curriculum (4 stages) | ✅ trained → adapters for each stage on HF |
| Final GRPO adapter | ✅ [`shivam2k3/opensoc-defender-grpo`](https://huggingface.co/shivam2k3/opensoc-defender-grpo) |
| GRPO training notebook (`train_grpo.ipynb`) | ✅ shipped (ran on HF Jupyter with Unsloth + TRL) |
| Gradio "before vs after" UI | ✅ **live** at [`/demo`](https://shivam2k3-opensoc-env.hf.space/demo) |
| Eval harness + plotters (`eval/`) | ✅ shipped |
| Pytest suite | ✅ **93 tests**, all green |
| HF Space | ✅ **live** at [`shivam2k3/opensoc-env`](https://huggingface.co/spaces/shivam2k3/opensoc-env) |
## Architecture
```mermaid
flowchart LR
Defender[Defender LLM trainee]
Attacker[Attacker LLM trainee]
Env[OpenSOC FastAPI Environment]
Verifier[Deterministic verifier + plausibility check]
Defender -->|submit_triage| Env
Attacker -->|craft_incident| Env
Env -->|observation reward| Defender
Env -->|attacker reward| Attacker
Env --> Verifier
Verifier -->|ground truth label| Env
```
An episode has exactly two turns: the attacker proposes incident params, the env validates them and materializes a SIEM-style alert plus log window, and the defender then submits a triage action. The verifier computes the ground-truth action from the *events alone* and scores both sides; the attacker's free-text narrative is never read by the labeler.
In `defender_only` mode (used for SFT, eval, smoke tests, and the `/demo` UI) the env auto-generates the incident from `tasks/registry.py` and skips straight to the defender turn.
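The turn order described above can be sketched as a tiny state machine. This is a minimal illustration, not the real `OpenSOCEnv`: the class `ToyEpisode`, the mode name `"self_play"`, and the placeholder incident are all assumptions made for the example; only `defender_only`, `craft_incident`, and `submit_triage` come from this README.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyEpisode:
    """Illustrative two-turn episode; a stand-in, not the real OpenSOCEnv."""

    mode: str = "self_play"  # hypothetical name; the README only names "defender_only"
    incident: Optional[dict] = None
    done: bool = False

    def reset(self) -> str:
        """Start a fresh episode and return whose turn comes first."""
        self.done = False
        if self.mode == "defender_only":
            # The env supplies the incident itself, so the attacker turn is skipped.
            self.incident = {"events": [{"log_id": "L1-0", "event_type": "login_failure"}]}
            return "defender"
        self.incident = None
        return "attacker"

    def step(self, action: dict) -> str:
        """Consume one action and return who acts next ("done" when finished)."""
        if self.done:
            raise RuntimeError("episode finished; call reset() first")
        if self.incident is None:
            # Turn 1: the attacker proposes structured incident params.
            self.incident = action["craft_incident"]
            return "defender"
        # Turn 2: the defender submits a triage action, which ends the episode.
        if "submit_triage" not in action:
            raise ValueError("expected a submit_triage action")
        self.done = True
        return "done"


episode = ToyEpisode()
print(episode.reset())                                         # attacker
print(episode.step({"craft_incident": {"events": []}}))        # defender
print(episode.step({"submit_triage": {"action": "monitor"}}))  # done
```

In `defender_only` mode the same sketch returns `"defender"` from `reset()`, so a single `submit_triage` step finishes the episode.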
## Why the reward cannot be hacked
1. The verifier is a transparent rule set in `verifier.compute_ground_truth(params)`; the *only* inputs are the structured events. The attacker's `narrative` and even its self-claimed `target_label` are ignored.
2. The plausibility checker (`verifier.check_plausibility(params)`) refuses incoherent stories: for example, a "data exfiltration" claim with a purely internal destination, or a `lolbin_use` event with no `process` field. The attacker's reward is gated on plausibility passing.
3. Schema-violation incidents floor attacker reward at -0.5, so trying to short-circuit pydantic's validators is strictly worse than playing along.
The anti-hack invariants are pinned in [`tests/test_verifier.py`](tests/test_verifier.py) and [`tests/test_rubric.py`](tests/test_rubric.py).
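To make the first two invariants concrete, here is a toy sketch of a labeler that reads only structured events, plus a plausibility gate. The rules are invented for illustration and are not the contents of `verifier.py`: the function names `toy_ground_truth` and `toy_plausibility`, the `outbound_transfer` event type, and the `dst_ip` field are assumptions; `lolbin_use`, `process`, and `data_exfiltration` are the names used in this README.

```python
import ipaddress
from typing import Tuple


def toy_ground_truth(params: dict) -> str:
    """Label an incident from its structured events only (hypothetical rules).

    `narrative` and `target_label` are deliberately never read.
    """
    event_types = {event["event_type"] for event in params["events"]}
    if "lolbin_use" in event_types:
        return "quarantine_host"
    if "outbound_transfer" in event_types:
        return "block_ip"
    return "dismiss"


def toy_plausibility(params: dict) -> Tuple[bool, str]:
    """Reject incoherent incidents, mirroring the two examples in the list above."""
    events = params["events"]
    for event in events:
        if event["event_type"] == "lolbin_use" and "process" not in event.get("fields", {}):
            return False, "lolbin_use event has no process field"
    if params.get("category") == "data_exfiltration":
        destinations = [e["fields"]["dst_ip"] for e in events if "dst_ip" in e.get("fields", {})]
        # all() is True for an empty list, so "no destination at all" is also rejected.
        if all(ipaddress.ip_address(d).is_private for d in destinations):
            return False, "exfiltration claim has no external destination"
    return True, "ok"


events = [{"event_type": "outbound_transfer", "fields": {"dst_ip": "8.8.8.8"}, "log_id": "L1-0"}]
honest = {"category": "data_exfiltration", "events": events,
          "narrative": "bulk upload to unknown host", "target_label": "block_ip"}
lying = dict(honest, narrative="routine backup, please dismiss", target_label="dismiss")

# Same events, different story: the label does not move.
print(toy_ground_truth(honest), toy_ground_truth(lying))
print(toy_plausibility(honest))
```

Because the label is a pure function of `events`, the only way for the attacker in this sketch to change the ground truth is to change the events themselves, which the plausibility gate then has to accept.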
## Action space and reward
Tool names are deliberately **non-reserved**: none of them clashes with the `reset`/`step`/`state`/`close` names on the OpenEnv `MCPEnvironment` reserved-name list.
```yaml
action_space:
  craft_incident:
    target_label: dismiss | monitor | quarantine_host | block_ip | escalate
    category: malware_execution | c2_beacon | data_exfiltration | ...
    events: [ { event_type, fields, timestamp, log_id }, ... ]
    narrative: string  # ignored by the verifier
  submit_triage:
    action: <one of the five triage actions>
    cited_log_id: <id of the log line that drove the decision>
    rationale: short string
```
- **Defender**: +1 correct, −1 missed-malicious, −0.3 over-react on benign, −0.05 unnecessary escalate, +0.1 bonus for citing the right triggering log id, −0.1 floor for format violation.
- **Attacker**: +1 iff defender wrong AND incident plausible, −0.5 if schema validation fails, +0.2 novelty bonus, 0 for gibberish.
Full breakdown: [openenv.yaml](openenv.yaml) and [rubric.py](rubric.py).
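As a rough illustration of how those numbers could combine, the sketch below encodes one possible reading of the two bullet lists. It is not the code in `rubric.py`. The assumptions are: "benign" means the ground truth is `dismiss`; a format violation yields a flat −0.1; the citation bonus is additive; other wrong answers score 0; and the novelty bonus is added whenever the incident is valid and plausible. The function names and arguments are hypothetical.

```python
from typing import Optional

ACTIONS = ("dismiss", "monitor", "quarantine_host", "block_ip", "escalate")


def defender_reward(pred: Optional[dict], truth: str, trigger_log_id: str) -> float:
    """One reading of the defender bullet; the real rubric may layer terms differently."""
    if not pred or pred.get("action") not in ACTIONS:
        return -0.1  # format violation (assumed to be a flat penalty)
    action = pred["action"]
    if action == truth:
        reward = 1.0
    elif action == "dismiss":
        reward = -1.0   # truth was not dismiss: a missed malicious incident
    elif truth == "dismiss":
        reward = -0.3   # over-reaction on a benign incident
    elif action == "escalate":
        reward = -0.05  # unnecessary escalation
    else:
        reward = 0.0    # wrong, but neither of the penalized failure modes (assumption)
    if pred.get("cited_log_id") == trigger_log_id:
        reward += 0.1   # bonus for citing the triggering log line
    return round(reward, 2)


def attacker_reward(schema_ok: bool, plausible: bool, defender_correct: bool, novel: bool) -> float:
    """One reading of the attacker bullet, with plausibility as a hard gate."""
    if not schema_ok:
        return -0.5
    if not plausible:
        return 0.0  # gibberish earns nothing
    reward = 0.0 if defender_correct else 1.0
    if novel:
        reward += 0.2
    return round(reward, 2)


print(defender_reward({"action": "monitor", "cited_log_id": "L1-0"}, "monitor", "L1-0"))  # 1.1
print(defender_reward({"action": "dismiss"}, "block_ip", "L1-0"))                         # -1.0
print(attacker_reward(schema_ok=True, plausible=True, defender_correct=False, novel=True))  # 1.2
```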
## Run locally
```bash
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
python server.py # serves on :7860
```
Smoke test from another shell:
```bash
curl -s http://localhost:7860/health | jq .
curl -s -X POST 'http://localhost:7860/reset?task=stage1_basic&mode=defender_only' | jq .
curl -s -X POST 'http://localhost:7860/step?task=stage1_basic&mode=defender_only' \
-H 'content-type: application/json' \
-d '{"submit_triage": {"action": "monitor", "cited_log_id": "L1-0", "rationale": "smoke"}}' | jq .
open http://localhost:7860/demo # Gradio before-vs-after UI (macOS; use xdg-open on Linux)
```
Run the test suite (CPU only, no GPU deps):
```bash
pytest -q # 93 passed
```
The same smoke test via the bundled Python client:
```python
from client import OpenSOCClient
c = OpenSOCClient()
obs = c.reset(task="stage1_basic", mode="defender_only", seed=1)
result = c.step(
    {"submit_triage": {"action": "monitor", "cited_log_id": "L1-0", "rationale": "ok"}},
    task="stage1_basic",
    mode="defender_only",
    seed=1,
)
print(result)
```
## Run the training pipeline
Full end-to-end procedure: **[TRAIN.md](TRAIN.md)**. TL;DR: on an HF Jupyter L4 instance (~$3 of credits, ~3.5h wall time):
```bash
bash scripts/run_full_pipeline.sh
```
Or step-by-step inside [`train_grpo.ipynb`](train_grpo.ipynb):
1. SFT warm-start (~12 min): pushes P(format-OK) from ~0% to ~95%.
2. GRPO curriculum across 4 stages (~3h): verifier-grounded reward, group size 8.
3. Eval on the frozen 200-incident hold-out (~5 min).
4. `eval.plot_results` + `eval.plot_training` render four PNGs.
5. `eval.bake_demo` writes 50 before-vs-after pairs to `data/demo_examples.json` for the Gradio UI.
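For readers new to GRPO, the "group size 8" in step 2 refers to sampling eight completions per prompt and scoring each one relative to its own group. The sketch below shows that group-relative normalization in plain Python. It is a simplified illustration rather than the training code in `train/`: the function name is hypothetical, the example rewards are made up, and real implementations (TRL included) may use a different standard-deviation estimator or epsilon.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption made for simplicity
    return [(r - mu) / (sigma + eps) for r in rewards]


# Eight hypothetical verifier rewards for eight sampled triage answers to one incident.
group = [1.1, 1.0, -1.0, -0.3, 1.0, -0.05, 0.0, -0.1]
advantages = group_relative_advantages(group)
for reward, advantage in zip(group, advantages):
    print(f"reward={reward:+.2f}  advantage={advantage:+.3f}")
```

One consequence worth noting: if all eight samples receive the same reward, every advantage is zero and that prompt contributes no gradient signal. An SFT warm-start that gets the output format right first helps avoid groups where every sample is equally unparseable.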
## Headline results
The defender was trained with GRPO over a 4-stage curriculum, on Qwen2.5-3B-Instruct with LoRA. All trained adapters are published on Hugging Face:
| Stage | Adapter | Difficulty |
| --- | --- | --- |
| SFT warm-start | [`opensoc-defender-grpo-sft`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-sft) | Format learning |
| Stage 1 | [`opensoc-defender-grpo-stage1_basic`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage1_basic) | Easy: single-event templates |
| Stage 2 | [`opensoc-defender-grpo-stage2_multi`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage2_multi) | Medium: multi-event windows |
| Stage 3 | [`opensoc-defender-grpo-stage3_mixed`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage3_mixed) | Hard: benign decoys interleaved |
| Stage 4 | [`opensoc-defender-grpo-stage4_adversarial`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage4_adversarial) | Adversarial: attacker-controlled |
| Final | [`opensoc-defender-grpo`](https://huggingface.co/shivam2k3/opensoc-defender-grpo) | Combined final adapter |
### Dismiss-on-malicious (the cardinal failure mode)
![dismiss-on-malicious by model](eval/results/bar_dismiss_on_malicious.png)
### Macro F1 across 200-incident hold-out
![macro F1 by model](eval/results/bar_macro_f1.png)
### Confusion matrices
| Baseline (always-dismiss) | Trained (verifier-oracle ceiling) |
| --- | --- |
| ![baseline confusion](eval/results/confusion_always_dismiss.png) | ![trained confusion](eval/results/confusion_verifier_oracle.png) |
### Reward across the curriculum
![training reward curves](eval/results/training_curves.png)
| Model | Accuracy | Macro F1 | Dismiss-on-malicious | Over-react |
| --- | ---: | ---: | ---: | ---: |
| `always_dismiss` (floor) | 0.13 | 0.05 | **1.00** | 0.00 |
| `verifier_oracle` (ceiling) | 1.00 | 1.00 | 0.00 | 0.00 |
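The floor row can be reproduced by hand. An `always_dismiss` accuracy of 0.13 on 200 incidents implies roughly 26 hold-out incidents whose ground truth is `dismiss`. The sketch below rebuilds the four metrics under that assumption. It is not the code in `eval/`: the function name is hypothetical, the split of the remaining 174 incidents across the other four labels is made up (it does not affect this baseline), and "malicious" is assumed to mean any ground truth other than `dismiss`.

```python
from typing import Dict, List

ACTIONS = ("dismiss", "monitor", "quarantine_host", "block_ip", "escalate")


def triage_metrics(truth: List[str], pred: List[str]) -> Dict[str, float]:
    """Accuracy, macro F1, dismiss-on-malicious rate, and over-react rate."""
    pairs = list(zip(truth, pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    f1_scores = []
    for label in ACTIONS:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)  # F1 is 0 for a class never hit

    malicious = [p for t, p in pairs if t != "dismiss"]
    benign = [p for t, p in pairs if t == "dismiss"]
    return {
        "accuracy": accuracy,
        "macro_f1": sum(f1_scores) / len(ACTIONS),
        "dismiss_on_malicious": sum(p == "dismiss" for p in malicious) / len(malicious) if malicious else 0.0,
        "over_react": sum(p != "dismiss" for p in benign) / len(benign) if benign else 0.0,
    }


# Hypothetical 200-incident label mix with 26 benign incidents (13%).
truth = (["dismiss"] * 26 + ["monitor"] * 44 + ["quarantine_host"] * 44
         + ["block_ip"] * 43 + ["escalate"] * 43)
floor = triage_metrics(truth, ["dismiss"] * len(truth))
print({name: round(value, 2) for name, value in floor.items()})
# {'accuracy': 0.13, 'macro_f1': 0.05, 'dismiss_on_malicious': 1.0, 'over_react': 0.0}
```

The macro F1 of 0.05 follows from a single non-zero class: `dismiss` has precision 0.13 and recall 1.0, giving F1 of about 0.23, which is then averaged over five classes.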
## Deploy to Hugging Face Spaces
Full recipe: [DEPLOY.md](DEPLOY.md). The fast version, after `huggingface-cli login`:
```bash
export HF_USER=<your-username>
bash scripts/deploy_to_hf.sh
# Build takes ~5 minutes; then:
open https://${HF_USER}-opensoc-env.hf.space/demo
```
The Space runs FastAPI + Gradio in a single container. `/reset`, `/step`, `/state`, `/grade`, `/tasks`, `/health` continue to work for the OpenEnv judge bot; `/demo` is the human-readable UI.
## Repo map
| File / dir | Purpose |
| --- | --- |
| `openenv.yaml` | OpenEnv manifest (tasks, action space, reward range, endpoints) |
| `schema.py` | Incident / event / action schema with strict validators |
| `generator.py` | Materializes incidents for `defender_only` mode (eval, SFT) |
| `verifier.py` | Deterministic ground-truth labeler + plausibility checker |
| `rubric.py` | Layered defender + attacker reward functions |
| `env.py` | Two-role `OpenSOCEnv` (`reset` / `step` / `state` / `grade`) |
| `app_runtime.py` | FastAPI app exposing the OpenEnv API |
| `demo_app.py` | Gradio Blocks app mounted at `/demo` |
| `demo_data.py` | Pure-python helpers for the demo UI |
| `server.py` | Container entry point: imports `demo_app`, then starts uvicorn |
| `tasks/registry.py` | Curriculum stages: `stage1_basic` → `stage4_adversarial` |
| `client/` | Thin HTTP client (server-internals-free) |
| `train/` | SFT warm-start + GRPO loop + reusable prompt format |
| `eval/` | Hold-out generator, metrics, eval driver, plot renderers, `bake_demo` |
| `scripts/run_full_pipeline.sh` | One-shot training + eval + bake-demo |
| `scripts/deploy_to_hf.sh` | One-shot HF Space push |
| `docs/` | Blog post, video script, slide deck builder |
| `tests/` | Pytest suite (93 tests, anti-hack regressions included) |
## Submission deliverables
Mapped to the four judging criteria:
| Criterion | Weight | Where it lives |
| --- | ---: | --- |
| Environment Innovation | 40% | `openenv.yaml`, `schema.py`, `verifier.py`, `env.py`, this README's *Architecture* and *Why the reward cannot be hacked* sections |
| Storytelling & Presentation | 30% | `/demo` Gradio UI + 90s video + HF blog |
| Showing Improvement in Rewards | 20% | `eval/results/*.png` (training curves + confusion + headline bar) embedded above |
| Reward & Training Pipeline | 10% | `rubric.py` + 93-test anti-hack suite + `train_grpo.ipynb` + `scripts/run_full_pipeline.sh` |
Submission checklist:
- [x] OpenEnv-compatible env (gym-style API, manifest, non-reserved tool names)
- [x] Deterministic RLVR verifier + plausibility checker
- [x] Layered defender + attacker reward
- [x] SFT warm-start dataset (committed)
- [x] Frozen 200-incident hold-out (committed)
- [x] GRPO curriculum notebook + one-shot training script
- [x] Eval harness + plotters
- [x] Pytest suite (93 tests, anti-hack regressions included)
- [x] Gradio `/demo` UI mounted on the same Space (free-CPU-tier compatible)
- [x] Blog post (`docs/blog.md`)
- [x] HF Space pushed and **running**: [`shivam2k3/opensoc-env`](https://huggingface.co/spaces/shivam2k3/opensoc-env)
- [x] SFT adapter trained and pushed: [`opensoc-defender-grpo-sft`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-sft)
- [x] GRPO adapters trained and pushed (4 stages): [`stage1`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage1_basic) [`stage2`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage2_multi) [`stage3`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage3_mixed) [`stage4`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage4_adversarial)
- [x] Final adapter pushed: [`opensoc-defender-grpo`](https://huggingface.co/shivam2k3/opensoc-defender-grpo)
## License
BSD-3-Clause.