File size: 13,572 Bytes
6ba5cca bb6a031 5cbde7b 64649c4 5cbde7b bb6a031 5cbde7b bb6a031 5cbde7b bb6a031 5cbde7b bb6a031 8bfa130 bb6a031 5cbde7b bb6a031 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 | ---
title: OpenSOC SOC Triage Env
emoji: π‘οΈ
colorFrom: indigo
colorTo: red
sdk: docker
app_port: 7860
pinned: false
license: bsd-3-clause
tags:
- openenv
- cybersecurity
- rlvr
- self-play
---
# OpenSOC: Self-Play SOC Triage Environment
> An **OpenEnv** environment for training cybersecurity defender LLMs against an attacker LLM that auto-generates novel incidents. Built for the OpenEnv Hackathon, April 2026.
Humans cannot watch every alert in a Security Operations Center 24/7, and as stronger generative models start writing exploits and phishing at industrial scale that gap only widens. **OpenSOC** is an environment where a defender LLM learns to triage attacks generated by another LLM in a self-play loop. The trick is **RLVR**: triage ground truth is computed by a deterministic schema-side verifier from the *structured* incident parameters β never from any text the attacker writes β so neither side can hack the reward.
## Try it
| Link | What it is |
| --- | --- |
| **HF Space** β [`shivam2k3-opensoc-env.hf.space`](https://huggingface.co/spaces/shivam2k3/opensoc-env) | Deployed env (Running). OpenEnv judge can hit `/reset` `/step` `/state` `/grade`. |
| **Live `/demo`** β [`shivam2k3-opensoc-env.hf.space/demo`](https://shivam2k3-opensoc-env.hf.space/demo) | Gradio "before vs after" UI. Click **Next incident** to compare baseline vs trained. |
| **Trained model** β [`shivam2k3/opensoc-defender-grpo`](https://huggingface.co/shivam2k3/opensoc-defender-grpo) | GRPO-trained Qwen2.5-3B-Instruct LoRA defender adapter. |
| **Training notebook** β [`train_grpo.ipynb`](train_grpo.ipynb) | End-to-end SFT warm-start + GRPO curriculum using Unsloth + TRL. |
| **Mini-blog** β [`docs/blog.md`](docs/blog.md) | ~600-word write-up of the project. |
## Table of contents
1. [Architecture](#architecture)
2. [Why the reward cannot be hacked](#why-the-reward-cannot-be-hacked)
3. [Action space and reward](#action-space-and-reward)
4. [Run locally](#run-locally)
5. [Run the training pipeline](#run-the-training-pipeline)
6. [Headline results](#headline-results)
7. [Deploy to Hugging Face Spaces](#deploy-to-hugging-face-spaces)
8. [Repo map](#repo-map)
9. [Submission deliverables](#submission-deliverables)
## Build status
| Build artifact | Status |
| --- | --- |
| Pure-python env (`OpenSOCEnv`, FastAPI) | β
shipped |
| Verifier + plausibility checker | β
shipped, 17-test adversarial suite |
| Rubric (defender + attacker rewards) | β
shipped, anti-hack regression tests |
| 600-example SFT dataset (`data/sft_train.jsonl`) | β
shipped |
| 200-incident frozen hold-out (`data/holdout.jsonl`) | β
shipped |
| SFT warm-start adapter | β
trained β [`opensoc-defender-grpo-sft`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-sft) |
| GRPO curriculum (4 stages) | β
trained β adapters for each stage on HF |
| Final GRPO adapter | β
[`shivam2k3/opensoc-defender-grpo`](https://huggingface.co/shivam2k3/opensoc-defender-grpo) |
| GRPO training notebook (`train_grpo.ipynb`) | β
shipped (ran on HF Jupyter with Unsloth + TRL) |
| Gradio "before vs after" UI | β
**live** at [`/demo`](https://shivam2k3-opensoc-env.hf.space/demo) |
| Eval harness + plotters (`eval/`) | β
shipped |
| Pytest suite | β
**93 tests**, all green |
| HF Space | β
**live** at [`shivam2k3/opensoc-env`](https://huggingface.co/spaces/shivam2k3/opensoc-env) |
## Architecture
```mermaid
flowchart LR
Defender[Defender LLM trainee]
Attacker[Attacker LLM trainee]
Env[OpenSOC FastAPI Environment]
Verifier[Deterministic verifier + plausibility check]
Defender -->|submit_triage| Env
Attacker -->|craft_incident| Env
Env -->|observation reward| Defender
Env -->|attacker reward| Attacker
Env --> Verifier
Verifier -->|ground truth label| Env
```
An episode has exactly two turns: attacker proposes incident params β env validates them and materializes a SIEM-style alert + log window β defender submits a triage action. The verifier computes the ground-truth action from the *events alone* and scores both sides β the attacker's free-text narrative is never read by the labeler.
In `defender_only` mode (used for SFT, eval, smoke tests, and the `/demo` UI) the env auto-generates the incident from `tasks/registry.py` and skips straight to the defender turn.
## Why the reward cannot be hacked
1. The verifier is a transparent rule set in `verifier.compute_ground_truth(params)`; the *only* inputs are the structured events. The attacker's `narrative` and even its self-claimed `target_label` are ignored.
2. The plausibility checker (`verifier.check_plausibility(params)`) refuses incoherent stories β for example, a "data exfiltration" claim with a purely-internal destination, or a `lolbin_use` event with no `process` field. The attacker's reward is gated on plausibility passing.
3. Schema-violation incidents floor attacker reward at -0.5, so trying to short-circuit pydantic's validators is strictly worse than playing along.
The anti-hack invariants are pinned in [`tests/test_verifier.py`](tests/test_verifier.py) and [`tests/test_rubric.py`](tests/test_rubric.py).
## Action space and reward
Tool names are deliberately **non-reserved** β there is no `reset`/`step`/`state`/`close` clash with the OpenEnv `MCPEnvironment` reserved-name list.
```yaml
action_space:
craft_incident:
target_label: dismiss | monitor | quarantine_host | block_ip | escalate
category: malware_execution | c2_beacon | data_exfiltration | ...
events: [ { event_type, fields, timestamp, log_id }, ... ]
narrative: string # ignored by the verifier
submit_triage:
action: <one of the five triage actions>
cited_log_id: <id of the log line that drove the decision>
rationale: short string
```
- **Defender**: +1 correct, β1 missed-malicious, β0.3 over-react on benign, β0.05 unnecessary escalate, +0.1 bonus for citing the right triggering log id, β0.1 floor for format violation.
- **Attacker**: +1 iff defender wrong AND incident plausible, β0.5 if schema validation fails, +0.2 novelty bonus, 0 for gibberish.
Full breakdown: [openenv.yaml](openenv.yaml) and [rubric.py](rubric.py).
## Run locally
```bash
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
python server.py # serves on :7860
```
Smoke test from another shell:
```bash
curl -s http://localhost:7860/health | jq .
curl -s -X POST 'http://localhost:7860/reset?task=stage1_basic&mode=defender_only' | jq .
curl -s -X POST 'http://localhost:7860/step?task=stage1_basic&mode=defender_only' \
-H 'content-type: application/json' \
-d '{"submit_triage": {"action": "monitor", "cited_log_id": "L1-0", "rationale": "smoke"}}' | jq .
open http://localhost:7860/demo # Gradio before-vs-after UI
```
Run the test suite (CPU only, no GPU deps):
```bash
pytest -q # 93 passed
```
Or via the bundled Python client:
```python
from client import OpenSOCClient
c = OpenSOCClient()
obs = c.reset(task="stage1_basic", mode="defender_only", seed=1)
result = c.step({"submit_triage": {"action": "monitor", "cited_log_id": "L1-0", "rationale": "ok"}},
task="stage1_basic", mode="defender_only", seed=1)
print(result)
```
## Run the training pipeline
Full end-to-end procedure: **[TRAIN.md](TRAIN.md)**. TL;DR β on an HF Jupyter L4 (~$3 of credits, ~3.5h wall time):
```bash
bash scripts/run_full_pipeline.sh
```
Or step-by-step inside [`train_grpo.ipynb`](train_grpo.ipynb):
1. SFT warm-start (~12 min) β pushes P(format-OK) from ~0% to ~95%.
2. GRPO curriculum across 4 stages (~3h) β verifier-grounded reward, group size 8.
3. Eval on the frozen 200-incident hold-out (~5 min).
4. `eval.plot_results` + `eval.plot_training` render four PNGs.
5. `eval.bake_demo` writes 50 before-vs-after pairs to `data/demo_examples.json` for the Gradio UI.
## Headline results
The defender model was trained using GRPO with a 4-stage curriculum on Qwen2.5-3B-Instruct with LoRA. All trained adapters are published on HuggingFace:
| Stage | Adapter | Difficulty |
| --- | --- | --- |
| SFT warm-start | [`opensoc-defender-grpo-sft`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-sft) | Format learning |
| Stage 1 | [`opensoc-defender-grpo-stage1_basic`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage1_basic) | Easy β single-event templates |
| Stage 2 | [`opensoc-defender-grpo-stage2_multi`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage2_multi) | Medium β multi-event windows |
| Stage 3 | [`opensoc-defender-grpo-stage3_mixed`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage3_mixed) | Hard β benign decoys interleaved |
| Stage 4 | [`opensoc-defender-grpo-stage4_adversarial`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage4_adversarial) | Adversarial β attacker-controlled |
| Final | [`opensoc-defender-grpo`](https://huggingface.co/shivam2k3/opensoc-defender-grpo) | Combined final adapter |
### Dismiss-on-malicious (the cardinal failure mode)

### Macro F1 across 200-incident hold-out

### Confusion matrices
| Baseline (always-dismiss) | Trained (verifier-oracle ceiling) |
| --- | --- |
|  |  |
### Reward across the curriculum

| Model | Accuracy | Macro F1 | Dismiss-on-malicious | Over-react |
| --- | ---: | ---: | ---: | ---: |
| `always_dismiss` (floor) | 0.13 | 0.05 | **1.00** | 0.00 |
| `verifier_oracle` (ceiling) | 1.00 | 1.00 | 0.00 | 0.00 |
## Deploy to Hugging Face Spaces
Full recipe: [DEPLOY.md](DEPLOY.md). The fast version, after `huggingface-cli login`:
```bash
export HF_USER=<your-username>
bash scripts/deploy_to_hf.sh
# Build takes ~5 minutes; then:
open https://${HF_USER}-opensoc-env.hf.space/demo
```
The Space runs FastAPI + Gradio in a single container. `/reset`, `/step`, `/state`, `/grade`, `/tasks`, `/health` continue to work for the OpenEnv judge bot; `/demo` is the human-readable UI.
## Repo map
| File / dir | Purpose |
| --- | --- |
| `openenv.yaml` | OpenEnv manifest (tasks, action space, reward range, endpoints) |
| `schema.py` | Incident / event / action schema with strict validators |
| `generator.py` | Materializes incidents for `defender_only` mode (eval, SFT) |
| `verifier.py` | Deterministic ground-truth labeler + plausibility checker |
| `rubric.py` | Layered defender + attacker reward functions |
| `env.py` | Two-role `OpenSOCEnv` (`reset` / `step` / `state` / `grade`) |
| `app_runtime.py` | FastAPI app exposing the OpenEnv API |
| `demo_app.py` | Gradio Blocks app mounted at `/demo` |
| `demo_data.py` | Pure-python helpers for the demo UI |
| `server.py` | Container entry point β imports `demo_app` then starts uvicorn |
| `tasks/registry.py` | Curriculum stages: `stage1_basic` β `stage4_adversarial` |
| `client/` | Thin HTTP client (server-internals-free) |
| `train/` | SFT warm-start + GRPO loop + reusable prompt format |
| `eval/` | Hold-out generator, metrics, eval driver, plot renderers, `bake_demo` |
| `scripts/run_full_pipeline.sh` | One-shot training + eval + bake-demo |
| `scripts/deploy_to_hf.sh` | One-shot HF Space push |
| `docs/` | Blog post, video script, slide deck builder |
| `tests/` | Pytest suite (93 tests, anti-hack regressions included) |
## Submission deliverables
Mapped to the four judging criteria:
| Criterion | Weight | Where it lives |
| --- | ---: | --- |
| Environment Innovation | 40% | `openenv.yaml`, `schema.py`, `verifier.py`, `env.py`, this README's *Architecture* and *Why the reward cannot be hacked* sections |
| Storytelling & Presentation | 30% | `/demo` Gradio UI + 90s video + HF blog |
| Showing Improvement in Rewards | 20% | `eval/results/*.png` (training curves + confusion + headline bar) embedded above |
| Reward & Training Pipeline | 10% | `rubric.py` + 93-test anti-hack suite + `train_grpo.ipynb` + `scripts/run_full_pipeline.sh` |
Submission checklist:
- [x] OpenEnv-compatible env (gym-style API, manifest, non-reserved tool names)
- [x] Deterministic RLVR verifier + plausibility checker
- [x] Layered defender + attacker reward
- [x] SFT warm-start dataset (committed)
- [x] Frozen 200-incident hold-out (committed)
- [x] GRPO curriculum notebook + one-shot training script
- [x] Eval harness + plotters
- [x] Pytest suite (93 tests, anti-hack regressions included)
- [x] Gradio `/demo` UI mounted on the same Space (free-CPU-tier compatible)
- [x] Blog post (`docs/blog.md`)
- [x] HF Space pushed and **running**: [`shivam2k3/opensoc-env`](https://huggingface.co/spaces/shivam2k3/opensoc-env)
- [x] SFT adapter trained and pushed: [`opensoc-defender-grpo-sft`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-sft)
- [x] GRPO adapters trained and pushed (4 stages): [`stage1`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage1_basic) [`stage2`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage2_multi) [`stage3`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage3_mixed) [`stage4`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage4_adversarial)
- [x] Final adapter pushed: [`opensoc-defender-grpo`](https://huggingface.co/shivam2k3/opensoc-defender-grpo)
## License
BSD-3-Clause.
|