---
title: OpenSOC SOC Triage Env
emoji: 🛡️
colorFrom: indigo
colorTo: red
sdk: docker
app_port: 7860
pinned: false
license: bsd-3-clause
tags:
  - openenv
  - cybersecurity
  - rlvr
  - self-play
---

# OpenSOC: Self-Play SOC Triage Environment

> An **OpenEnv** environment for training cybersecurity defender LLMs against an attacker LLM that auto-generates novel incidents. Built for the OpenEnv Hackathon, April 2026.

Humans cannot watch every alert in a Security Operations Center 24/7, and as stronger generative models start writing exploits and phishing at industrial scale that gap only widens. **OpenSOC** is an environment where a defender LLM learns to triage attacks generated by another LLM in a self-play loop. The trick is **RLVR**: triage ground truth is computed by a deterministic schema-side verifier from the *structured* incident parameters — never from any text the attacker writes — so neither side can hack the reward.

## Try it

| Link | What it is |
| --- | --- |
| **HF Space** — [`shivam2k3-opensoc-env.hf.space`](https://huggingface.co/spaces/shivam2k3/opensoc-env) | Deployed env (Running). OpenEnv judge can hit `/reset` `/step` `/state` `/grade`. |
| **Live `/demo`** — [`shivam2k3-opensoc-env.hf.space/demo`](https://shivam2k3-opensoc-env.hf.space/demo) | Gradio "before vs after" UI. Click **Next incident** to compare baseline vs trained. |
| **Trained model** — [`shivam2k3/opensoc-defender-grpo`](https://huggingface.co/shivam2k3/opensoc-defender-grpo) | GRPO-trained Qwen2.5-3B-Instruct LoRA defender adapter. |
| **Training notebook** — [`train_grpo.ipynb`](train_grpo.ipynb) | End-to-end SFT warm-start + GRPO curriculum using Unsloth + TRL. |
| **Mini-blog** — [`docs/blog.md`](docs/blog.md) | ~600-word write-up of the project. |

## Table of contents

1. [Architecture](#architecture)
2. [Why the reward cannot be hacked](#why-the-reward-cannot-be-hacked)
3. [Action space and reward](#action-space-and-reward)
4. [Run locally](#run-locally)
5. [Run the training pipeline](#run-the-training-pipeline)
6. [Headline results](#headline-results)
7. [Deploy to Hugging Face Spaces](#deploy-to-hugging-face-spaces)
8. [Repo map](#repo-map)
9. [Submission deliverables](#submission-deliverables)

## Build status

| Build artifact | Status |
| --- | --- |
| Pure-python env (`OpenSOCEnv`, FastAPI) | ✅ shipped |
| Verifier + plausibility checker | ✅ shipped, 17-test adversarial suite |
| Rubric (defender + attacker rewards) | ✅ shipped, anti-hack regression tests |
| 600-example SFT dataset (`data/sft_train.jsonl`) | ✅ shipped |
| 200-incident frozen hold-out (`data/holdout.jsonl`) | ✅ shipped |
| SFT warm-start adapter | ✅ trained → [`opensoc-defender-grpo-sft`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-sft) |
| GRPO curriculum (4 stages) | ✅ trained → adapters for each stage on HF |
| Final GRPO adapter | ✅ [`shivam2k3/opensoc-defender-grpo`](https://huggingface.co/shivam2k3/opensoc-defender-grpo) |
| GRPO training notebook (`train_grpo.ipynb`) | ✅ shipped (ran on HF Jupyter with Unsloth + TRL) |
| Gradio "before vs after" UI | ✅ **live** at [`/demo`](https://shivam2k3-opensoc-env.hf.space/demo) |
| Eval harness + plotters (`eval/`) | ✅ shipped |
| Pytest suite | ✅ **93 tests**, all green |
| HF Space | ✅ **live** at [`shivam2k3/opensoc-env`](https://huggingface.co/spaces/shivam2k3/opensoc-env) |

## Architecture

```mermaid
flowchart LR
  Defender[Defender LLM trainee]
  Attacker[Attacker LLM trainee]
  Env[OpenSOC FastAPI Environment]
  Verifier[Deterministic verifier + plausibility check]
  Defender -->|submit_triage| Env
  Attacker -->|craft_incident| Env
  Env -->|observation reward| Defender
  Env -->|attacker reward| Attacker
  Env --> Verifier
  Verifier -->|ground truth label| Env
```

An episode has exactly two turns: attacker proposes incident params → env validates them and materializes a SIEM-style alert + log window → defender submits a triage action.  The verifier computes the ground-truth action from the *events alone* and scores both sides — the attacker's free-text narrative is never read by the labeler.

In `defender_only` mode (used for SFT, eval, smoke tests, and the `/demo` UI) the env auto-generates the incident from `tasks/registry.py` and skips straight to the defender turn.

## Why the reward cannot be hacked

1. The verifier is a transparent rule set in `verifier.compute_ground_truth(params)`; the *only* inputs are the structured events.  The attacker's `narrative` and even its self-claimed `target_label` are ignored.
2. The plausibility checker (`verifier.check_plausibility(params)`) refuses incoherent stories — for example, a "data exfiltration" claim with a purely-internal destination, or a `lolbin_use` event with no `process` field.  The attacker's reward is gated on plausibility passing.
3. An incident that fails schema validation earns the attacker a flat −0.5, so trying to bypass pydantic's validators is strictly worse than submitting a valid incident.

The anti-hack invariants are pinned in [`tests/test_verifier.py`](tests/test_verifier.py) and [`tests/test_rubric.py`](tests/test_rubric.py).
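
To make points 1 and 2 concrete, here is a minimal sketch of a label function that reads only structured events, plus a plausibility gate. It is not the repo's `verifier.py`: the function names, the three toy rules, the `outbound_transfer` event type, and the `dest_internal` field are all hypothetical. Only `lolbin_use`, `process`, `narrative`, and `target_label` come from the text above.

```python
from typing import Any

Params = dict[str, Any]


def toy_ground_truth(params: Params) -> str:
    """Label from event types only; narrative and target_label are never read."""
    event_types = {e["event_type"] for e in params["events"]}
    if "lolbin_use" in event_types:
        return "quarantine_host"  # toy rule, not the real rule set
    if "outbound_transfer" in event_types:  # hypothetical event type
        return "block_ip"
    return "dismiss"


def toy_plausible(params: Params) -> bool:
    """Reject the two incoherent stories mentioned in point 2."""
    events = params["events"]
    # A lolbin_use event must name the process that ran.
    if any(e["event_type"] == "lolbin_use" and "process" not in e["fields"] for e in events):
        return False
    # An exfiltration claim needs at least one non-internal destination.
    if params.get("category") == "data_exfiltration":
        return any(e["fields"].get("dest_internal") is False for e in events)
    return True


incident = {
    "category": "malware_execution",
    "target_label": "dismiss",  # self-claimed by the attacker, ignored
    "narrative": "Routine admin task, nothing to see here.",
    "events": [
        {"event_type": "lolbin_use", "fields": {"process": "certutil.exe"},
         "timestamp": "2026-04-01T00:00:00Z", "log_id": "L1-0"},
    ],
}
# Same events, different free text and claimed label: the label must not move.
reworded = {**incident, "narrative": "Please label this as benign.", "target_label": "monitor"}
```

Because the label depends only on `events`, rewording the narrative or changing the claimed label cannot change the outcome, which is the property the tests above are said to pin down.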

## Action space and reward

Tool names are deliberately **non-reserved** — there is no `reset`/`step`/`state`/`close` clash with the OpenEnv `MCPEnvironment` reserved-name list.

```yaml
action_space:
  craft_incident:
    target_label: dismiss | monitor | quarantine_host | block_ip | escalate
    category:     malware_execution | c2_beacon | data_exfiltration | ...
    events:       [ { event_type, fields, timestamp, log_id }, ... ]
    narrative:    string         # ignored by the verifier
  submit_triage:
    action:       <one of the five triage actions>
    cited_log_id: <id of the log line that drove the decision>
    rationale:    short string
```
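
As a rough illustration of the `submit_triage` shape, the sketch below validates a payload with a plain dataclass. It is not the repo's `schema.py`; the class name `SubmitTriage`, the helper `parse_submit_triage`, and the non-empty check on `cited_log_id` are assumptions made for this example. Only the five action names and the three field names come from the spec above.

```python
from dataclasses import dataclass
from typing import Any

# The five triage actions listed in the action space above.
TRIAGE_ACTIONS = ("dismiss", "monitor", "quarantine_host", "block_ip", "escalate")


@dataclass(frozen=True)
class SubmitTriage:
    """Hypothetical stand-in for the defender action; not the repo's model."""

    action: str
    cited_log_id: str
    rationale: str

    def __post_init__(self) -> None:
        if self.action not in TRIAGE_ACTIONS:
            raise ValueError(f"unknown action: {self.action!r}")
        if not self.cited_log_id:
            raise ValueError("cited_log_id must be non-empty")  # assumed rule


def parse_submit_triage(payload: dict[str, Any]) -> SubmitTriage:
    """Parse a {"submit_triage": {...}} wrapper into a validated action."""
    body = payload["submit_triage"]
    return SubmitTriage(
        action=body["action"],
        cited_log_id=body["cited_log_id"],
        rationale=body.get("rationale", ""),
    )


triage = parse_submit_triage(
    {"submit_triage": {"action": "monitor", "cited_log_id": "L1-0", "rationale": "smoke"}}
)
```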

- **Defender**: +1 correct, −1 missed-malicious, −0.3 over-react on benign, −0.05 unnecessary escalate, +0.1 bonus for citing the right triggering log id, −0.1 floor for format violation.
- **Attacker**: +1 iff defender wrong AND incident plausible, −0.5 if schema validation fails, +0.2 novelty bonus, 0 for gibberish.

Full breakdown: [openenv.yaml](openenv.yaml) and [rubric.py](rubric.py).
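
The two bullets can be read as small functions. The sketch below is one possible reading, not the code in `rubric.py`. Several choices are assumptions: "malicious" is taken to mean any ground truth other than `dismiss`, a wrong action that is neither a miss nor an over-reaction scores 0, the escalation penalty and citation bonus stack on top of the base term, and the novelty bonus applies only to plausible incidents.

```python
from typing import Optional


def defender_reward(action: Optional[str], truth: str, cited_right_log: bool = False) -> float:
    """Toy defender reward; action=None models a format violation."""
    if action is None:
        return -0.1
    if action == truth:
        reward = 1.0
    elif action == "dismiss":
        reward = -1.0  # truth is not dismiss: a malicious incident was missed
    elif truth == "dismiss":
        reward = -0.3  # over-reaction to a benign incident
    else:
        reward = 0.0  # assumption: other wrong actions are neutral
    if action == "escalate" and truth != "escalate":
        reward -= 0.05  # unnecessary escalation
    if cited_right_log:
        reward += 0.1  # assumption: bonus stacks regardless of the base term
    return reward


def attacker_reward(schema_valid: bool, plausible: bool,
                    defender_correct: bool, novel: bool = False) -> float:
    """Toy attacker reward, gated first on schema validity, then plausibility."""
    if not schema_valid:
        return -0.5
    if not plausible:
        return 0.0  # gibberish earns nothing, even if the defender is wrong
    reward = 0.0 if defender_correct else 1.0
    return reward + (0.2 if novel else 0.0)
```

The ordering of the gates is the point: an implausible incident earns nothing even when it fools the defender, so the attacker is only paid for incidents the verifier accepts as coherent.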

## Run locally

```bash
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
python server.py    # serves on :7860
```

Smoke test from another shell:

```bash
curl -s http://localhost:7860/health | jq .
curl -s -X POST 'http://localhost:7860/reset?task=stage1_basic&mode=defender_only' | jq .
curl -s -X POST 'http://localhost:7860/step?task=stage1_basic&mode=defender_only' \
     -H 'content-type: application/json' \
     -d '{"submit_triage": {"action": "monitor", "cited_log_id": "L1-0", "rationale": "smoke"}}' | jq .
open http://localhost:7860/demo   # Gradio before-vs-after UI (macOS; use xdg-open on Linux)
```

Run the test suite (CPU only, no GPU deps):

```bash
pytest -q   # 93 passed
```

Or via the bundled Python client:

```python
from client import OpenSOCClient
c = OpenSOCClient()
obs = c.reset(task="stage1_basic", mode="defender_only", seed=1)
result = c.step({"submit_triage": {"action": "monitor", "cited_log_id": "L1-0", "rationale": "ok"}},
                task="stage1_basic", mode="defender_only", seed=1)
print(result)
```

## Run the training pipeline

Full end-to-end procedure: **[TRAIN.md](TRAIN.md)**.  TL;DR — on an HF Jupyter L4 (~$3 of credits, ~3.5h wall time):

```bash
bash scripts/run_full_pipeline.sh
```

Or step-by-step inside [`train_grpo.ipynb`](train_grpo.ipynb):

1. SFT warm-start (~12 min) — pushes P(format-OK) from ~0% to ~95%.
2. GRPO curriculum across 4 stages (~3h) — verifier-grounded reward, group size 8.
3. Eval on the frozen 200-incident hold-out (~5 min).
4. `eval.plot_results` + `eval.plot_training` render four PNGs.
5. `eval.bake_demo` writes 50 before-vs-after pairs to `data/demo_examples.json` for the Gradio UI.
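
Step 2 relies on GRPO's group-relative advantage: each prompt is sampled several times (group size 8 here) and every completion's reward is normalized against its own group. The sketch below shows that normalization in plain Python. It is the textbook formulation rather than code taken from `train/` or TRL; the function name, the `eps` value, and the use of the population standard deviation are assumptions, and the sample rewards are made up.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize each reward against the mean and spread of its own group."""
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std; a library may use the sample std
    return [(r - mean) / (std + eps) for r in rewards]


# One hypothetical group of 8 completions for the same incident.
rewards = [1.1, 1.0, 1.0, -1.0, -0.3, 0.0, 1.0, -0.1]
advantages = group_relative_advantages(rewards)
```

When all eight completions earn the same reward, every advantage is zero and the group contributes no gradient signal, which is one reason an SFT warm-start that gets the format right first can help.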

## Headline results

The defender was trained with GRPO over a 4-stage curriculum on Qwen2.5-3B-Instruct with LoRA. All trained adapters are published on Hugging Face:

| Stage | Adapter | Difficulty |
| --- | --- | --- |
| SFT warm-start | [`opensoc-defender-grpo-sft`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-sft) | Format learning |
| Stage 1 | [`opensoc-defender-grpo-stage1_basic`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage1_basic) | Easy — single-event templates |
| Stage 2 | [`opensoc-defender-grpo-stage2_multi`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage2_multi) | Medium — multi-event windows |
| Stage 3 | [`opensoc-defender-grpo-stage3_mixed`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage3_mixed) | Hard — benign decoys interleaved |
| Stage 4 | [`opensoc-defender-grpo-stage4_adversarial`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage4_adversarial) | Adversarial — attacker-controlled |
| Final | [`opensoc-defender-grpo`](https://huggingface.co/shivam2k3/opensoc-defender-grpo) | Combined final adapter |

### Dismiss-on-malicious (the cardinal failure mode)

![dismiss-on-malicious by model](eval/results/bar_dismiss_on_malicious.png)

### Macro F1 across 200-incident hold-out

![macro F1 by model](eval/results/bar_macro_f1.png)

### Confusion matrices

| Always-dismiss baseline (floor) | Verifier oracle (ceiling) |
| --- | --- |
| ![baseline confusion](eval/results/confusion_always_dismiss.png) | ![verifier-oracle confusion](eval/results/confusion_verifier_oracle.png) |

### Reward across the curriculum

![training reward curves](eval/results/training_curves.png)

Reference floor and ceiling policies:

| Model | Accuracy | Macro F1 | Dismiss-on-malicious | Over-react |
| --- | ---: | ---: | ---: | ---: |
| `always_dismiss` (floor)      | 0.13 | 0.05 | **1.00** | 0.00 |
| `verifier_oracle` (ceiling)   | 1.00 | 1.00 | 0.00 | 0.00 |

## Deploy to Hugging Face Spaces

Full recipe: [DEPLOY.md](DEPLOY.md).  The fast version, after `huggingface-cli login`:

```bash
export HF_USER=<your-username>
bash scripts/deploy_to_hf.sh
# Build takes ~5 minutes; then:
open https://${HF_USER}-opensoc-env.hf.space/demo   # macOS; use xdg-open on Linux
```

The Space runs FastAPI + Gradio in a single container.  `/reset`, `/step`, `/state`, `/grade`, `/tasks`, `/health` continue to work for the OpenEnv judge bot; `/demo` is the human-readable UI.

## Repo map

| File / dir | Purpose |
| --- | --- |
| `openenv.yaml` | OpenEnv manifest (tasks, action space, reward range, endpoints) |
| `schema.py` | Incident / event / action schema with strict validators |
| `generator.py` | Materializes incidents for `defender_only` mode (eval, SFT) |
| `verifier.py` | Deterministic ground-truth labeler + plausibility checker |
| `rubric.py` | Layered defender + attacker reward functions |
| `env.py` | Two-role `OpenSOCEnv` (`reset` / `step` / `state` / `grade`) |
| `app_runtime.py` | FastAPI app exposing the OpenEnv API |
| `demo_app.py` | Gradio Blocks app mounted at `/demo` |
| `demo_data.py` | Pure-python helpers for the demo UI |
| `server.py` | Container entry point — imports `demo_app` then starts uvicorn |
| `tasks/registry.py` | Curriculum stages: `stage1_basic` → `stage4_adversarial` |
| `client/` | Thin HTTP client (server-internals-free) |
| `train/` | SFT warm-start + GRPO loop + reusable prompt format |
| `eval/` | Hold-out generator, metrics, eval driver, plot renderers, `bake_demo` |
| `scripts/run_full_pipeline.sh` | One-shot training + eval + bake-demo |
| `scripts/deploy_to_hf.sh` | One-shot HF Space push |
| `docs/` | Blog post, video script, slide deck builder |
| `tests/` | Pytest suite (93 tests, anti-hack regressions included) |

## Submission deliverables

Mapped to the four judging criteria:

| Criterion | Weight | Where it lives |
| --- | ---: | --- |
| Environment Innovation | 40% | `openenv.yaml`, `schema.py`, `verifier.py`, `env.py`, this README's *Architecture* and *Why the reward cannot be hacked* sections |
| Storytelling & Presentation | 30% | `/demo` Gradio UI + 90s video + HF blog |
| Showing Improvement in Rewards | 20% | `eval/results/*.png` (training curves + confusion + headline bar) embedded above |
| Reward & Training Pipeline | 10% | `rubric.py` + 93-test anti-hack suite + `train_grpo.ipynb` + `scripts/run_full_pipeline.sh` |

Submission checklist:

- [x] OpenEnv-compatible env (gym-style API, manifest, non-reserved tool names)
- [x] Deterministic RLVR verifier + plausibility checker
- [x] Layered defender + attacker reward
- [x] SFT warm-start dataset (committed)
- [x] Frozen 200-incident hold-out (committed)
- [x] GRPO curriculum notebook + one-shot training script
- [x] Eval harness + plotters
- [x] Pytest suite (93 tests, anti-hack regressions included)
- [x] Gradio `/demo` UI mounted on the same Space (free-CPU-tier compatible)
- [x] Blog post (`docs/blog.md`)
- [x] HF Space pushed and **running**: [`shivam2k3/opensoc-env`](https://huggingface.co/spaces/shivam2k3/opensoc-env)
- [x] SFT adapter trained and pushed: [`opensoc-defender-grpo-sft`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-sft)
- [x] GRPO adapters trained and pushed (4 stages): [`stage1`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage1_basic) [`stage2`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage2_multi) [`stage3`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage3_mixed) [`stage4`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage4_adversarial)
- [x] Final adapter pushed: [`opensoc-defender-grpo`](https://huggingface.co/shivam2k3/opensoc-defender-grpo)

## License

BSD-3-Clause.