---
title: OpenSOC SOC Triage Env
emoji: 🛡️
colorFrom: indigo
colorTo: red
sdk: docker
app_port: 7860
pinned: false
license: bsd-3-clause
tags:
  - openenv
  - cybersecurity
  - rlvr
  - self-play
---

# OpenSOC: Self-Play SOC Triage Environment

> An **OpenEnv** environment for training cybersecurity defender LLMs against an attacker LLM that auto-generates novel incidents. Built for the OpenEnv Hackathon, April 2026.

Humans cannot watch every alert in a Security Operations Center 24/7, and as stronger generative models start writing exploits and phishing at industrial scale that gap only widens. **OpenSOC** is an environment where a defender LLM learns to triage attacks generated by another LLM in a self-play loop. The trick is **RLVR**: triage ground truth is computed by a deterministic schema-side verifier from the *structured* incident parameters, never from any text the attacker writes, so neither side can hack the reward.

## Try it

| Link | What it is |
| --- | --- |
| **HF Space**: [`shivam2k3-opensoc-env.hf.space`](https://huggingface.co/spaces/shivam2k3/opensoc-env) | Deployed env (Running). OpenEnv judge can hit `/reset` `/step` `/state` `/grade`. |
| **Live `/demo`**: [`shivam2k3-opensoc-env.hf.space/demo`](https://shivam2k3-opensoc-env.hf.space/demo) | Gradio "before vs after" UI. Click **Next incident** to compare baseline vs trained. |
| **Trained model**: [`shivam2k3/opensoc-defender-grpo`](https://huggingface.co/shivam2k3/opensoc-defender-grpo) | GRPO-trained Qwen2.5-3B-Instruct LoRA defender adapter. |
| **Training notebook**: [`train_grpo.ipynb`](train_grpo.ipynb) | End-to-end SFT warm-start + GRPO curriculum using Unsloth + TRL. |
| **Mini-blog**: [`docs/blog.md`](docs/blog.md) | ~600-word write-up of the project. |

## Table of contents

1. [Architecture](#architecture)
2. [Why the reward cannot be hacked](#why-the-reward-cannot-be-hacked)
3. [Action space and reward](#action-space-and-reward)
4. [Run locally](#run-locally)
5. [Run the training pipeline](#run-the-training-pipeline)
6. [Headline results](#headline-results)
7. [Deploy to Hugging Face Spaces](#deploy-to-hugging-face-spaces)
8. [Repo map](#repo-map)
9. [Submission deliverables](#submission-deliverables)

## Build status

| Build artifact | Status |
| --- | --- |
| Pure-python env (`OpenSOCEnv`, FastAPI) | ✅ shipped |
| Verifier + plausibility checker | ✅ shipped, 17-test adversarial suite |
| Rubric (defender + attacker rewards) | ✅ shipped, anti-hack regression tests |
| 600-example SFT dataset (`data/sft_train.jsonl`) | ✅ shipped |
| 200-incident frozen hold-out (`data/holdout.jsonl`) | ✅ shipped |
| SFT warm-start adapter | ✅ trained → [`opensoc-defender-grpo-sft`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-sft) |
| GRPO curriculum (4 stages) | ✅ trained → adapters for each stage on HF |
| Final GRPO adapter | ✅ [`shivam2k3/opensoc-defender-grpo`](https://huggingface.co/shivam2k3/opensoc-defender-grpo) |
| GRPO training notebook (`train_grpo.ipynb`) | ✅ shipped (ran on HF Jupyter with Unsloth + TRL) |
| Gradio "before vs after" UI | ✅ **live** at [`/demo`](https://shivam2k3-opensoc-env.hf.space/demo) |
| Eval harness + plotters (`eval/`) | ✅ shipped |
| Pytest suite | ✅ **93 tests**, all green |
| HF Space | ✅ **live** at [`shivam2k3/opensoc-env`](https://huggingface.co/spaces/shivam2k3/opensoc-env) |

## Architecture

```mermaid
flowchart LR
    Defender[Defender LLM trainee]
    Attacker[Attacker LLM trainee]
    Env[OpenSOC FastAPI Environment]
    Verifier[Deterministic verifier + plausibility check]
    Defender -->|submit_triage| Env
    Attacker -->|craft_incident| Env
    Env -->|observation reward| Defender
    Env -->|attacker reward| Attacker
    Env --> Verifier
    Verifier -->|ground truth label| Env
```

An episode has exactly two turns: attacker proposes incident params → env validates them and materializes a SIEM-style alert + log window → defender submits a triage action.
The verifier computes the ground-truth action from the *events alone* and scores both sides; the attacker's free-text narrative is never read by the labeler.

In `defender_only` mode (used for SFT, eval, smoke tests, and the `/demo` UI) the env auto-generates the incident from `tasks/registry.py` and skips straight to the defender turn.

## Why the reward cannot be hacked

1. The verifier is a transparent rule set in `verifier.compute_ground_truth(params)`; the *only* inputs are the structured events. The attacker's `narrative` and even its self-claimed `target_label` are ignored.
2. The plausibility checker (`verifier.check_plausibility(params)`) refuses incoherent stories, for example a "data exfiltration" claim with a purely-internal destination, or a `lolbin_use` event with no `process` field. The attacker's reward is gated on plausibility passing.
3. Schema-violation incidents floor attacker reward at -0.5, so trying to short-circuit pydantic's validators is strictly worse than playing along.

The anti-hack invariants are pinned in [`tests/test_verifier.py`](tests/test_verifier.py) and [`tests/test_rubric.py`](tests/test_rubric.py).

## Action space and reward

Tool names are deliberately **non-reserved**: there is no `reset`/`step`/`state`/`close` clash with the OpenEnv `MCPEnvironment` reserved-name list.

```yaml
action_space:
  craft_incident:
    target_label: dismiss | monitor | quarantine_host | block_ip | escalate
    category: malware_execution | c2_beacon | data_exfiltration | ...
    events: [ { event_type, fields, timestamp, log_id }, ... ]
    narrative: string   # ignored by the verifier
  submit_triage:
    action:
    cited_log_id:
    rationale: short string
```

- **Defender**: +1 correct, −1 missed-malicious, −0.3 over-react on benign, −0.05 unnecessary escalate, +0.1 bonus for citing the right triggering log id, −0.1 floor for format violation.
- **Attacker**: +1 iff defender wrong AND incident plausible, −0.5 if schema validation fails, +0.2 novelty bonus, 0 for gibberish.

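
As a rough illustration of how the two reward bullets above could compose, here is a minimal sketch. It is not the code in `rubric.py`: the function names `defender_reward` and `attacker_reward` are hypothetical, and several tie-breaking rules are assumptions made only for this example (a benign incident is one whose ground truth is `dismiss`; "missed-malicious" means dismissing a non-benign incident; any other wrong action scores 0; the novelty bonus is added whenever the incident is valid and plausible).

```python
from typing import Optional

# ASSUMPTION: an incident is "benign" when the verifier's ground truth is "dismiss".
BENIGN = "dismiss"


def defender_reward(action: Optional[str], truth: str,
                    cited_log_id: Optional[str] = None,
                    trigger_log_id: Optional[str] = None) -> float:
    """Hypothetical restatement of the defender bullet (not rubric.py)."""
    if action is None:            # malformed / unparseable action: format floor
        return -0.1
    if action == truth:
        base = 1.0                # correct triage
    elif truth == BENIGN:
        base = -0.3               # over-react on a benign incident
    elif action == BENIGN:
        base = -1.0               # missed-malicious: dismissed a real attack
    elif action == "escalate":
        base = -0.05              # unnecessary escalate
    else:
        base = 0.0                # ASSUMPTION: other wrong actions are neutral
    # Citation bonus only when a triggering log id exists and was cited.
    cited_right = trigger_log_id is not None and cited_log_id == trigger_log_id
    return base + (0.1 if cited_right else 0.0)


def attacker_reward(schema_ok: bool, plausible: bool,
                    defender_correct: bool, novel: bool) -> float:
    """Hypothetical restatement of the attacker bullet (not rubric.py)."""
    if not schema_ok:
        return -0.5               # schema validation failed
    if not plausible:
        return 0.0                # gibberish / incoherent incident earns nothing
    base = 0.0 if defender_correct else 1.0
    return base + (0.2 if novel else 0.0)  # ASSUMPTION: bonus needs plausibility


if __name__ == "__main__":
    print(defender_reward("dismiss", "block_ip"))        # dismissed a real attack
    print(attacker_reward(True, True, False, False))     # plausible and defender fooled
```

Note that the attacker's payoff never depends on its `narrative` or `target_label`, only on flags the environment derives itself, which is the property the previous section relies on.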

Full breakdown: [openenv.yaml](openenv.yaml) and [rubric.py](rubric.py).

## Run locally

```bash
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
python server.py   # serves on :7860
```

Smoke test from another shell:

```bash
curl -s http://localhost:7860/health | jq .
curl -s -X POST 'http://localhost:7860/reset?task=stage1_basic&mode=defender_only' | jq .
curl -s -X POST 'http://localhost:7860/step?task=stage1_basic&mode=defender_only' \
  -H 'content-type: application/json' \
  -d '{"submit_triage": {"action": "monitor", "cited_log_id": "L1-0", "rationale": "smoke"}}' | jq .
open http://localhost:7860/demo   # Gradio before-vs-after UI
```

Run the test suite (CPU only, no GPU deps):

```bash
pytest -q   # 93 passed
```

Or via the bundled Python client:

```python
from client import OpenSOCClient

c = OpenSOCClient()
obs = c.reset(task="stage1_basic", mode="defender_only", seed=1)
result = c.step(
    {"submit_triage": {"action": "monitor", "cited_log_id": "L1-0", "rationale": "ok"}},
    task="stage1_basic", mode="defender_only", seed=1,
)
print(result)
```

## Run the training pipeline

Full end-to-end procedure: **[TRAIN.md](TRAIN.md)**. TL;DR: on an HF Jupyter L4 (~$3 of credits, ~3.5h wall time):

```bash
bash scripts/run_full_pipeline.sh
```

Or step-by-step inside [`train_grpo.ipynb`](train_grpo.ipynb):

1. SFT warm-start (~12 min): pushes P(format-OK) from ~0% to ~95%.
2. GRPO curriculum across 4 stages (~3h): verifier-grounded reward, group size 8.
3. Eval on the frozen 200-incident hold-out (~5 min).
4. `eval.plot_results` + `eval.plot_training` render four PNGs.
5. `eval.bake_demo` writes 50 before-vs-after pairs to `data/demo_examples.json` for the Gradio UI.

## Headline results

The defender model was trained using GRPO with a 4-stage curriculum on Qwen2.5-3B-Instruct with LoRA.
All trained adapters are published on Hugging Face:

| Stage | Adapter | Difficulty |
| --- | --- | --- |
| SFT warm-start | [`opensoc-defender-grpo-sft`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-sft) | Format learning |
| Stage 1 | [`opensoc-defender-grpo-stage1_basic`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage1_basic) | Easy: single-event templates |
| Stage 2 | [`opensoc-defender-grpo-stage2_multi`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage2_multi) | Medium: multi-event windows |
| Stage 3 | [`opensoc-defender-grpo-stage3_mixed`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage3_mixed) | Hard: benign decoys interleaved |
| Stage 4 | [`opensoc-defender-grpo-stage4_adversarial`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage4_adversarial) | Adversarial: attacker-controlled |
| Final | [`opensoc-defender-grpo`](https://huggingface.co/shivam2k3/opensoc-defender-grpo) | Combined final adapter |

### Dismiss-on-malicious (the cardinal failure mode)

![dismiss-on-malicious by model](eval/results/bar_dismiss_on_malicious.png)

### Macro F1 across 200-incident hold-out

![macro F1 by model](eval/results/bar_macro_f1.png)

### Confusion matrices

| Baseline (always-dismiss) | Trained (verifier-oracle ceiling) |
| --- | --- |
| ![baseline confusion](eval/results/confusion_always_dismiss.png) | ![trained confusion](eval/results/confusion_verifier_oracle.png) |

### Reward across the curriculum

![training reward curves](eval/results/training_curves.png)

| Model | Accuracy | Macro F1 | Dismiss-on-malicious | Over-react |
| --- | ---: | ---: | ---: | ---: |
| `always_dismiss` (floor) | 0.13 | 0.05 | **1.00** | 0.00 |
| `verifier_oracle` (ceiling) | 1.00 | 1.00 | 0.00 | 0.00 |

## Deploy to Hugging Face Spaces

Full recipe: [DEPLOY.md](DEPLOY.md).
The fast version, after `huggingface-cli login`:

```bash
export HF_USER=
bash scripts/deploy_to_hf.sh
# Build takes ~5 minutes; then:
open https://${HF_USER}-opensoc-env.hf.space/demo
```

The Space runs FastAPI + Gradio in a single container. `/reset`, `/step`, `/state`, `/grade`, `/tasks`, `/health` continue to work for the OpenEnv judge bot; `/demo` is the human-readable UI.

## Repo map

| File / dir | Purpose |
| --- | --- |
| `openenv.yaml` | OpenEnv manifest (tasks, action space, reward range, endpoints) |
| `schema.py` | Incident / event / action schema with strict validators |
| `generator.py` | Materializes incidents for `defender_only` mode (eval, SFT) |
| `verifier.py` | Deterministic ground-truth labeler + plausibility checker |
| `rubric.py` | Layered defender + attacker reward functions |
| `env.py` | Two-role `OpenSOCEnv` (`reset` / `step` / `state` / `grade`) |
| `app_runtime.py` | FastAPI app exposing the OpenEnv API |
| `demo_app.py` | Gradio Blocks app mounted at `/demo` |
| `demo_data.py` | Pure-python helpers for the demo UI |
| `server.py` | Container entry point: imports `demo_app` then starts uvicorn |
| `tasks/registry.py` | Curriculum stages: `stage1_basic` → `stage4_adversarial` |
| `client/` | Thin HTTP client (server-internals-free) |
| `train/` | SFT warm-start + GRPO loop + reusable prompt format |
| `eval/` | Hold-out generator, metrics, eval driver, plot renderers, `bake_demo` |
| `scripts/run_full_pipeline.sh` | One-shot training + eval + bake-demo |
| `scripts/deploy_to_hf.sh` | One-shot HF Space push |
| `docs/` | Blog post, video script, slide deck builder |
| `tests/` | Pytest suite (93 tests, anti-hack regressions included) |

## Submission deliverables

Mapped to the four judging criteria:

| Criterion | Weight | Where it lives |
| --- | ---: | --- |
| Environment Innovation | 40% | `openenv.yaml`, `schema.py`, `verifier.py`, `env.py`, this README's *Architecture* and *Why the reward cannot be hacked* sections |
| Storytelling & Presentation | 30% | `/demo` Gradio UI + 90s video + HF blog |
| Showing Improvement in Rewards | 20% | `eval/results/*.png` (training curves + confusion + headline bar) embedded above |
| Reward & Training Pipeline | 10% | `rubric.py` + 93-test anti-hack suite + `train_grpo.ipynb` + `scripts/run_full_pipeline.sh` |

Submission checklist:

- [x] OpenEnv-compatible env (gym-style API, manifest, non-reserved tool names)
- [x] Deterministic RLVR verifier + plausibility checker
- [x] Layered defender + attacker reward
- [x] SFT warm-start dataset (committed)
- [x] Frozen 200-incident hold-out (committed)
- [x] GRPO curriculum notebook + one-shot training script
- [x] Eval harness + plotters
- [x] Pytest suite (93 tests, anti-hack regressions included)
- [x] Gradio `/demo` UI mounted on the same Space (free-CPU-tier compatible)
- [x] Blog post (`docs/blog.md`)
- [x] HF Space pushed and **running**: [`shivam2k3/opensoc-env`](https://huggingface.co/spaces/shivam2k3/opensoc-env)
- [x] SFT adapter trained and pushed: [`opensoc-defender-grpo-sft`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-sft)
- [x] GRPO adapters trained and pushed (4 stages): [`stage1`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage1_basic) [`stage2`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage2_multi) [`stage3`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage3_mixed) [`stage4`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage4_adversarial)
- [x] Final adapter pushed: [`opensoc-defender-grpo`](https://huggingface.co/shivam2k3/opensoc-defender-grpo)

## License

BSD-3-Clause.