| --- |
| title: OpenSOC SOC Triage Env |
| emoji: 🛡️ |
| colorFrom: indigo |
| colorTo: red |
| sdk: docker |
| app_port: 7860 |
| pinned: false |
| license: bsd-3-clause |
| tags: |
| - openenv |
| - cybersecurity |
| - rlvr |
| - self-play |
| --- |
| |
| # OpenSOC: Self-Play SOC Triage Environment |
|
|
| > An **OpenEnv** environment for training cybersecurity defender LLMs against an attacker LLM that auto-generates novel incidents. Built for the OpenEnv Hackathon, April 2026. |
|
|
| Humans cannot watch every alert in a Security Operations Center 24/7, and as stronger generative models start writing exploits and phishing at industrial scale that gap only widens. **OpenSOC** is an environment where a defender LLM learns to triage attacks generated by another LLM in a self-play loop. The trick is **RLVR**: triage ground truth is computed by a deterministic schema-side verifier from the *structured* incident parameters, never from any text the attacker writes, so neither side can hack the reward. |
|
|
| ## Try it |
|
|
| | Link | What it is | |
| | --- | --- | |
| **HF Space** → [`shivam2k3-opensoc-env.hf.space`](https://huggingface.co/spaces/shivam2k3/opensoc-env) | Deployed env (Running). OpenEnv judge can hit `/reset` `/step` `/state` `/grade`. | |
| **Live `/demo`** → [`shivam2k3-opensoc-env.hf.space/demo`](https://shivam2k3-opensoc-env.hf.space/demo) | Gradio "before vs after" UI. Click **Next incident** to compare baseline vs trained. | |
| **Trained model** → [`shivam2k3/opensoc-defender-grpo`](https://huggingface.co/shivam2k3/opensoc-defender-grpo) | GRPO-trained Qwen2.5-3B-Instruct LoRA defender adapter. | |
| **Training notebook** → [`train_grpo.ipynb`](train_grpo.ipynb) | End-to-end SFT warm-start + GRPO curriculum using Unsloth + TRL. | |
| **Mini-blog** → [`docs/blog.md`](docs/blog.md) | ~600-word write-up of the project. | |
|
|
| ## Table of contents |
|
|
| 1. [Architecture](#architecture) |
| 2. [Why the reward cannot be hacked](#why-the-reward-cannot-be-hacked) |
| 3. [Action space and reward](#action-space-and-reward) |
| 4. [Run locally](#run-locally) |
| 5. [Run the training pipeline](#run-the-training-pipeline) |
| 6. [Headline results](#headline-results) |
| 7. [Deploy to Hugging Face Spaces](#deploy-to-hugging-face-spaces) |
| 8. [Repo map](#repo-map) |
| 9. [Submission deliverables](#submission-deliverables) |
|
|
| ## Build status |
|
|
| | Build artifact | Status | |
| | --- | --- | |
| Pure-python env (`OpenSOCEnv`, FastAPI) | ✅ shipped | |
| Verifier + plausibility checker | ✅ shipped, 17-test adversarial suite | |
| Rubric (defender + attacker rewards) | ✅ shipped, anti-hack regression tests | |
| 600-example SFT dataset (`data/sft_train.jsonl`) | ✅ shipped | |
| 200-incident frozen hold-out (`data/holdout.jsonl`) | ✅ shipped | |
| SFT warm-start adapter | ✅ trained → [`opensoc-defender-grpo-sft`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-sft) | |
| GRPO curriculum (4 stages) | ✅ trained, adapters for each stage on HF | |
| Final GRPO adapter | ✅ [`shivam2k3/opensoc-defender-grpo`](https://huggingface.co/shivam2k3/opensoc-defender-grpo) | |
| GRPO training notebook (`train_grpo.ipynb`) | ✅ shipped (ran on HF Jupyter with Unsloth + TRL) | |
| Gradio "before vs after" UI | ✅ **live** at [`/demo`](https://shivam2k3-opensoc-env.hf.space/demo) | |
| Eval harness + plotters (`eval/`) | ✅ shipped | |
| Pytest suite | ✅ **93 tests**, all green | |
| HF Space | ✅ **live** at [`shivam2k3/opensoc-env`](https://huggingface.co/spaces/shivam2k3/opensoc-env) | |
|
|
| ## Architecture |
|
|
| ```mermaid |
| flowchart LR |
| Defender[Defender LLM trainee] |
| Attacker[Attacker LLM trainee] |
| Env[OpenSOC FastAPI Environment] |
| Verifier[Deterministic verifier + plausibility check] |
| Defender -->|submit_triage| Env |
| Attacker -->|craft_incident| Env |
| Env -->|observation reward| Defender |
| Env -->|attacker reward| Attacker |
| Env --> Verifier |
| Verifier -->|ground truth label| Env |
| ``` |
|
|
| An episode has exactly two turns: attacker proposes incident params → env validates them and materializes a SIEM-style alert + log window → defender submits a triage action. The verifier computes the ground-truth action from the *events alone* and scores both sides; the attacker's free-text narrative is never read by the labeler. |
|
|
| In `defender_only` mode (used for SFT, eval, smoke tests, and the `/demo` UI) the env auto-generates the incident from `tasks/registry.py` and skips straight to the defender turn. |
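The turn order described above can be sketched as a tiny state machine. This is an illustration only, not the code in `env.py`: the class `ToyEpisode`, the mode name `"self_play"`, the return values, and the stand-in incident are all assumptions; only `defender_only`, `craft_incident`, and `submit_triage` come from this README.

```python
class ToyEpisode:
    """Hypothetical two-turn episode: attacker crafts, then defender triages."""

    def __init__(self, mode: str = "self_play"):
        self.mode = mode          # "self_play" is an assumed name for the two-role mode
        self.incident = None
        self.done = False

    def reset(self) -> str:
        """Start an episode and return whose turn it is."""
        self.done = False
        if self.mode == "defender_only":
            # Stand-in for the incident the env auto-generates in this mode.
            self.incident = {"events": [{"event_type": "failed_login", "log_id": "L1-0"}]}
            return "defender"
        self.incident = None
        return "attacker"

    def step(self, action: dict) -> str:
        """Advance one turn; reject actions that arrive out of order."""
        if self.done:
            raise RuntimeError("episode already finished")
        if self.incident is None:
            if "craft_incident" not in action:
                raise ValueError("attacker must move first")
            self.incident = action["craft_incident"]
            return "defender"
        if "submit_triage" not in action:
            raise ValueError("defender must submit a triage action")
        self.done = True
        return "done"


# Self-play: two turns. Defender-only: the attacker turn is skipped.
episode = ToyEpisode()
turns = [episode.reset(),
         episode.step({"craft_incident": {"events": []}}),
         episode.step({"submit_triage": {"action": "dismiss"}})]
print(turns)
```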
|
|
| ## Why the reward cannot be hacked |
|
|
| 1. The verifier is a transparent rule set in `verifier.compute_ground_truth(params)`; the *only* inputs are the structured events. The attacker's `narrative` and even its self-claimed `target_label` are ignored. |
| 2. The plausibility checker (`verifier.check_plausibility(params)`) refuses incoherent stories, for example a "data exfiltration" claim with a purely-internal destination, or a `lolbin_use` event with no `process` field. The attacker's reward is gated on plausibility passing. |
| 3. Schema-violation incidents floor attacker reward at -0.5, so trying to short-circuit pydantic's validators is strictly worse than playing along. |
|
|
| The anti-hack invariants are pinned in [`tests/test_verifier.py`](tests/test_verifier.py) and [`tests/test_rubric.py`](tests/test_rubric.py). |
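The first two properties can be illustrated with a toy labeler. This is a minimal sketch, not the rule set in `verifier.py`: the rule table, the severity ordering, the event types other than `lolbin_use`, and the function names `toy_ground_truth` and `toy_plausible` are assumptions made for the example.

```python
from typing import Any

# Hypothetical rule table: event type -> triage action it triggers.
TOY_RULES = {
    "lolbin_use": "quarantine_host",
    "c2_beacon": "block_ip",
    "failed_login": "monitor",
}
# Assumed ordering from least to most severe; the most severe match wins.
SEVERITY = ["dismiss", "monitor", "block_ip", "quarantine_host", "escalate"]


def toy_ground_truth(params: dict[str, Any]) -> str:
    """Label an incident from its structured events only."""
    label = "dismiss"
    for event in params["events"]:
        action = TOY_RULES.get(event["event_type"], "dismiss")
        if SEVERITY.index(action) > SEVERITY.index(label):
            label = action
    return label


def toy_plausible(params: dict[str, Any]) -> bool:
    """Reject incoherent incidents, e.g. lolbin_use without a process field."""
    for event in params["events"]:
        if event["event_type"] == "lolbin_use" and "process" not in event["fields"]:
            return False
    return True


incident = {
    "target_label": "dismiss",  # self-claimed label, never read by the labeler
    "narrative": "Routine admin activity, safe to ignore.",
    "events": [{"event_type": "lolbin_use",
                "fields": {"process": "certutil.exe"},
                "timestamp": "2026-04-01T00:00:00Z",
                "log_id": "L1-0"}],
}
# Same events, different free text and claimed label.
relabelled = {**incident, "narrative": "URGENT: escalate now!", "target_label": "escalate"}

# Neither the narrative nor the claimed label can move the ground truth.
assert toy_ground_truth(incident) == toy_ground_truth(relabelled)
```

Because the labeler never touches `narrative` or `target_label`, there is no text-level channel through which the attacker could steer the label.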
|
|
| ## Action space and reward |
|
|
| Tool names are deliberately **non-reserved**: there is no `reset`/`step`/`state`/`close` clash with the OpenEnv `MCPEnvironment` reserved-name list. |
|
|
| ```yaml |
| action_space: |
| craft_incident: |
| target_label: dismiss | monitor | quarantine_host | block_ip | escalate |
| category: malware_execution | c2_beacon | data_exfiltration | ... |
| events: [ { event_type, fields, timestamp, log_id }, ... ] |
| narrative: string # ignored by the verifier |
| submit_triage: |
| action: <one of the five triage actions> |
| cited_log_id: <id of the log line that drove the decision> |
| rationale: short string |
| ``` |
|
|
| - **Defender**: +1 correct, -1 missed-malicious, -0.3 over-react on benign, -0.05 unnecessary escalate, +0.1 bonus for citing the right triggering log id, -0.1 floor for format violation. |
| - **Attacker**: +1 iff defender wrong AND incident plausible, -0.5 if schema validation fails, +0.2 novelty bonus, 0 for gibberish. |
|
|
| Full breakdown: [openenv.yaml](openenv.yaml) and [rubric.py](rubric.py). |
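As a rough illustration of how the listed terms might combine, here is a hedged sketch. It is not the code in `rubric.py`: the function names, the argument names, the way penalties and bonuses stack, and the choice of 0 for a wrong non-dismiss action on a malicious incident are all assumptions.

```python
from typing import Optional


def defender_reward(predicted: str, truth: str,
                    cited_log_id: Optional[str],
                    trigger_log_id: Optional[str],
                    format_ok: bool = True) -> float:
    """Toy defender reward built from the terms listed above."""
    if not format_ok:
        return -0.1                      # format-violation floor
    if predicted == truth:
        reward = 1.0                     # correct triage
    elif truth != "dismiss" and predicted == "dismiss":
        reward = -1.0                    # missed a malicious incident
    elif truth == "dismiss":
        reward = -0.3                    # over-reacted on a benign incident
    else:
        reward = 0.0                     # assumption: wrong action, but not a miss
    if predicted == "escalate" and truth != "escalate":
        reward -= 0.05                   # unnecessary escalation
    if trigger_log_id is not None and cited_log_id == trigger_log_id:
        reward += 0.1                    # cited the triggering log line
    return round(reward, 2)


def attacker_reward(schema_ok: bool, plausible: bool,
                    defender_correct: bool, novel: bool) -> float:
    """Toy attacker reward; everything is gated on schema and plausibility."""
    if not schema_ok:
        return -0.5                      # schema-violation floor
    if not plausible:
        return 0.0                       # gibberish earns nothing
    reward = 0.0 if defender_correct else 1.0
    if novel:
        reward += 0.2                    # novelty bonus
    return round(reward, 2)


print(defender_reward("quarantine_host", "quarantine_host", "L1-0", "L1-0"))
print(attacker_reward(schema_ok=True, plausible=True, defender_correct=False, novel=True))
```

Gating the attacker's bonus terms behind the schema and plausibility checks is what keeps an invalid or incoherent incident from ever being the best move.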
|
|
| ## Run locally |
|
|
| ```bash |
| python -m venv .venv && source .venv/bin/activate |
| pip install -r requirements.txt |
| python server.py # serves on :7860 |
| ``` |
|
|
| Smoke test from another shell: |
|
|
| ```bash |
| curl -s http://localhost:7860/health | jq . |
| curl -s -X POST 'http://localhost:7860/reset?task=stage1_basic&mode=defender_only' | jq . |
| curl -s -X POST 'http://localhost:7860/step?task=stage1_basic&mode=defender_only' \ |
| -H 'content-type: application/json' \ |
| -d '{"submit_triage": {"action": "monitor", "cited_log_id": "L1-0", "rationale": "smoke"}}' | jq . |
| open http://localhost:7860/demo # Gradio before-vs-after UI (macOS; use xdg-open on Linux) |
| ``` |
|
|
| Run the test suite (CPU only, no GPU deps): |
|
|
| ```bash |
| pytest -q # 93 passed |
| ``` |
|
|
| Or via the bundled Python client: |
|
|
| ```python |
| from client import OpenSOCClient |
| c = OpenSOCClient() |
| obs = c.reset(task="stage1_basic", mode="defender_only", seed=1) |
| result = c.step({"submit_triage": {"action": "monitor", "cited_log_id": "L1-0", "rationale": "ok"}}, |
| task="stage1_basic", mode="defender_only", seed=1) |
| print(result) |
| ``` |
|
|
| ## Run the training pipeline |
|
|
| Full end-to-end procedure: **[TRAIN.md](TRAIN.md)**. TL;DR: on an HF Jupyter L4 (~$3 of credits, ~3.5h wall time): |
|
|
| ```bash |
| bash scripts/run_full_pipeline.sh |
| ``` |
|
|
| Or step-by-step inside [`train_grpo.ipynb`](train_grpo.ipynb): |
|
|
| 1. SFT warm-start (~12 min): pushes P(format-OK) from ~0% to ~95%. |
| 2. GRPO curriculum across 4 stages (~3h): verifier-grounded reward, group size 8. |
| 3. Eval on the frozen 200-incident hold-out (~5 min). |
| 4. `eval.plot_results` + `eval.plot_training` render four PNGs. |
| 5. `eval.bake_demo` writes 50 before-vs-after pairs to `data/demo_examples.json` for the Gradio UI. |
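Step 2 relies on GRPO's group-relative advantage: the completions sampled for one prompt are each scored by the reward function, and every reward is standardized against its own group. A minimal stdlib sketch for a group of 8 follows; the function name, the sample rewards, the use of the population standard deviation, and the epsilon are assumptions, and the actual training loop uses TRL rather than this code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standardize each reward against the other completions in its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for 8 completions sampled from one incident prompt.
group = [1.1, 1.0, -1.0, -0.3, 1.0, -0.1, 1.1, -1.0]
advantages = group_relative_advantages(group)

for reward, advantage in zip(group, advantages):
    print(f"reward={reward:+.2f}  advantage={advantage:+.3f}")
```

When every completion in a group earns the same reward, all advantages are zero and that prompt contributes no gradient signal, which is one reason a format-learning warm-start before GRPO is useful.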
|
|
| ## Headline results |
|
|
| The defender was trained with GRPO over a 4-stage curriculum on Qwen2.5-3B-Instruct with LoRA. All trained adapters are published on Hugging Face: |
|
|
| | Stage | Adapter | Difficulty | |
| | --- | --- | --- | |
| | SFT warm-start | [`opensoc-defender-grpo-sft`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-sft) | Format learning | |
| Stage 1 | [`opensoc-defender-grpo-stage1_basic`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage1_basic) | Easy: single-event templates | |
| Stage 2 | [`opensoc-defender-grpo-stage2_multi`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage2_multi) | Medium: multi-event windows | |
| Stage 3 | [`opensoc-defender-grpo-stage3_mixed`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage3_mixed) | Hard: benign decoys interleaved | |
| Stage 4 | [`opensoc-defender-grpo-stage4_adversarial`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage4_adversarial) | Adversarial: attacker-controlled | |
| | Final | [`opensoc-defender-grpo`](https://huggingface.co/shivam2k3/opensoc-defender-grpo) | Combined final adapter | |
|
|
| ### Dismiss-on-malicious (the cardinal failure mode) |
|
|
|  |
|
|
| ### Macro F1 across 200-incident hold-out |
|
|
|  |
|
|
| ### Confusion matrices |
|
|
| | Baseline (always-dismiss) | Trained (verifier-oracle ceiling) | |
| | --- | --- | |
| |  |  | |
|
|
| ### Reward across the curriculum |
|
|
|  |
|
|
| | Model | Accuracy | Macro F1 | Dismiss-on-malicious | Over-react | |
| | --- | ---: | ---: | ---: | ---: | |
| | `always_dismiss` (floor) | 0.13 | 0.05 | **1.00** | 0.00 | |
| | `verifier_oracle` (ceiling) | 1.00 | 1.00 | 0.00 | 0.00 | |
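The floor and ceiling rows can be reproduced with a small metrics sketch. The definitions are assumptions rather than the ones in `eval/`: an incident counts as benign when its ground truth is `dismiss`, macro F1 averages over all five actions with F1 = 0 for a class that is never predicted, and the label distribution below is hypothetical apart from the 13% `dismiss` share implied by the floor accuracy.

```python
ACTIONS = ["dismiss", "monitor", "quarantine_host", "block_ip", "escalate"]


def triage_metrics(truth: list[str], pred: list[str]) -> dict[str, float]:
    """Accuracy, macro F1, dismiss-on-malicious rate, and over-react rate."""
    pairs = list(zip(truth, pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    f1_scores = []
    for action in ACTIONS:
        tp = sum(t == action and p == action for t, p in pairs)
        fp = sum(t != action and p == action for t, p in pairs)
        fn = sum(t == action and p != action for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)

    malicious = [p for t, p in pairs if t != "dismiss"]
    benign = [p for t, p in pairs if t == "dismiss"]
    return {
        "accuracy": accuracy,
        "macro_f1": sum(f1_scores) / len(ACTIONS),
        # Share of malicious incidents the defender waved through.
        "dismiss_on_malicious": (sum(p == "dismiss" for p in malicious) / len(malicious)
                                 if malicious else 0.0),
        # Share of benign incidents the defender acted on.
        "over_react": (sum(p != "dismiss" for p in benign) / len(benign)
                       if benign else 0.0),
    }


# Hypothetical 200-incident label mix with 26 (13%) benign incidents.
truth = (["dismiss"] * 26 + ["monitor"] * 44 + ["quarantine_host"] * 44
         + ["block_ip"] * 43 + ["escalate"] * 43)

floor = triage_metrics(truth, ["dismiss"] * len(truth))   # always_dismiss
ceiling = triage_metrics(truth, list(truth))              # verifier_oracle
print({k: round(v, 2) for k, v in floor.items()})
print({k: round(v, 2) for k, v in ceiling.items()})
```

Under these assumptions the always-dismiss policy scores well on only one of five classes, which is why its macro F1 sits so far below its accuracy.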
|
|
| ## Deploy to Hugging Face Spaces |
|
|
| Full recipe: [DEPLOY.md](DEPLOY.md). The fast version, after `huggingface-cli login`: |
|
|
| ```bash |
| export HF_USER=<your-username> |
| bash scripts/deploy_to_hf.sh |
| # Build takes ~5 minutes; then: |
| open https://${HF_USER}-opensoc-env.hf.space/demo |
| ``` |
|
|
| The Space runs FastAPI + Gradio in a single container. `/reset`, `/step`, `/state`, `/grade`, `/tasks`, `/health` continue to work for the OpenEnv judge bot; `/demo` is the human-readable UI. |
|
|
| ## Repo map |
|
|
| | File / dir | Purpose | |
| | --- | --- | |
| | `openenv.yaml` | OpenEnv manifest (tasks, action space, reward range, endpoints) | |
| | `schema.py` | Incident / event / action schema with strict validators | |
| | `generator.py` | Materializes incidents for `defender_only` mode (eval, SFT) | |
| | `verifier.py` | Deterministic ground-truth labeler + plausibility checker | |
| | `rubric.py` | Layered defender + attacker reward functions | |
| | `env.py` | Two-role `OpenSOCEnv` (`reset` / `step` / `state` / `grade`) | |
| | `app_runtime.py` | FastAPI app exposing the OpenEnv API | |
| | `demo_app.py` | Gradio Blocks app mounted at `/demo` | |
| | `demo_data.py` | Pure-python helpers for the demo UI | |
| `server.py` | Container entry point; imports `demo_app` then starts uvicorn | |
| `tasks/registry.py` | Curriculum stages: `stage1_basic` → `stage4_adversarial` | |
| | `client/` | Thin HTTP client (server-internals-free) | |
| | `train/` | SFT warm-start + GRPO loop + reusable prompt format | |
| | `eval/` | Hold-out generator, metrics, eval driver, plot renderers, `bake_demo` | |
| | `scripts/run_full_pipeline.sh` | One-shot training + eval + bake-demo | |
| | `scripts/deploy_to_hf.sh` | One-shot HF Space push | |
| | `docs/` | Blog post, video script, slide deck builder | |
| | `tests/` | Pytest suite (93 tests, anti-hack regressions included) | |
|
|
| ## Submission deliverables |
|
|
| Mapped to the four judging criteria: |
|
|
| | Criterion | Weight | Where it lives | |
| | --- | ---: | --- | |
| | Environment Innovation | 40% | `openenv.yaml`, `schema.py`, `verifier.py`, `env.py`, this README's *Architecture* and *Why the reward cannot be hacked* sections | |
| | Storytelling & Presentation | 30% | `/demo` Gradio UI + 90s video + HF blog | |
| | Showing Improvement in Rewards | 20% | `eval/results/*.png` (training curves + confusion + headline bar) embedded above | |
| | Reward & Training Pipeline | 10% | `rubric.py` + 93-test anti-hack suite + `train_grpo.ipynb` + `scripts/run_full_pipeline.sh` | |
|
|
| Submission checklist: |
|
|
| - [x] OpenEnv-compatible env (gym-style API, manifest, non-reserved tool names) |
| - [x] Deterministic RLVR verifier + plausibility checker |
| - [x] Layered defender + attacker reward |
| - [x] SFT warm-start dataset (committed) |
| - [x] Frozen 200-incident hold-out (committed) |
| - [x] GRPO curriculum notebook + one-shot training script |
| - [x] Eval harness + plotters |
| - [x] Pytest suite (93 tests, anti-hack regressions included) |
| - [x] Gradio `/demo` UI mounted on the same Space (free-CPU-tier compatible) |
| - [x] Blog post (`docs/blog.md`) |
| - [x] HF Space pushed and **running**: [`shivam2k3/opensoc-env`](https://huggingface.co/spaces/shivam2k3/opensoc-env) |
| - [x] SFT adapter trained and pushed: [`opensoc-defender-grpo-sft`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-sft) |
| - [x] GRPO adapters trained and pushed (4 stages): [`stage1`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage1_basic) [`stage2`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage2_multi) [`stage3`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage3_mixed) [`stage4`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage4_adversarial) |
| - [x] Final adapter pushed: [`opensoc-defender-grpo`](https://huggingface.co/shivam2k3/opensoc-defender-grpo) |
|
|
| ## License |
|
|
| BSD-3-Clause. |
|
|