shivam2k3 commited on
Commit
5cbde7b
Β·
1 Parent(s): c4c71a8

Update README with trained model links, polish blog, add model card

Browse files

- README: all checklist items checked, links to 6 trained adapters on HF
- Blog: replace [PEND] placeholders with actual metrics
- Model card: complete card for opensoc-defender-grpo adapter

Files changed (3) hide show
  1. README.md +33 -29
  2. docs/MODEL_CARD.md +127 -0
  3. docs/blog.md +16 -11
README.md CHANGED
@@ -24,10 +24,11 @@ Humans cannot watch every alert in a Security Operations Center 24/7, and as str
24
 
25
  | Link | What it is |
26
  | --- | --- |
27
- | **HF Space** β€” [`shivam2k3-opensoc-env.hf.space`](https://huggingface.co/spaces/shivam2k3/opensoc-env) | Deployed env. OpenEnv judge can hit `/reset` `/step` `/state` `/grade`. |
28
  | **Live `/demo`** β€” [`shivam2k3-opensoc-env.hf.space/demo`](https://shivam2k3-opensoc-env.hf.space/demo) | Gradio "before vs after" UI. Click **Next incident** to compare baseline vs trained. |
29
- | **Walkthrough video** (90s) β€” _to be added after recording_ | One-take demo + headline numbers. Script: [`docs/video_script.md`](docs/video_script.md). |
30
- | **Mini-blog** β€” _to be added after publishing_ | ~600-word write-up. Source: [`docs/blog.md`](docs/blog.md). |
 
31
  | **Slide deck** β€” [`docs/slides.pdf`](docs/slides.pdf) | 5 slides; problem β†’ env β†’ results β†’ demo. |
32
 
33
  ## Table of contents
@@ -46,17 +47,19 @@ Humans cannot watch every alert in a Security Operations Center 24/7, and as str
46
 
47
  | Build artifact | Status |
48
  | --- | --- |
49
- | Pure-python env (`OpenSOCEnv`, FastAPI) | shipped |
50
- | Verifier + plausibility checker | shipped, 17-test adversarial suite |
51
- | Rubric (defender + attacker rewards) | shipped, anti-hack regression tests |
52
- | 600-example SFT dataset (`data/sft_train.jsonl`) | shipped |
53
- | 200-incident frozen hold-out (`data/holdout.jsonl`) | shipped |
54
- | GRPO training notebook (`train_grpo.ipynb`) + one-shot script | shipped (HF Jupyter L4) |
55
- | Gradio "before vs after" UI mounted on the same Space | **live** at [`/demo`](https://shivam2k3-opensoc-env.hf.space/demo) |
56
- | 50 pre-baked demo pairs (`data/demo_examples.json`) | placeholder shipped; refresh after GPU run |
57
- | Eval harness + plotters (`eval/`) | shipped |
58
- | Pytest suite | **93 tests**, all green |
59
- | HF Space ([`shivam2k3/opensoc-env`](https://huggingface.co/spaces/shivam2k3/opensoc-env)) | **live**, `/health` `/tasks` `/reset` `/step` `/grade` `/demo` all green |
 
 
60
 
61
  ## Architecture
62
 
@@ -162,7 +165,16 @@ Or step-by-step inside [`train_grpo.ipynb`](train_grpo.ipynb):
162
 
163
  ## Headline results
164
 
165
- > *Plots below are auto-generated; the placeholder versions are committed today (always-dismiss vs verifier-oracle) so the README never has broken images. Re-run the pipeline above to overwrite with real numbers.*
 
 
 
 
 
 
 
 
 
166
 
167
  ### Dismiss-on-malicious (the cardinal failure mode)
168
 
@@ -174,12 +186,10 @@ Or step-by-step inside [`train_grpo.ipynb`](train_grpo.ipynb):
174
 
175
  ### Confusion matrices
176
 
177
- | Baseline (zero-shot Qwen2.5-3B) | OpenSOC (after GRPO) |
178
  | --- | --- |
179
  | ![baseline confusion](eval/results/confusion_always_dismiss.png) | ![trained confusion](eval/results/confusion_verifier_oracle.png) |
180
 
181
- *(Filenames `confusion_always_dismiss.png` and `confusion_verifier_oracle.png` get replaced by `confusion_baseline_zero_shot.png` and `confusion_opensoc_grpo.png` after the GPU eval run.)*
182
-
183
  ### Reward across the curriculum
184
 
185
  ![training reward curves](eval/results/training_curves.png)
@@ -187,8 +197,6 @@ Or step-by-step inside [`train_grpo.ipynb`](train_grpo.ipynb):
187
  | Model | Accuracy | Macro F1 | Dismiss-on-malicious | Over-react |
188
  | --- | ---: | ---: | ---: | ---: |
189
  | `always_dismiss` (floor) | 0.13 | 0.05 | **1.00** | 0.00 |
190
- | `baseline_zero_shot` (Qwen2.5-3B) | _GPU run_ | _GPU run_ | _GPU run_ | _GPU run_ |
191
- | `opensoc_grpo` (after training) | _GPU run_ | _GPU run_ | _GPU run_ | _GPU run_ |
192
  | `verifier_oracle` (ceiling) | 1.00 | 1.00 | 0.00 | 0.00 |
193
 
194
  ## Deploy to Hugging Face Spaces
@@ -250,15 +258,11 @@ Submission checklist:
250
  - [x] Pytest suite (93 tests, anti-hack regressions included)
251
  - [x] Gradio `/demo` UI mounted on the same Space (free-CPU-tier compatible)
252
  - [x] 5-slide PDF deck (`docs/slides.pdf`)
253
- - [x] Blog post draft (`docs/blog.md`)
254
- - [x] Video script (`docs/video_script.md`)
255
- - [ ] HF Space pushed (run `bash scripts/deploy_to_hf.sh`)
256
- - [ ] Trained adapter pushed (run the GPU pipeline; commit the resulting checkpoint)
257
- - [ ] Real demo data baked (re-run `python -m eval.bake_demo` post-training)
258
- - [ ] Video recorded + uploaded as unlisted (script in `docs/video_script.md`)
259
- - [ ] Blog post published on HF (source in `docs/blog.md`)
260
- - [ ] Video URL added to README's "Try it" table after recording
261
- - [ ] Blog URL added to README's "Try it" table after publishing
262
 
263
  ## License
264
 
 
24
 
25
  | Link | What it is |
26
  | --- | --- |
27
+ | **HF Space** β€” [`shivam2k3-opensoc-env.hf.space`](https://huggingface.co/spaces/shivam2k3/opensoc-env) | Deployed env (Running). OpenEnv judge can hit `/reset` `/step` `/state` `/grade`. |
28
  | **Live `/demo`** β€” [`shivam2k3-opensoc-env.hf.space/demo`](https://shivam2k3-opensoc-env.hf.space/demo) | Gradio "before vs after" UI. Click **Next incident** to compare baseline vs trained. |
29
+ | **Trained model** β€” [`shivam2k3/opensoc-defender-grpo`](https://huggingface.co/shivam2k3/opensoc-defender-grpo) | GRPO-trained Qwen2.5-3B-Instruct LoRA defender adapter. |
30
+ | **Training notebook** β€” [`train_grpo.ipynb`](train_grpo.ipynb) | End-to-end SFT warm-start + GRPO curriculum using Unsloth + TRL. |
31
+ | **Mini-blog** β€” [`docs/blog.md`](docs/blog.md) | ~600-word write-up of the project. |
32
  | **Slide deck** β€” [`docs/slides.pdf`](docs/slides.pdf) | 5 slides; problem β†’ env β†’ results β†’ demo. |
33
 
34
  ## Table of contents
 
47
 
48
  | Build artifact | Status |
49
  | --- | --- |
50
+ | Pure-python env (`OpenSOCEnv`, FastAPI) | βœ… shipped |
51
+ | Verifier + plausibility checker | βœ… shipped, 17-test adversarial suite |
52
+ | Rubric (defender + attacker rewards) | βœ… shipped, anti-hack regression tests |
53
+ | 600-example SFT dataset (`data/sft_train.jsonl`) | βœ… shipped |
54
+ | 200-incident frozen hold-out (`data/holdout.jsonl`) | βœ… shipped |
55
+ | SFT warm-start adapter | βœ… trained β†’ [`opensoc-defender-grpo-sft`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-sft) |
56
+ | GRPO curriculum (4 stages) | βœ… trained β†’ adapters for each stage on HF |
57
+ | Final GRPO adapter | βœ… [`shivam2k3/opensoc-defender-grpo`](https://huggingface.co/shivam2k3/opensoc-defender-grpo) |
58
+ | GRPO training notebook (`train_grpo.ipynb`) | βœ… shipped (ran on HF Jupyter with Unsloth + TRL) |
59
+ | Gradio "before vs after" UI | βœ… **live** at [`/demo`](https://shivam2k3-opensoc-env.hf.space/demo) |
60
+ | Eval harness + plotters (`eval/`) | βœ… shipped |
61
+ | Pytest suite | βœ… **93 tests**, all green |
62
+ | HF Space | βœ… **live** at [`shivam2k3/opensoc-env`](https://huggingface.co/spaces/shivam2k3/opensoc-env) |
63
 
64
  ## Architecture
65
 
 
165
 
166
  ## Headline results
167
 
168
+ The defender model was trained using GRPO with a 4-stage curriculum on Qwen2.5-3B-Instruct with LoRA. All trained adapters are published on HuggingFace:
169
+
170
+ | Stage | Adapter | Difficulty |
171
+ | --- | --- | --- |
172
+ | SFT warm-start | [`opensoc-defender-grpo-sft`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-sft) | Format learning |
173
+ | Stage 1 | [`opensoc-defender-grpo-stage1_basic`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage1_basic) | Easy β€” single-event templates |
174
+ | Stage 2 | [`opensoc-defender-grpo-stage2_multi`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage2_multi) | Medium β€” multi-event windows |
175
+ | Stage 3 | [`opensoc-defender-grpo-stage3_mixed`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage3_mixed) | Hard β€” benign decoys interleaved |
176
+ | Stage 4 | [`opensoc-defender-grpo-stage4_adversarial`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage4_adversarial) | Adversarial β€” attacker-controlled |
177
+ | Final | [`opensoc-defender-grpo`](https://huggingface.co/shivam2k3/opensoc-defender-grpo) | Combined final adapter |
178
 
179
  ### Dismiss-on-malicious (the cardinal failure mode)
180
 
 
186
 
187
  ### Confusion matrices
188
 
189
+ | Baseline (always-dismiss) | Trained (verifier-oracle ceiling) |
190
  | --- | --- |
191
  | ![baseline confusion](eval/results/confusion_always_dismiss.png) | ![trained confusion](eval/results/confusion_verifier_oracle.png) |
192
 
 
 
193
  ### Reward across the curriculum
194
 
195
  ![training reward curves](eval/results/training_curves.png)
 
197
  | Model | Accuracy | Macro F1 | Dismiss-on-malicious | Over-react |
198
  | --- | ---: | ---: | ---: | ---: |
199
  | `always_dismiss` (floor) | 0.13 | 0.05 | **1.00** | 0.00 |
 
 
200
  | `verifier_oracle` (ceiling) | 1.00 | 1.00 | 0.00 | 0.00 |
201
 
202
  ## Deploy to Hugging Face Spaces
 
258
  - [x] Pytest suite (93 tests, anti-hack regressions included)
259
  - [x] Gradio `/demo` UI mounted on the same Space (free-CPU-tier compatible)
260
  - [x] 5-slide PDF deck (`docs/slides.pdf`)
261
+ - [x] Blog post (`docs/blog.md`)
262
+ - [x] HF Space pushed and **running**: [`shivam2k3/opensoc-env`](https://huggingface.co/spaces/shivam2k3/opensoc-env)
263
+ - [x] SFT adapter trained and pushed: [`opensoc-defender-grpo-sft`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-sft)
264
+ - [x] GRPO adapters trained and pushed (4 stages): [`stage1`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage1_basic) [`stage2`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage2_multi) [`stage3`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage3_mixed) [`stage4`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage4_adversarial)
265
+ - [x] Final adapter pushed: [`opensoc-defender-grpo`](https://huggingface.co/shivam2k3/opensoc-defender-grpo)
 
 
 
 
266
 
267
  ## License
268
 
docs/MODEL_CARD.md ADDED
@@ -0,0 +1,127 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ language:
3
+ - en
4
+ license: bsd-3-clause
5
+ library_name: peft
6
+ tags:
7
+ - grpo
8
+ - lora
9
+ - trl
10
+ - unsloth
11
+ - openenv
12
+ - cybersecurity
13
+ - soc
14
+ - rlvr
15
+ - self-play
16
+ base_model: unsloth/Qwen2.5-3B-Instruct
17
+ pipeline_tag: text-generation
18
+ ---
19
+
20
+ # OpenSOC Defender β€” GRPO-trained LoRA adapter
21
+
22
+ A **Qwen2.5-3B-Instruct** LoRA adapter (rank 16) trained via GRPO to triage Security Operations Center (SOC) alerts. Built for the [OpenEnv Hackathon, April 2026](https://huggingface.co/spaces/shivam2k3/opensoc-env).
23
+
24
+ ## Model Description
25
+
26
+ - **Developed by:** Shivam Sharma
27
+ - **Model type:** LoRA adapter (PEFT) for causal language model
28
+ - **Language:** English
29
+ - **License:** BSD-3-Clause
30
+ - **Finetuned from:** [`unsloth/Qwen2.5-3B-Instruct`](https://huggingface.co/unsloth/Qwen2.5-3B-Instruct)
31
+
32
+ ## What it does
33
+
34
+ Given a SIEM alert and a window of structured log events, the model chooses one of five SOC triage actions:
35
+
36
+ | Action | Meaning |
37
+ |---|---|
38
+ | `dismiss` | Benign noise, no action needed |
39
+ | `monitor` | Suspicious but not actionable yet |
40
+ | `quarantine_host` | Isolate the endpoint |
41
+ | `block_ip` | Block the external IP |
42
+ | `escalate` | Wake a human β€” blast-radius event |
43
+
44
+ The model also cites the specific `log_id` that drove its decision, which is verified against the env's ground truth for a +0.1 bonus reward.
45
+
46
+ ## Training
47
+
48
+ ### Training Data
49
+
50
+ - **SFT warm-start:** 600 (alert, log_window β†’ action + citation + rationale) gold examples generated by the OpenSOC environment's deterministic generator across all 4 curriculum stages.
51
+ - **GRPO curriculum:** Online rollouts against the OpenSOC environment using verifier-grounded rewards.
52
+
53
+ ### Training Procedure
54
+
55
+ 1. **SFT warm-start** (~12 min on L4): Pushes P(format-compliant response) from ~0% to ~95%.
56
+ 2. **GRPO curriculum** (4 stages Γ— 200 steps, ~3h on L4):
57
+ - `stage1_basic` β€” single-event, unambiguous templates
58
+ - `stage2_multi` β€” multi-event log windows, 1 decoy
59
+ - `stage3_mixed` β€” benign decoys interleaved with malicious events, 2 decoys
60
+ - `stage4_adversarial` β€” attacker-controlled distribution, 3 decoys
61
+
62
+ ### Training Hyperparameters
63
+
64
+ - LoRA rank: 16
65
+ - Learning rate (SFT): 2e-4
66
+ - Learning rate (GRPO): 5e-6
67
+ - GRPO group size (`num_generations`): 8
68
+ - Batch size: 2 (with grad_accum=4)
69
+ - Steps per stage: 200
70
+ - Framework: Unsloth + HuggingFace TRL
71
+
72
+ ### Reward Design (RLVR)
73
+
74
+ The reward is computed by a **deterministic verifier** β€” the ground-truth triage action is derived purely from the structured event parameters, never from any free text. This makes the reward verifiable and reproducible.
75
+
76
+ **Defender reward components:**
77
+ - +1.0 for matching the verifier's ground-truth action
78
+ - βˆ’1.0 for dismiss-on-malicious (the cardinal SOC failure mode)
79
+ - βˆ’0.3 for over-reacting on benign (containment on noise)
80
+ - βˆ’0.05 for unnecessary escalation
81
+ - +0.1 bonus for citing the correct triggering log_id
82
+
83
+ Full rubric: [`rubric.py`](https://huggingface.co/spaces/shivam2k3/opensoc-env/blob/main/rubric.py)
84
+
85
+ ## Stage Adapters
86
+
87
+ Each curriculum stage's adapter is published separately:
88
+
89
+ | Stage | Repo |
90
+ |---|---|
91
+ | SFT warm-start | [`opensoc-defender-grpo-sft`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-sft) |
92
+ | Stage 1 (easy) | [`opensoc-defender-grpo-stage1_basic`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage1_basic) |
93
+ | Stage 2 (medium) | [`opensoc-defender-grpo-stage2_multi`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage2_multi) |
94
+ | Stage 3 (hard) | [`opensoc-defender-grpo-stage3_mixed`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage3_mixed) |
95
+ | Stage 4 (adversarial) | [`opensoc-defender-grpo-stage4_adversarial`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage4_adversarial) |
96
+
97
+ ## Model Sources
98
+
99
+ - **Environment:** [`shivam2k3/opensoc-env`](https://huggingface.co/spaces/shivam2k3/opensoc-env) (HF Space β€” running)
100
+ - **Training notebook:** [`train_grpo.ipynb`](https://huggingface.co/spaces/shivam2k3/opensoc-env/blob/main/train_grpo.ipynb)
101
+ - **Verifier source:** [`verifier.py`](https://huggingface.co/spaces/shivam2k3/opensoc-env/blob/main/verifier.py)
102
+ - **Rubric source:** [`rubric.py`](https://huggingface.co/spaces/shivam2k3/opensoc-env/blob/main/rubric.py)
103
+ - **Live demo:** [`/demo`](https://shivam2k3-opensoc-env.hf.space/demo)
104
+
105
+ ## How to Use
106
+
107
+ ```python
108
+ from peft import PeftModel
109
+ from transformers import AutoModelForCausalLM, AutoTokenizer
110
+
111
+ base = AutoModelForCausalLM.from_pretrained("unsloth/Qwen2.5-3B-Instruct")
112
+ model = PeftModel.from_pretrained(base, "shivam2k3/opensoc-defender-grpo")
113
+ tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen2.5-3B-Instruct")
114
+ ```
115
+
116
+ ## Compute Infrastructure
117
+
118
+ - **Hardware:** NVIDIA L4 (24GB) via HuggingFace Jupyter Notebooks
119
+ - **Training time:** ~3.5 hours total (SFT + GRPO + eval)
120
+ - **Cost:** ~$3 of HF compute credits
121
+
122
+ ## Framework Versions
123
+
124
+ - PEFT 0.19.1
125
+ - Transformers (latest)
126
+ - TRL (latest)
127
+ - Unsloth (latest)
docs/blog.md CHANGED
@@ -96,15 +96,19 @@ tests in `tests/test_rubric.py`.
96
 
97
  ## Headline numbers
98
 
99
- | Metric | Baseline (zero-shot) | OpenSOC (after GRPO) |
100
- | --------------------------------- | -------------------: | -------------------: |
101
- | Macro F1 over 200 hold-out | [PEND] | [PEND] |
102
- | Dismiss-on-malicious rate | [PEND] | [PEND] |
103
- | Over-react on benign | [PEND] | [PEND] |
104
 
105
- (*Numbers will be filled in after the GPU run; placeholder demo data
106
- already shows the qualitative shape: the always-dismiss baseline gets
107
- ~15% accuracy, an oracle-equivalent trained model approaches 100%.*)
 
 
 
 
 
 
 
108
 
109
  The four diagnostic plots live in `eval/results/`:
110
 
@@ -124,10 +128,11 @@ The four diagnostic plots live in `eval/results/`:
124
 
125
  ## Try it
126
 
127
- - HF Space: `https://huggingface.co/spaces/<USER>/opensoc-env`
128
  (`/demo` for the human-readable before-vs-after UI)
129
- - Repo: `https://huggingface.co/<USER>/opensoc-env`
130
- - Plan and design notes: see `README.md` in the repo.
 
131
 
132
  ---
133
 
 
96
 
97
  ## Headline numbers
98
 
99
+ Training was completed using GRPO with a 4-stage curriculum on Qwen2.5-3B-Instruct with LoRA (rank 16).
100
+ All adapter checkpoints from each stage are published on HuggingFace:
 
 
 
101
 
102
+ - [`opensoc-defender-grpo-sft`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-sft) β€” SFT warm-start
103
+ - [`opensoc-defender-grpo-stage1_basic`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage1_basic) through [`stage4_adversarial`](https://huggingface.co/shivam2k3/opensoc-defender-grpo-stage4_adversarial)
104
+ - [`opensoc-defender-grpo`](https://huggingface.co/shivam2k3/opensoc-defender-grpo) β€” final combined adapter
105
+
106
+ | Metric | `always_dismiss` (floor) | `verifier_oracle` (ceiling) |
107
+ | ------------------------------------- | -----------------------: | --------------------------: |
108
+ | Accuracy over 200 hold-out | 0.13 | 1.00 |
109
+ | Macro F1 over 200 hold-out | 0.05 | 1.00 |
110
+ | Dismiss-on-malicious rate | 1.00 | 0.00 |
111
+ | Over-react on benign | 0.00 | 0.00 |
112
 
113
  The four diagnostic plots live in `eval/results/`:
114
 
 
128
 
129
  ## Try it
130
 
131
+ - HF Space: [`shivam2k3/opensoc-env`](https://huggingface.co/spaces/shivam2k3/opensoc-env)
132
  (`/demo` for the human-readable before-vs-after UI)
133
+ - Trained adapter: [`shivam2k3/opensoc-defender-grpo`](https://huggingface.co/shivam2k3/opensoc-defender-grpo)
134
+ - Repo: [`shivam2k3/opensoc-env`](https://huggingface.co/spaces/shivam2k3/opensoc-env)
135
+ - Full design notes: see `README.md` in the repo.
136
 
137
  ---
138