Fix README links and unignore placeholder eval plots
- Replace REPLACE_USER with shivam2k3 in README's "Try it" table.
- Drop video/blog URL placeholders (added when those deliverables land).
- Remove eval/results/ from .gitignore so the README's image references
resolve on the deployed Space; the placeholder plots are committed
today and will be overwritten by the post-training GPU run.
Made-with: Cursor
- .gitattributes +2 -0
- .gitignore +4 -1
- README.md +6 -7
- eval/results/bar_dismiss_on_malicious.png +3 -0
- eval/results/bar_macro_f1.png +3 -0
- eval/results/confusion_always_dismiss.png +3 -0
- eval/results/confusion_verifier_oracle.png +3 -0
- eval/results/summary.json +154 -0
- eval/results/training_curves.png +3 -0
- eval/results/training_kl_loss.png +3 -0
.gitattributes
CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
+*.pdf filter=lfs diff=lfs merge=lfs -text
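To see which file types this hunk routes through Git LFS, the attribute lines can be parsed directly. The following is a minimal sketch, not part of the commit: `lfs_patterns` is a hypothetical helper, and it assumes simple whitespace-separated entries (quoted patterns containing spaces are not handled).

```python
def lfs_patterns(gitattributes_text: str) -> list[str]:
    """Return the path patterns whose attributes include filter=lfs."""
    patterns = []
    for line in gitattributes_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # First token is the pattern; the rest are attributes.
        pattern, *attrs = line.split()
        if "filter=lfs" in attrs:
            patterns.append(pattern)
    return patterns


# The five lines visible in the hunk above (three context, two added).
SAMPLE = """\
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
*.png filter=lfs diff=lfs merge=lfs -text
*.pdf filter=lfs diff=lfs merge=lfs -text
"""

print(lfs_patterns(SAMPLE))
```

With the two added lines in place, every `.png` plot under `eval/results/` is committed as an LFS pointer rather than as a raw binary blob.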
.gitignore
CHANGED
@@ -17,4 +17,7 @@ wandb/
 *.bin
 *.safetensors
 .ipynb_checkpoints/
-
+
+# eval/results/ is intentionally NOT ignored — the placeholder plots are
+# the README's images; the GPU run overwrites them with real numbers.
+.mplcache/
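The point of un-ignoring `eval/results/` is that every image the README references must actually exist in the repository. A rough check for that could look like the sketch below. It is an illustration only: `missing_images` is a hypothetical helper, and it assumes the README uses Markdown image syntax (`![alt](path)`) with repo-relative paths; HTML `<img>` tags would need a separate pattern.

```python
import re
from pathlib import Path

# Matches the target of a Markdown image: ![alt](target "optional title")
IMAGE_REF = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")


def missing_images(readme_text: str, repo_root: Path) -> list[str]:
    """Return repo-relative image targets that do not exist on disk."""
    missing = []
    for target in IMAGE_REF.findall(readme_text):
        # Remote images are outside the repository; skip them.
        if target.startswith(("http://", "https://")):
            continue
        if not (repo_root / target).is_file():
            missing.append(target)
    return missing
```

Run from the repository root, `missing_images(Path("README.md").read_text(encoding="utf-8"), Path("."))` should return an empty list once the placeholder plots are committed.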
README.md
CHANGED
@@ -8,14 +8,12 @@ Humans cannot watch every alert in a Security Operations Center 24/7, and as str
 
 | Link | What it is |
 | --- | --- |
-| **HF Space** — [`
-| **Live `/demo`** — [`
-| **Walkthrough video** (90s) —
-| **Mini-blog** —
+| **HF Space** — [`shivam2k3-opensoc-env.hf.space`](https://huggingface.co/spaces/shivam2k3/opensoc-env) | Deployed env. OpenEnv judge can hit `/reset` `/step` `/state` `/grade`. |
+| **Live `/demo`** — [`shivam2k3-opensoc-env.hf.space/demo`](https://shivam2k3-opensoc-env.hf.space/demo) | Gradio "before vs after" UI. Click **Next incident** to compare baseline vs trained. |
+| **Walkthrough video** (90s) — _to be added after recording_ | One-take demo + headline numbers. Script: [`docs/video_script.md`](docs/video_script.md). |
+| **Mini-blog** — _to be added after publishing_ | ~600-word write-up. Source: [`docs/blog.md`](docs/blog.md). |
 | **Slide deck** — [`docs/slides.pdf`](docs/slides.pdf) | 5 slides; problem → env → results → demo. |
 
-> *Replace the four `REPLACE_*` placeholders above after deploy + recording. The slide PDF auto-rebuilds from `docs/build_slides.py`.*
-
 ## Table of contents
 
 1. [Architecture](#architecture)
@@ -242,7 +240,8 @@ Submission checklist:
 - [ ] Real demo data baked (re-run `python -m eval.bake_demo` post-training)
 - [ ] Video recorded + uploaded as unlisted (script in `docs/video_script.md`)
 - [ ] Blog post published on HF (source in `docs/blog.md`)
-- [ ]
+- [ ] Video URL added to README's "Try it" table after recording
+- [ ] Blog URL added to README's "Try it" table after publishing
 
 ## License
 
eval/results/bar_dismiss_on_malicious.png
ADDED (Git LFS)
eval/results/bar_macro_f1.png
ADDED (Git LFS)
eval/results/confusion_always_dismiss.png
ADDED (Git LFS)
eval/results/confusion_verifier_oracle.png
ADDED (Git LFS)
eval/results/summary.json
ADDED
@@ -0,0 +1,154 @@
+[
+  {
+    "label": "always_dismiss",
+    "accuracy": 0.13,
+    "macro_f1": 0.046017699115044254,
+    "dismiss_on_malicious": 1.0,
+    "over_react_rate": 0.0,
+    "per_class": {
+      "dismiss": {
+        "precision": 0.13,
+        "recall": 1.0,
+        "f1": 0.23008849557522126,
+        "support": 26
+      },
+      "monitor": {
+        "precision": 0.0,
+        "recall": 0.0,
+        "f1": 0.0,
+        "support": 72
+      },
+      "quarantine_host": {
+        "precision": 0.0,
+        "recall": 0.0,
+        "f1": 0.0,
+        "support": 46
+      },
+      "block_ip": {
+        "precision": 0.0,
+        "recall": 0.0,
+        "f1": 0.0,
+        "support": 40
+      },
+      "escalate": {
+        "precision": 0.0,
+        "recall": 0.0,
+        "f1": 0.0,
+        "support": 16
+      }
+    },
+    "confusion_matrix": {
+      "dismiss": {
+        "dismiss": 26,
+        "monitor": 0,
+        "quarantine_host": 0,
+        "block_ip": 0,
+        "escalate": 0
+      },
+      "monitor": {
+        "dismiss": 72,
+        "monitor": 0,
+        "quarantine_host": 0,
+        "block_ip": 0,
+        "escalate": 0
+      },
+      "quarantine_host": {
+        "dismiss": 46,
+        "monitor": 0,
+        "quarantine_host": 0,
+        "block_ip": 0,
+        "escalate": 0
+      },
+      "block_ip": {
+        "dismiss": 40,
+        "monitor": 0,
+        "quarantine_host": 0,
+        "block_ip": 0,
+        "escalate": 0
+      },
+      "escalate": {
+        "dismiss": 16,
+        "monitor": 0,
+        "quarantine_host": 0,
+        "block_ip": 0,
+        "escalate": 0
+      }
+    }
+  },
+  {
+    "label": "verifier_oracle",
+    "accuracy": 1.0,
+    "macro_f1": 1.0,
+    "dismiss_on_malicious": 0.0,
+    "over_react_rate": 0.0,
+    "per_class": {
+      "dismiss": {
+        "precision": 1.0,
+        "recall": 1.0,
+        "f1": 1.0,
+        "support": 26
+      },
+      "monitor": {
+        "precision": 1.0,
+        "recall": 1.0,
+        "f1": 1.0,
+        "support": 72
+      },
+      "quarantine_host": {
+        "precision": 1.0,
+        "recall": 1.0,
+        "f1": 1.0,
+        "support": 46
+      },
+      "block_ip": {
+        "precision": 1.0,
+        "recall": 1.0,
+        "f1": 1.0,
+        "support": 40
+      },
+      "escalate": {
+        "precision": 1.0,
+        "recall": 1.0,
+        "f1": 1.0,
+        "support": 16
+      }
+    },
+    "confusion_matrix": {
+      "dismiss": {
+        "dismiss": 26,
+        "monitor": 0,
+        "quarantine_host": 0,
+        "block_ip": 0,
+        "escalate": 0
+      },
+      "monitor": {
+        "dismiss": 0,
+        "monitor": 72,
+        "quarantine_host": 0,
+        "block_ip": 0,
+        "escalate": 0
+      },
+      "quarantine_host": {
+        "dismiss": 0,
+        "monitor": 0,
+        "quarantine_host": 46,
+        "block_ip": 0,
+        "escalate": 0
+      },
+      "block_ip": {
+        "dismiss": 0,
+        "monitor": 0,
+        "quarantine_host": 0,
+        "block_ip": 40,
+        "escalate": 0
+      },
+      "escalate": {
+        "dismiss": 0,
+        "monitor": 0,
+        "quarantine_host": 0,
+        "block_ip": 0,
+        "escalate": 16
+      }
+    }
+  }
+]
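The headline numbers in `summary.json` can be re-derived from the confusion matrices alone, which is a useful sanity check when the GPU run overwrites the file. The sketch below is illustrative and not taken from the repository: `metrics` is a hypothetical helper, and it reads the outer key as the true label and the inner key as the predicted label, an orientation that is consistent with the `support` counts in the file.

```python
LABELS = ["dismiss", "monitor", "quarantine_host", "block_ip", "escalate"]
SUPPORT = {"dismiss": 26, "monitor": 72, "quarantine_host": 46,
           "block_ip": 40, "escalate": 16}

# The always_dismiss baseline: every incident is predicted as "dismiss".
ALWAYS_DISMISS = {
    true: {pred: (n if pred == "dismiss" else 0) for pred in LABELS}
    for true, n in SUPPORT.items()
}


def metrics(confusion: dict[str, dict[str, int]]) -> tuple[float, float]:
    """Return (accuracy, macro-F1) for a confusion[true][pred] matrix."""
    labels = list(confusion)
    total = sum(sum(row.values()) for row in confusion.values())
    correct = sum(confusion[label][label] for label in labels)
    f1_scores = []
    for label in labels:
        tp = confusion[label][label]
        predicted = sum(confusion[true][label] for true in labels)  # column sum
        actual = sum(confusion[label].values())                     # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return correct / total, sum(f1_scores) / len(f1_scores)


accuracy, macro_f1 = metrics(ALWAYS_DISMISS)
print(accuracy, macro_f1)
```

For the baseline, only the `dismiss` class has a non-zero F1 (precision 26/200, recall 1.0), so the macro average is that single F1 divided by five, which matches the `macro_f1` value recorded above.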
eval/results/training_curves.png
ADDED (Git LFS)
eval/results/training_kl_loss.png
ADDED (Git LFS)