shivam2k3 committed
Commit 64649c4 · 1 Parent(s): da392f9

Fix README links and unignore placeholder eval plots

- Replace REPLACE_USER with shivam2k3 in README's "Try it" table.
- Drop video/blog URL placeholders (added when those deliverables land).
- Remove eval/results/ from .gitignore so the README's image references
resolve on the deployed Space; the placeholder plots are committed
today and will be overwritten by the post-training GPU run.

Made-with: Cursor

.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
+*.pdf filter=lfs diff=lfs merge=lfs -text
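
As a rough illustration of what the two new rules do, the sketch below checks which paths from this commit would be routed through LFS. It is an approximation only: it covers just the patterns visible in this hunk, uses Python's `fnmatch` on the file's basename rather than Git's real pattern matcher, and `routed_through_lfs` is a hypothetical helper, not part of the repository.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Only the patterns visible in this hunk; the real .gitattributes has more
# rules above line 33 that are not reproduced here.
LFS_PATTERNS = ["*.zip", "*.zst", "*tfevents*", "*.png", "*.pdf"]


def routed_through_lfs(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: a slash-free gitattributes pattern is compared
    against the basename, so it applies at any directory depth."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    for p in ["eval/results/bar_macro_f1.png",
              "docs/slides.pdf",
              "eval/results/summary.json"]:
        print(p, routed_through_lfs(p))
```

Under these assumptions the PNG plots and the slide PDF match, while `summary.json` does not, which is consistent with the page showing the JSON as a text diff and the plots as LFS pointers.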
.gitignore CHANGED
@@ -17,4 +17,7 @@ wandb/
 *.bin
 *.safetensors
 .ipynb_checkpoints/
-eval/results/
+
+# eval/results/ is intentionally NOT ignored — the placeholder plots are
+# the README's images; the GPU run overwrites them with real numbers.
+.mplcache/
README.md CHANGED
@@ -8,14 +8,12 @@ Humans cannot watch every alert in a Security Operations Center 24/7, and as str
 
 | Link | What it is |
 | --- | --- |
-| **HF Space** — [`<USER>-opensoc-env.hf.space`](https://huggingface.co/spaces/REPLACE_USER/opensoc-env) | Deployed env. OpenEnv judge can hit `/reset` `/step` `/state` `/grade`. |
-| **Live `/demo`** — [`<USER>-opensoc-env.hf.space/demo`](https://REPLACE_USER-opensoc-env.hf.space/demo) | Gradio "before vs after" UI. Click **Next incident** to compare baseline vs trained. |
-| **Walkthrough video** (90s) — [`youtu.be/<UNLISTED>`](https://youtu.be/REPLACE_VIDEO) | One-take demo + headline numbers. Script: [`docs/video_script.md`](docs/video_script.md). |
-| **Mini-blog** — [`huggingface.co/blog/<USER>/opensoc-rlvr-soc-triage`](https://huggingface.co/blog/REPLACE_USER/opensoc-rlvr-soc-triage) | ~600-word write-up. Source: [`docs/blog.md`](docs/blog.md). |
+| **HF Space** — [`shivam2k3-opensoc-env.hf.space`](https://huggingface.co/spaces/shivam2k3/opensoc-env) | Deployed env. OpenEnv judge can hit `/reset` `/step` `/state` `/grade`. |
+| **Live `/demo`** — [`shivam2k3-opensoc-env.hf.space/demo`](https://shivam2k3-opensoc-env.hf.space/demo) | Gradio "before vs after" UI. Click **Next incident** to compare baseline vs trained. |
+| **Walkthrough video** (90s) — _to be added after recording_ | One-take demo + headline numbers. Script: [`docs/video_script.md`](docs/video_script.md). |
+| **Mini-blog** — _to be added after publishing_ | ~600-word write-up. Source: [`docs/blog.md`](docs/blog.md). |
 | **Slide deck** — [`docs/slides.pdf`](docs/slides.pdf) | 5 slides; problem → env → results → demo. |
 
-> *Replace the four `REPLACE_*` placeholders above after deploy + recording. The slide PDF auto-rebuilds from `docs/build_slides.py`.*
-
 ## Table of contents
 
 1. [Architecture](#architecture)
@@ -242,7 +240,8 @@ Submission checklist:
 - [ ] Real demo data baked (re-run `python -m eval.bake_demo` post-training)
 - [ ] Video recorded + uploaded as unlisted (script in `docs/video_script.md`)
 - [ ] Blog post published on HF (source in `docs/blog.md`)
-- [ ] All four `REPLACE_*` placeholders at the top filled in
+- [ ] Video URL added to README's "Try it" table after recording
+- [ ] Blog URL added to README's "Try it" table after publishing
 
 ## License
 
eval/results/bar_dismiss_on_malicious.png ADDED

Git LFS Details

  • SHA256: d24f9f4ca412e93607c09501f0ff93814eb4c85fcdf975966f543b2cdfb9c8fc
  • Pointer size: 130 Bytes
  • Size of remote file: 34.8 kB
eval/results/bar_macro_f1.png ADDED

Git LFS Details

  • SHA256: cd49ac6e9d5ad18e5e82b1c285d4b26bcfe2b56eba6fe4603818551280b55f66
  • Pointer size: 130 Bytes
  • Size of remote file: 29.5 kB
eval/results/confusion_always_dismiss.png ADDED

Git LFS Details

  • SHA256: 4efc2d32990f02dc822dfdb18294f2cdbe1c4a437a434a85a019b8c442a51e7b
  • Pointer size: 130 Bytes
  • Size of remote file: 48 kB
eval/results/confusion_verifier_oracle.png ADDED

Git LFS Details

  • SHA256: fc72e0690676d1bf8ae6b0986be12b47bdd269e1475245d56c445462d8d4738a
  • Pointer size: 130 Bytes
  • Size of remote file: 47.9 kB
eval/results/summary.json ADDED
@@ -0,0 +1,154 @@
+[
+  {
+    "label": "always_dismiss",
+    "accuracy": 0.13,
+    "macro_f1": 0.046017699115044254,
+    "dismiss_on_malicious": 1.0,
+    "over_react_rate": 0.0,
+    "per_class": {
+      "dismiss": {
+        "precision": 0.13,
+        "recall": 1.0,
+        "f1": 0.23008849557522126,
+        "support": 26
+      },
+      "monitor": {
+        "precision": 0.0,
+        "recall": 0.0,
+        "f1": 0.0,
+        "support": 72
+      },
+      "quarantine_host": {
+        "precision": 0.0,
+        "recall": 0.0,
+        "f1": 0.0,
+        "support": 46
+      },
+      "block_ip": {
+        "precision": 0.0,
+        "recall": 0.0,
+        "f1": 0.0,
+        "support": 40
+      },
+      "escalate": {
+        "precision": 0.0,
+        "recall": 0.0,
+        "f1": 0.0,
+        "support": 16
+      }
+    },
+    "confusion_matrix": {
+      "dismiss": {
+        "dismiss": 26,
+        "monitor": 0,
+        "quarantine_host": 0,
+        "block_ip": 0,
+        "escalate": 0
+      },
+      "monitor": {
+        "dismiss": 72,
+        "monitor": 0,
+        "quarantine_host": 0,
+        "block_ip": 0,
+        "escalate": 0
+      },
+      "quarantine_host": {
+        "dismiss": 46,
+        "monitor": 0,
+        "quarantine_host": 0,
+        "block_ip": 0,
+        "escalate": 0
+      },
+      "block_ip": {
+        "dismiss": 40,
+        "monitor": 0,
+        "quarantine_host": 0,
+        "block_ip": 0,
+        "escalate": 0
+      },
+      "escalate": {
+        "dismiss": 16,
+        "monitor": 0,
+        "quarantine_host": 0,
+        "block_ip": 0,
+        "escalate": 0
+      }
+    }
+  },
+  {
+    "label": "verifier_oracle",
+    "accuracy": 1.0,
+    "macro_f1": 1.0,
+    "dismiss_on_malicious": 0.0,
+    "over_react_rate": 0.0,
+    "per_class": {
+      "dismiss": {
+        "precision": 1.0,
+        "recall": 1.0,
+        "f1": 1.0,
+        "support": 26
+      },
+      "monitor": {
+        "precision": 1.0,
+        "recall": 1.0,
+        "f1": 1.0,
+        "support": 72
+      },
+      "quarantine_host": {
+        "precision": 1.0,
+        "recall": 1.0,
+        "f1": 1.0,
+        "support": 46
+      },
+      "block_ip": {
+        "precision": 1.0,
+        "recall": 1.0,
+        "f1": 1.0,
+        "support": 40
+      },
+      "escalate": {
+        "precision": 1.0,
+        "recall": 1.0,
+        "f1": 1.0,
+        "support": 16
+      }
+    },
+    "confusion_matrix": {
+      "dismiss": {
+        "dismiss": 26,
+        "monitor": 0,
+        "quarantine_host": 0,
+        "block_ip": 0,
+        "escalate": 0
+      },
+      "monitor": {
+        "dismiss": 0,
+        "monitor": 72,
+        "quarantine_host": 0,
+        "block_ip": 0,
+        "escalate": 0
+      },
+      "quarantine_host": {
+        "dismiss": 0,
+        "monitor": 0,
+        "quarantine_host": 46,
+        "block_ip": 0,
+        "escalate": 0
+      },
+      "block_ip": {
+        "dismiss": 0,
+        "monitor": 0,
+        "quarantine_host": 0,
+        "block_ip": 40,
+        "escalate": 0
+      },
+      "escalate": {
+        "dismiss": 0,
+        "monitor": 0,
+        "quarantine_host": 0,
+        "block_ip": 0,
+        "escalate": 16
+      }
+    }
+  }
+]
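
The headline numbers for the `always_dismiss` baseline can be re-derived from its confusion matrix. The sketch below is a sanity check rather than the repository's evaluation code: `metrics_from_confusion` is a hypothetical helper, and it assumes the outer key is the true label and the inner key the predicted label (which matches the `support` values above).

```python
from typing import Dict

Confusion = Dict[str, Dict[str, int]]  # cm[true_label][predicted_label] -> count


def metrics_from_confusion(cm: Confusion) -> dict:
    """Recompute accuracy, per-class F1, and macro-F1 from a confusion matrix."""
    labels = list(cm)
    total = sum(sum(row.values()) for row in cm.values())
    correct = sum(cm[label][label] for label in labels)

    per_class_f1 = {}
    for label in labels:
        tp = cm[label][label]
        predicted = sum(cm[true][label] for true in labels)  # column sum
        support = sum(cm[label].values())                    # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        per_class_f1[label] = 2 * precision * recall / denom if denom else 0.0

    return {
        "accuracy": correct / total,
        "macro_f1": sum(per_class_f1.values()) / len(labels),
        "per_class_f1": per_class_f1,
    }


# Row counts copied from the always_dismiss entry: every alert is predicted "dismiss".
LABELS = ["dismiss", "monitor", "quarantine_host", "block_ip", "escalate"]
SUPPORT = {"dismiss": 26, "monitor": 72, "quarantine_host": 46,
           "block_ip": 40, "escalate": 16}
ALWAYS_DISMISS = {
    true: {pred: (SUPPORT[true] if pred == "dismiss" else 0) for pred in LABELS}
    for true in LABELS
}

if __name__ == "__main__":
    print(metrics_from_confusion(ALWAYS_DISMISS))
```

With 26 of 200 alerts correct, accuracy is 0.13. Only the `dismiss` class has a non-zero F1 (precision 0.13, recall 1.0), so macro-F1 is that value divided by the five classes, about 0.046.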
eval/results/training_curves.png ADDED

Git LFS Details

  • SHA256: abdbd7a31ebf9dba80ece705ec494514704aa5fdb51eb026e38b50b5492ce5e2
  • Pointer size: 130 Bytes
  • Size of remote file: 92.9 kB
eval/results/training_kl_loss.png ADDED

Git LFS Details

  • SHA256: b9c14cc744b3ffab04e499ca45162f6464cd2fad3d94bb623e157c9d628289e0
  • Pointer size: 131 Bytes
  • Size of remote file: 176 kB
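
The pointer sizes listed above (130 bytes for most plots, 131 for `training_kl_loss.png`) follow from the Git LFS pointer layout: a version line, an `oid sha256:` line, and a `size` line. The sketch below rebuilds such a pointer to show where the one-byte difference comes from. The exact byte sizes are not shown on the page, so the sizes used here are hypothetical stand-ins with the right number of digits, and `lfs_pointer` is an illustrative helper.

```python
def lfs_pointer(sha256_hex: str, size_bytes: int) -> str:
    """Build the text of a Git LFS pointer file (spec v1 layout)."""
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{sha256_hex}\n"
        f"size {size_bytes}\n"
    )


# SHA256 copied from the bar_dismiss_on_malicious.png entry on this page.
OID = "d24f9f4ca412e93607c09501f0ff93814eb4c85fcdf975966f543b2cdfb9c8fc"

# Hypothetical sizes: the page only shows rounded values (34.8 kB, 176 kB).
FIVE_DIGIT_SIZE = 34800
SIX_DIGIT_SIZE = 176000

if __name__ == "__main__":
    print(len(lfs_pointer(OID, FIVE_DIGIT_SIZE).encode()))
    print(len(lfs_pointer(OID, SIX_DIGIT_SIZE).encode()))
```

The version and `oid` lines have fixed lengths, so the pointer size varies only with the number of digits in the file size: a five-digit byte count gives 130 bytes and a six-digit count gives 131.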