Fix README links and unignore placeholder eval plots

- Replace REPLACE_USER with shivam2k3 in README's "Try it" table.
- Drop video/blog URL placeholders (the real URLs will be added when those deliverables land).
- Remove eval/results/ from .gitignore so the README's image references
  resolve on the deployed Space; the placeholder plots are committed
  today and will be overwritten by the post-training GPU run.

Made-with: Cursor

Files changed (10)

.gitattributes +2 -0
.gitignore +4 -1
README.md +6 -7
eval/results/bar_dismiss_on_malicious.png +3 -0
eval/results/bar_macro_f1.png +3 -0
eval/results/confusion_always_dismiss.png +3 -0
eval/results/confusion_verifier_oracle.png +3 -0
eval/results/summary.json +154 -0
eval/results/training_curves.png +3 -0
eval/results/training_kl_loss.png +3 -0

.gitattributes CHANGED

@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
+*.pdf filter=lfs diff=lfs merge=lfs -text

.gitignore CHANGED

@@ -17,4 +17,7 @@ wandb/
 *.bin
 *.safetensors
 .ipynb_checkpoints/
-eval/results/
+# eval/results/ is intentionally NOT ignored — the placeholder plots are
+# the README's images; the GPU run overwrites them with real numbers.
+.mplcache/

README.md CHANGED

@@ -8,14 +8,12 @@ Humans cannot watch every alert in a Security Operations Center 24/7, and as str
 | Link | What it is |
 | --- | --- |
-| **HF Space** — [`<USER>-opensoc-env.hf.space`](https://huggingface.co/spaces/REPLACE_USER/opensoc-env) | Deployed env. OpenEnv judge can hit `/reset` `/step` `/state` `/grade`. |
-| **Live `/demo`** — [`<USER>-opensoc-env.hf.space/demo`](https://REPLACE_USER-opensoc-env.hf.space/demo) | Gradio "before vs after" UI. Click **Next incident** to compare baseline vs trained. |
-| **Walkthrough video** (90s) — [`youtu.be/<UNLISTED>`](https://youtu.be/REPLACE_VIDEO) | One-take demo + headline numbers. Script: [`docs/video_script.md`](docs/video_script.md). |
-| **Mini-blog** — [`huggingface.co/blog/<USER>/opensoc-rlvr-soc-triage`](https://huggingface.co/blog/REPLACE_USER/opensoc-rlvr-soc-triage) | ~600-word write-up. Source: [`docs/blog.md`](docs/blog.md). |
+| **HF Space** — [`shivam2k3-opensoc-env.hf.space`](https://huggingface.co/spaces/shivam2k3/opensoc-env) | Deployed env. OpenEnv judge can hit `/reset` `/step` `/state` `/grade`. |
+| **Live `/demo`** — [`shivam2k3-opensoc-env.hf.space/demo`](https://shivam2k3-opensoc-env.hf.space/demo) | Gradio "before vs after" UI. Click **Next incident** to compare baseline vs trained. |
+| **Walkthrough video** (90s) — _to be added after recording_ | One-take demo + headline numbers. Script: [`docs/video_script.md`](docs/video_script.md). |
+| **Mini-blog** — _to be added after publishing_ | ~600-word write-up. Source: [`docs/blog.md`](docs/blog.md). |
 | **Slide deck** — [`docs/slides.pdf`](docs/slides.pdf) | 5 slides; problem → env → results → demo. |
-> *Replace the four `REPLACE_*` placeholders above after deploy + recording. The slide PDF auto-rebuilds from `docs/build_slides.py`.*
 ## Table of contents
 1. [Architecture](#architecture)
@@ -242,7 +240,8 @@ Submission checklist:
 - [ ] Real demo data baked (re-run `python -m eval.bake_demo` post-training)
 - [ ] Video recorded + uploaded as unlisted (script in `docs/video_script.md`)
 - [ ] Blog post published on HF (source in `docs/blog.md`)
-- [ ] All four `REPLACE_*` placeholders at the top filled in
+- [ ] Video URL added to README's "Try it" table after recording
+- [ ] Blog URL added to README's "Try it" table after publishing
 ## License
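
Review note: the README hunks above replace or drop every `REPLACE_*` token. A quick way to confirm none are left is to scan the text for that naming pattern. The sketch below is only an illustration: the helper name `find_placeholders` and the regex are assumptions, not something the repository ships.

```python
import re

# Assumed convention: placeholders look like REPLACE_USER, REPLACE_VIDEO, ...
# (an uppercase word after the "REPLACE_" prefix). The literal `REPLACE_*`
# used in prose does not match, because "*" is not an uppercase letter.
PLACEHOLDER = re.compile(r"REPLACE_[A-Z]+")


def find_placeholders(text: str) -> list[str]:
    """Return the distinct placeholder tokens found in text, sorted."""
    return sorted(set(PLACEHOLDER.findall(text)))


# Two link targets taken from the removed README rows.
old_rows = (
    "https://huggingface.co/spaces/REPLACE_USER/opensoc-env "
    "https://youtu.be/REPLACE_VIDEO"
)
# One link target taken from the added README rows.
new_rows = "https://huggingface.co/spaces/shivam2k3/opensoc-env"

print(find_placeholders(old_rows))
print(find_placeholders(new_rows))
```

Run against the full `README.md` contents, an empty list would indicate that no placeholder of this shape remains.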

eval/results/bar_dismiss_on_malicious.png ADDED

Git LFS Details

SHA256: d24f9f4ca412e93607c09501f0ff93814eb4c85fcdf975966f543b2cdfb9c8fc
Pointer size: 130 Bytes
Size of remote file: 34.8 kB
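
Review note: because `.gitattributes` now routes `*.png` through LFS, the repository stores a small text pointer for each plot instead of the image itself. The sketch below rebuilds such a pointer in the Git LFS v1 pointer format to show where the 130-byte figure comes from. The exact byte size of the PNG is not shown on this page (only the rounded "34.8 kB"), so `ASSUMED_SIZE` is a hypothetical value; the function name `lfs_pointer` is also illustrative.

```python
def lfs_pointer(sha256_hex: str, size_bytes: int) -> bytes:
    """Build the contents of a Git LFS (spec v1) pointer file."""
    if len(sha256_hex) != 64:
        raise ValueError("expected a 64-character hex SHA-256 digest")
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{sha256_hex}\n"
        f"size {size_bytes}\n"
    ).encode("utf-8")


# SHA-256 copied from the LFS details of bar_dismiss_on_malicious.png.
OID = "d24f9f4ca412e93607c09501f0ff93814eb4c85fcdf975966f543b2cdfb9c8fc"
# Hypothetical exact size: the page only reports the rounded "34.8 kB".
ASSUMED_SIZE = 34_800

pointer = lfs_pointer(OID, ASSUMED_SIZE)
print(pointer.decode("utf-8"))
print(len(pointer))
```

The fixed parts of the pointer take 125 bytes, so the total is 125 plus the number of digits in the size. Any five-digit size gives 130 bytes, and a six-digit size (as for the 176 kB `training_kl_loss.png`) gives 131, which matches the pointer sizes listed in this commit.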

eval/results/bar_macro_f1.png ADDED

Git LFS Details

SHA256: cd49ac6e9d5ad18e5e82b1c285d4b26bcfe2b56eba6fe4603818551280b55f66
Pointer size: 130 Bytes
Size of remote file: 29.5 kB

eval/results/confusion_always_dismiss.png ADDED

Git LFS Details

SHA256: 4efc2d32990f02dc822dfdb18294f2cdbe1c4a437a434a85a019b8c442a51e7b
Pointer size: 130 Bytes
Size of remote file: 48 kB

eval/results/confusion_verifier_oracle.png ADDED

Git LFS Details

SHA256: fc72e0690676d1bf8ae6b0986be12b47bdd269e1475245d56c445462d8d4738a
Pointer size: 130 Bytes
Size of remote file: 47.9 kB

eval/results/summary.json ADDED

@@ -0,0 +1,154 @@

+[
+  {
+    "label": "always_dismiss",
+    "accuracy": 0.13,
+    "macro_f1": 0.046017699115044254,
+    "dismiss_on_malicious": 1.0,
+    "over_react_rate": 0.0,
+    "per_class": {
+      "dismiss": {
+        "precision": 0.13,
+        "recall": 1.0,
+        "f1": 0.23008849557522126,
+        "support": 26
+      },
+      "monitor": {
+        "precision": 0.0,
+        "recall": 0.0,
+        "f1": 0.0,
+        "support": 72
+      },
+      "quarantine_host": {
+        "precision": 0.0,
+        "recall": 0.0,
+        "f1": 0.0,
+        "support": 46
+      },
+      "block_ip": {
+        "precision": 0.0,
+        "recall": 0.0,
+        "f1": 0.0,
+        "support": 40
+      },
+      "escalate": {
+        "precision": 0.0,
+        "recall": 0.0,
+        "f1": 0.0,
+        "support": 16
+      }
+    },
+    "confusion_matrix": {
+      "dismiss": {
+        "dismiss": 26,
+        "monitor": 0,
+        "quarantine_host": 0,
+        "block_ip": 0,
+        "escalate": 0
+      },
+      "monitor": {
+        "dismiss": 72,
+        "monitor": 0,
+        "quarantine_host": 0,
+        "block_ip": 0,
+        "escalate": 0
+      },
+      "quarantine_host": {
+        "dismiss": 46,
+        "monitor": 0,
+        "quarantine_host": 0,
+        "block_ip": 0,
+        "escalate": 0
+      },
+      "block_ip": {
+        "dismiss": 40,
+        "monitor": 0,
+        "quarantine_host": 0,
+        "block_ip": 0,
+        "escalate": 0
+      },
+      "escalate": {
+        "dismiss": 16,
+        "monitor": 0,
+        "quarantine_host": 0,
+        "block_ip": 0,
+        "escalate": 0
+      }
+    }
+  },
+  {
+    "label": "verifier_oracle",
+    "accuracy": 1.0,
+    "macro_f1": 1.0,
+    "dismiss_on_malicious": 0.0,
+    "over_react_rate": 0.0,
+    "per_class": {
+      "dismiss": {
+        "precision": 1.0,
+        "recall": 1.0,
+        "f1": 1.0,
+        "support": 26
+      },
+      "monitor": {
+        "precision": 1.0,
+        "recall": 1.0,
+        "f1": 1.0,
+        "support": 72
+      },
+      "quarantine_host": {
+        "precision": 1.0,
+        "recall": 1.0,
+        "f1": 1.0,
+        "support": 46
+      },
+      "block_ip": {
+        "precision": 1.0,
+        "recall": 1.0,
+        "f1": 1.0,
+        "support": 40
+      },
+      "escalate": {
+        "precision": 1.0,
+        "recall": 1.0,
+        "f1": 1.0,
+        "support": 16
+      }
+    },
+    "confusion_matrix": {
+      "dismiss": {
+        "dismiss": 26,
+        "monitor": 0,
+        "quarantine_host": 0,
+        "block_ip": 0,
+        "escalate": 0
+      },
+      "monitor": {
+        "dismiss": 0,
+        "monitor": 72,
+        "quarantine_host": 0,
+        "block_ip": 0,
+        "escalate": 0
+      },
+      "quarantine_host": {
+        "dismiss": 0,
+        "monitor": 0,
+        "quarantine_host": 46,
+        "block_ip": 0,
+        "escalate": 0
+      },
+      "block_ip": {
+        "dismiss": 0,
+        "monitor": 0,
+        "quarantine_host": 0,
+        "block_ip": 40,
+        "escalate": 0
+      },
+      "escalate": {
+        "dismiss": 0,
+        "monitor": 0,
+        "quarantine_host": 0,
+        "block_ip": 0,
+        "escalate": 16
+      }
+    }
+  }
+]
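
Review note: the `always_dismiss` numbers in `summary.json` can be cross-checked directly from its confusion matrix. The sketch below recomputes accuracy, per-class precision/recall/F1, and macro-F1. It assumes the outer key of `confusion_matrix` is the true label and the inner key is the predicted label, which is consistent with the `support` values but is not stated in the file. The `metrics` helper is illustrative, not the repository's evaluation code.

```python
LABELS = ["dismiss", "monitor", "quarantine_host", "block_ip", "escalate"]

# Per-class support, copied from summary.json.
SUPPORT = {"dismiss": 26, "monitor": 72, "quarantine_host": 46,
           "block_ip": 40, "escalate": 16}

# always_dismiss baseline: every incident is predicted as "dismiss".
# Assumption: confusion[true_label][predicted_label] = count.
CONFUSION = {
    true: {pred: (n if pred == "dismiss" else 0) for pred in LABELS}
    for true, n in SUPPORT.items()
}


def metrics(confusion: dict[str, dict[str, int]]) -> dict:
    """Recompute accuracy, per-class P/R/F1 and macro-F1 from a confusion matrix."""
    total = sum(sum(row.values()) for row in confusion.values())
    correct = sum(confusion[label][label] for label in confusion)
    per_class = {}
    for label in confusion:
        tp = confusion[label][label]
        predicted = sum(confusion[true][label] for true in confusion)  # column sum
        actual = sum(confusion[label].values())                        # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[label] = {"precision": precision, "recall": recall,
                            "f1": f1, "support": actual}
    macro_f1 = sum(c["f1"] for c in per_class.values()) / len(per_class)
    return {"accuracy": correct / total, "macro_f1": macro_f1,
            "per_class": per_class}


result = metrics(CONFUSION)
print(result["accuracy"], result["macro_f1"])
```

With 200 incidents in total, accuracy is 26/200 = 0.13, the only non-zero F1 is for `dismiss` (about 0.2301), and averaging over five classes gives a macro-F1 of about 0.0460, in agreement with the committed values.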

eval/results/training_curves.png ADDED

Git LFS Details

SHA256: abdbd7a31ebf9dba80ece705ec494514704aa5fdb51eb026e38b50b5492ce5e2
Pointer size: 130 Bytes
Size of remote file: 92.9 kB

eval/results/training_kl_loss.png ADDED

Git LFS Details

SHA256: b9c14cc744b3ffab04e499ca45162f6464cd2fad3d94bb623e157c9d628289e0
Pointer size: 131 Bytes
Size of remote file: 176 kB