Spaces: Torchflow1 / Multi-Agent-Incident-Command-Center

SwapnilPatil28 committed on Apr 26

Commit 8062d98 (verified) · 1 Parent(s): 8cbdbde

Final Docs Update

Files changed (2)

docs/BLOG_POST.md +5 -5
docs/SUBMISSION_CHECKLIST.md +5 -5

docs/BLOG_POST.md

@@ -126,7 +126,7 @@ The reward engine emits **named components** at every step so training curves
 I first wrote a deterministic `HeuristicCoordinator` that uses the observation's `investigation_targets` and role constraints to play through the environment. On hard tasks it earns **+5.89** reward where random scores **−12.50** — so that gives us ~680 `(prompt, completion)` pairs of "good" behavior to imitate.
-Training script: [`train_trl.py`](https://github.com/SwapnilPatil28/Multi-Agent-Incident-Command-Center/blob/main/train_trl.py). One command on Colab T4 (or **[open the reproducible notebook ↗](https://colab.research.google.com/drive/1vx9E5FrZZrHoRwXs2cvtom3DaI6kZ3LP?usp=sharing)**) runs the entire pipeline:
+Training script: [`train_trl.py`](https://huggingface.co/spaces/SwapnilPatil28/Multi-Agent-Incident-Command-Center/blob/main/train_trl.py). One command on Colab T4 (or **[open the reproducible notebook ↗](https://colab.research.google.com/drive/1vx9E5FrZZrHoRwXs2cvtom3DaI6kZ3LP?usp=sharing)**) runs the entire pipeline:
 ```python
 os.environ["BASE_MODEL"]         = "Qwen/Qwen2.5-1.5B-Instruct"
@@ -231,13 +231,13 @@ I ran the exact same pipeline with the smaller **Qwen2.5-0.5B-Instruct** backbon
 | **Live environment** | [swapnilpatil28-multi-agent-incident-command-center.hf.space](https://swapnilpatil28-multi-agent-incident-command-center.hf.space) (OpenEnv-compatible, Docker-backed) |
 | **Training notebook** | [One-click Colab (T4, ~1 h 15 min end-to-end)](https://colab.research.google.com/drive/1vx9E5FrZZrHoRwXs2cvtom3DaI6kZ3LP?usp=sharing) |
 | **Source + tests** | [GitHub repo (21 passing tests, Dockerfile with HEALTHCHECK)](https://github.com/SwapnilPatil28/Multi-Agent-Incident-Command-Center) |
-| **Full docs** | [README — Part 1 story + Part 2 technical deep-dive](https://github.com/SwapnilPatil28/Multi-Agent-Incident-Command-Center#readme) |
-| **Committed evidence** | [`artifacts/`](https://github.com/SwapnilPatil28/Multi-Agent-Incident-Command-Center/tree/main/artifacts) — all 4 PNGs + both JSON metric files |
-| **Submission checklist** | [`docs/SUBMISSION_CHECKLIST.md`](https://github.com/SwapnilPatil28/Multi-Agent-Incident-Command-Center/blob/main/docs/SUBMISSION_CHECKLIST.md) |
+| **Full docs** | [README — Part 1 story + Part 2 technical deep-dive](https://huggingface.co/spaces/SwapnilPatil28/Multi-Agent-Incident-Command-Center/blob/main/README.md) |
+| **Committed evidence** | [`artifacts/`](https://huggingface.co/spaces/SwapnilPatil28/Multi-Agent-Incident-Command-Center/tree/main/artifacts) — all 4 PNGs + both JSON metric files |
+| **Submission checklist** | [`docs/SUBMISSION_CHECKLIST.md`](https://huggingface.co/spaces/SwapnilPatil28/Multi-Agent-Incident-Command-Center/blob/main/docs/SUBMISSION_CHECKLIST.md) |
 ---
-## 8. What's next
+## 8. What's next (Planned)
 - **Replace SFT with GRPO or PPO** using the environment's native reward signal — no heuristic teacher, let the rubric itself shape the policy and push past the imitation ceiling.
 - **Scale the incident catalog** from 13 templates to 50+ (drop in JSON-defined scenarios).

docs/SUBMISSION_CHECKLIST.md

@@ -87,9 +87,9 @@ Status against every hard gate in the official judging rules, plus every polish
 | 5 | Dashboard upgraded: hero story panel, 4 stacked plots, resources grid with README / blog / checklist links | ✅ |
 | 6 | Blog post updated (`docs/BLOG_POST.md`) with fixed image paths (raw GitHub URLs) and 0.5B ablation section | ✅ |
 | 7 | All 21 tests passing on latest commit | ✅ |
-| 8 | Run `openenv validate` remotely against the Space — `./validate-submission.sh <space-url>` | ⬜ (run it once before the deadline) |
-| 9 | **Submit the Space URL in the hackathon form:** `https://swapnilpatil28-multi-agent-incident-command-center.hf.space` | ⬜ |
-| 10 | Do not push commits after the submission deadline — post-deadline commits won't be considered | ⬜ |
+| 8 | Run `openenv validate` remotely against the Space — `./validate-submission.sh <space-url>` | ✅ |
+| 9 | **Submit the Space URL in the hackathon form:** `https://swapnilpatil28-multi-agent-incident-command-center.hf.space` | ✅ |
+| 10 | Do not push commits after the submission deadline — post-deadline commits won't be considered | ✅ |
 ---
@@ -100,13 +100,13 @@ Status against every hard gate in the official judging rules, plus every polish
 curl -fsS https://swapnilpatil28-multi-agent-incident-command-center.hf.space/healthz
 # 2. Env-info endpoint advertises metadata
-curl -s https://swapnilpatil28-multi-agent-incident-command-center.hf.space/env-info | head -20
+curl -s https://swapnilpatil28-multi-agent-incident-command-center.hf.space/env-info
 # 3. OpenEnv validator passes remotely
 ./validate-submission.sh https://swapnilpatil28-multi-agent-incident-command-center.hf.space
 # 4. A remote episode works
-ENV_URL=https://swapnilpatil28-multi-agent-incident-command-center.hf.space python inference.py | head -40
+ENV_URL=https://swapnilpatil28-multi-agent-incident-command-center.hf.space python inference.py
 ```
 ## Where the judges will find each artefact
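
Reviewer note: the two `curl` checks in the last hunk (`/healthz` and `/env-info`) could also be scripted, which may be handy for re-running them after each push. The following is a minimal sketch and not part of this commit. The names `http_get` and `smoke_check` are hypothetical, and it assumes only that both endpoints answer HTTP 200 and that `/env-info` returns a non-empty body; the actual response format is not shown in the diff.

```python
import urllib.request
from typing import Callable, Dict, Tuple

# Space URL taken from the checklist in the diff above.
SPACE_URL = "https://swapnilpatil28-multi-agent-incident-command-center.hf.space"


def http_get(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Return (status_code, body_text). urlopen raises on HTTP 4xx/5xx."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")


def smoke_check(
    base_url: str,
    fetch: Callable[[str], Tuple[int, str]] = http_get,
) -> Dict[str, bool]:
    """Probe /healthz and /env-info; map each endpoint name to pass/fail.

    `fetch` is injectable so the check can be exercised without a network.
    """
    results: Dict[str, bool] = {}
    for name, path in (("healthz", "/healthz"), ("env-info", "/env-info")):
        try:
            status, body = fetch(base_url.rstrip("/") + path)
            ok = status == 200
            if name == "env-info":
                # Assumption: "advertises metadata" means a non-empty body.
                ok = ok and bool(body.strip())
        except OSError:
            # URLError and HTTPError both subclass OSError.
            ok = False
        results[name] = ok
    return results
```

Calling `smoke_check(SPACE_URL)` returns a dictionary such as `{"healthz": ..., "env-info": ...}` with a boolean per endpoint. Steps 3 and 4 of the checklist (`./validate-submission.sh` and `inference.py`) are left to the existing scripts.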