Commit 8062d98 by SwapnilPatil28 (verified) · 1 parent: 8cbdbde

Final Docs Update

docs/BLOG_POST.md CHANGED
@@ -126,7 +126,7 @@ The reward engine emits **named components** at every step so training curves
 
  I first wrote a deterministic `HeuristicCoordinator` that uses the observation's `investigation_targets` and role constraints to play through the environment. On hard tasks it earns **+5.89** reward where random scores **−12.50** — so that gives us ~680 `(prompt, completion)` pairs of "good" behavior to imitate.
 
- Training script: [`train_trl.py`](https://github.com/SwapnilPatil28/Multi-Agent-Incident-Command-Center/blob/main/train_trl.py). One command on Colab T4 (or **[open the reproducible notebook ↗](https://colab.research.google.com/drive/1vx9E5FrZZrHoRwXs2cvtom3DaI6kZ3LP?usp=sharing)**) runs the entire pipeline:
+ Training script: [`train_trl.py`](https://huggingface.co/spaces/SwapnilPatil28/Multi-Agent-Incident-Command-Center/blob/main/train_trl.py). One command on Colab T4 (or **[open the reproducible notebook ↗](https://colab.research.google.com/drive/1vx9E5FrZZrHoRwXs2cvtom3DaI6kZ3LP?usp=sharing)**) runs the entire pipeline:
 
  ```python
  os.environ["BASE_MODEL"] = "Qwen/Qwen2.5-1.5B-Instruct"
@@ -231,13 +231,13 @@ I ran the exact same pipeline with the smaller **Qwen2.5-0.5B-Instruct** backbon
  | **Live environment** | [swapnilpatil28-multi-agent-incident-command-center.hf.space](https://swapnilpatil28-multi-agent-incident-command-center.hf.space) (OpenEnv-compatible, Docker-backed) |
  | **Training notebook** | [One-click Colab (T4, ~1 h 15 min end-to-end)](https://colab.research.google.com/drive/1vx9E5FrZZrHoRwXs2cvtom3DaI6kZ3LP?usp=sharing) |
  | **Source + tests** | [GitHub repo (21 passing tests, Dockerfile with HEALTHCHECK)](https://github.com/SwapnilPatil28/Multi-Agent-Incident-Command-Center) |
- | **Full docs** | [README — Part 1 story + Part 2 technical deep-dive](https://github.com/SwapnilPatil28/Multi-Agent-Incident-Command-Center#readme) |
- | **Committed evidence** | [`artifacts/`](https://github.com/SwapnilPatil28/Multi-Agent-Incident-Command-Center/tree/main/artifacts) — all 4 PNGs + both JSON metric files |
- | **Submission checklist** | [`docs/SUBMISSION_CHECKLIST.md`](https://github.com/SwapnilPatil28/Multi-Agent-Incident-Command-Center/blob/main/docs/SUBMISSION_CHECKLIST.md) |
+ | **Full docs** | [README — Part 1 story + Part 2 technical deep-dive](https://huggingface.co/spaces/SwapnilPatil28/Multi-Agent-Incident-Command-Center/blob/main/README.md) |
+ | **Committed evidence** | [`artifacts/`](https://huggingface.co/spaces/SwapnilPatil28/Multi-Agent-Incident-Command-Center/tree/main/artifacts) — all 4 PNGs + both JSON metric files |
+ | **Submission checklist** | [`docs/SUBMISSION_CHECKLIST.md`](https://huggingface.co/spaces/SwapnilPatil28/Multi-Agent-Incident-Command-Center/blob/main/docs/SUBMISSION_CHECKLIST.md) |
 
  ---
 
- ## 8. What's next
+ ## 8. What's next (Planned)
 
  - **Replace SFT with GRPO or PPO** using the environment's native reward signal — no heuristic teacher, let the rubric itself shape the policy and push past the imitation ceiling.
  - **Scale the incident catalog** from 13 templates to 50+ (drop in JSON-defined scenarios).
docs/SUBMISSION_CHECKLIST.md CHANGED
@@ -87,9 +87,9 @@ Status against every hard gate in the official judging rules, plus every polish
  | 5 | Dashboard upgraded: hero story panel, 4 stacked plots, resources grid with README / blog / checklist links | ✅ |
  | 6 | Blog post updated (`docs/BLOG_POST.md`) with fixed image paths (raw GitHub URLs) and 0.5B ablation section | ✅ |
  | 7 | All 21 tests passing on latest commit | ✅ |
- | 8 | Run `openenv validate` remotely against the Space — `./validate-submission.sh <space-url>` | ⬜ (run it once before the deadline) |
- | 9 | **Submit the Space URL in the hackathon form:** `https://swapnilpatil28-multi-agent-incident-command-center.hf.space` | ⬜ |
- | 10 | Do not push commits after the submission deadline — post-deadline commits won't be considered | ⬜ |
+ | 8 | Run `openenv validate` remotely against the Space — `./validate-submission.sh <space-url>` | ✅ |
+ | 9 | **Submit the Space URL in the hackathon form:** `https://swapnilpatil28-multi-agent-incident-command-center.hf.space` | ✅ |
+ | 10 | Do not push commits after the submission deadline — post-deadline commits won't be considered | ✅ |
 
  ---
 
@@ -100,13 +100,13 @@ Status against every hard gate in the official judging rules, plus every polish
  curl -fsS https://swapnilpatil28-multi-agent-incident-command-center.hf.space/healthz
 
  # 2. Env-info endpoint advertises metadata
- curl -s https://swapnilpatil28-multi-agent-incident-command-center.hf.space/env-info | head -20
+ curl -s https://swapnilpatil28-multi-agent-incident-command-center.hf.space/env-info
 
  # 3. OpenEnv validator passes remotely
  ./validate-submission.sh https://swapnilpatil28-multi-agent-incident-command-center.hf.space
 
  # 4. A remote episode works
- ENV_URL=https://swapnilpatil28-multi-agent-incident-command-center.hf.space python inference.py | head -40
+ ENV_URL=https://swapnilpatil28-multi-agent-incident-command-center.hf.space python inference.py
  ```
 
  ## Where the judges will find each artefact
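Smoke checks 1 and 2 (the `curl` calls against `/healthz` and `/env-info`) can also be scripted, for example to rerun them from CI. The following is a minimal sketch, not part of the repository: `run_smoke_checks` and `urllib_fetch` are hypothetical helpers, and it assumes that `/healthz` signals health with any 2xx status and that `/env-info` answers 2xx with a non-empty body (its exact format is not checked). Checks 3 and 4 still rely on `./validate-submission.sh` and `inference.py`.

```python
import urllib.request
from typing import Callable, Dict, Tuple

# Base URL of the deployed Space, as listed in the checklist.
SPACE_URL = "https://swapnilpatil28-multi-agent-incident-command-center.hf.space"

# A fetcher maps a URL to (HTTP status code, response body).
Fetcher = Callable[[str], Tuple[int, bytes]]


def urllib_fetch(url: str, timeout: float = 30.0) -> Tuple[int, bytes]:
    """GET a URL with the standard library.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses and URLError
    for network failures; both are subclasses of OSError.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read()


def run_smoke_checks(base_url: str, fetch: Fetcher = urllib_fetch) -> Dict[str, bool]:
    """Hypothetical scripted version of smoke checks 1 and 2.

    Assumptions: /healthz signals health with any 2xx status, and /env-info
    answers 2xx with a non-empty body (its content is not parsed).
    """
    base = base_url.rstrip("/")
    # Maps each endpoint path to whether an empty response body is acceptable.
    checks = {"/healthz": True, "/env-info": False}
    results: Dict[str, bool] = {}
    for path, empty_body_ok in checks.items():
        try:
            status, body = fetch(base + path)
        except OSError:
            # Any network or HTTP error counts as a failed check.
            results[path] = False
            continue
        results[path] = 200 <= status < 300 and (empty_body_ok or len(body) > 0)
    return results
```

Calling `run_smoke_checks(SPACE_URL)` should yield `True` for both paths if the Space behaves as assumed. The injectable `fetch` parameter keeps the check logic testable without network access.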