Final Docs Update
- docs/BLOG_POST.md +5 -5
- docs/SUBMISSION_CHECKLIST.md +5 -5
docs/BLOG_POST.md
CHANGED
|
@@ -126,7 +126,7 @@ The reward engine emits **named components** at every step so training curves
|
|
| 126 |
|
| 127 |
I first wrote a deterministic `HeuristicCoordinator` that uses the observation's `investigation_targets` and role constraints to play through the environment. On hard tasks it earns **+5.89** reward where random scores **−12.50**, so that gives us ~680 `(prompt, completion)` pairs of "good" behavior to imitate.
|
| 128 |
|
| 129 |
-
Training script: [`train_trl.py`](https://
|
| 130 |
|
| 131 |
```python
|
| 132 |
os.environ["BASE_MODEL"] = "Qwen/Qwen2.5-1.5B-Instruct"
|
|
@@ -231,13 +231,13 @@ I ran the exact same pipeline with the smaller **Qwen2.5-0.5B-Instruct** backbon
|
|
| 231 |
| **Live environment** | [swapnilpatil28-multi-agent-incident-command-center.hf.space](https://swapnilpatil28-multi-agent-incident-command-center.hf.space) (OpenEnv-compatible, Docker-backed) |
|
| 232 |
| **Training notebook** | [One-click Colab (T4, ~1 h 15 min end-to-end)](https://colab.research.google.com/drive/1vx9E5FrZZrHoRwXs2cvtom3DaI6kZ3LP?usp=sharing) |
|
| 233 |
| **Source + tests** | [GitHub repo (21 passing tests, Dockerfile with HEALTHCHECK)](https://github.com/SwapnilPatil28/Multi-Agent-Incident-Command-Center) |
|
| 234 |
-
| **Full docs** | [README: Part 1 story + Part 2 technical deep-dive](https://
|
| 235 |
-
| **Committed evidence** | [`artifacts/`](https://
|
| 236 |
-
| **Submission checklist** | [`docs/SUBMISSION_CHECKLIST.md`](https://
|
| 237 |
|
| 238 |
---
|
| 239 |
|
| 240 |
-
## 8. What's next
|
| 241 |
|
| 242 |
- **Replace SFT with GRPO or PPO** using the environment's native reward signal: no heuristic teacher, let the rubric itself shape the policy and push past the imitation ceiling.
|
| 243 |
- **Scale the incident catalog** from 13 templates to 50+ (drop in JSON-defined scenarios).
|
|
|
|
| 126 |
|
| 127 |
I first wrote a deterministic `HeuristicCoordinator` that uses the observation's `investigation_targets` and role constraints to play through the environment. On hard tasks it earns **+5.89** reward where random scores **−12.50**, so that gives us ~680 `(prompt, completion)` pairs of "good" behavior to imitate.
|
| 128 |
|
| 129 |
+
Training script: [`train_trl.py`](https://huggingface.co/spaces/SwapnilPatil28/Multi-Agent-Incident-Command-Center/blob/main/train_trl.py). One command on Colab T4 (or **[open the reproducible notebook →](https://colab.research.google.com/drive/1vx9E5FrZZrHoRwXs2cvtom3DaI6kZ3LP?usp=sharing)**) runs the entire pipeline:
|
| 130 |
|
| 131 |
```python
|
| 132 |
os.environ["BASE_MODEL"] = "Qwen/Qwen2.5-1.5B-Instruct"
|
|
|
|
| 231 |
| **Live environment** | [swapnilpatil28-multi-agent-incident-command-center.hf.space](https://swapnilpatil28-multi-agent-incident-command-center.hf.space) (OpenEnv-compatible, Docker-backed) |
|
| 232 |
| **Training notebook** | [One-click Colab (T4, ~1 h 15 min end-to-end)](https://colab.research.google.com/drive/1vx9E5FrZZrHoRwXs2cvtom3DaI6kZ3LP?usp=sharing) |
|
| 233 |
| **Source + tests** | [GitHub repo (21 passing tests, Dockerfile with HEALTHCHECK)](https://github.com/SwapnilPatil28/Multi-Agent-Incident-Command-Center) |
|
| 234 |
+
| **Full docs** | [README: Part 1 story + Part 2 technical deep-dive](https://huggingface.co/spaces/SwapnilPatil28/Multi-Agent-Incident-Command-Center/blob/main/README.md) |
|
| 235 |
+
| **Committed evidence** | [`artifacts/`](https://huggingface.co/spaces/SwapnilPatil28/Multi-Agent-Incident-Command-Center/tree/main/artifacts): all 4 PNGs + both JSON metric files |
|
| 236 |
+
| **Submission checklist** | [`docs/SUBMISSION_CHECKLIST.md`](https://huggingface.co/spaces/SwapnilPatil28/Multi-Agent-Incident-Command-Center/blob/main/docs/SUBMISSION_CHECKLIST.md) |
|
| 237 |
|
| 238 |
---
|
| 239 |
|
| 240 |
+
## 8. What's next (Planned)
|
| 241 |
|
| 242 |
- **Replace SFT with GRPO or PPO** using the environment's native reward signal: no heuristic teacher, let the rubric itself shape the policy and push past the imitation ceiling.
|
| 243 |
- **Scale the incident catalog** from 13 templates to 50+ (drop in JSON-defined scenarios).
|
docs/SUBMISSION_CHECKLIST.md
CHANGED
|
@@ -87,9 +87,9 @@ Status against every hard gate in the official judging rules, plus every polish
|
|
| 87 |
| 5 | Dashboard upgraded: hero story panel, 4 stacked plots, resources grid with README / blog / checklist links | ✅ |
|
| 88 |
| 6 | Blog post updated (`docs/BLOG_POST.md`) with fixed image paths (raw GitHub URLs) and 0.5B ablation section | ✅ |
|
| 89 |
| 7 | All 21 tests passing on latest commit | ✅ |
|
| 90 |
-
| 8 | Run `openenv validate` remotely against the Space: `./validate-submission.sh <space-url>` |
|
| 91 |
-
| 9 | **Submit the Space URL in the hackathon form:** `https://swapnilpatil28-multi-agent-incident-command-center.hf.space` |
|
| 92 |
-
| 10 | Do not push commits after the submission deadline; post-deadline commits won't be considered |
|
| 93 |
|
| 94 |
---
|
| 95 |
|
|
@@ -100,13 +100,13 @@ Status against every hard gate in the official judging rules, plus every polish
|
|
| 100 |
curl -fsS https://swapnilpatil28-multi-agent-incident-command-center.hf.space/healthz
|
| 101 |
|
| 102 |
# 2. Env-info endpoint advertises metadata
|
| 103 |
-
curl -s https://swapnilpatil28-multi-agent-incident-command-center.hf.space/env-info
|
| 104 |
|
| 105 |
# 3. OpenEnv validator passes remotely
|
| 106 |
./validate-submission.sh https://swapnilpatil28-multi-agent-incident-command-center.hf.space
|
| 107 |
|
| 108 |
# 4. A remote episode works
|
| 109 |
-
ENV_URL=https://swapnilpatil28-multi-agent-incident-command-center.hf.space python inference.py
|
| 110 |
```
|
| 111 |
|
| 112 |
## Where the judges will find each artefact
|
|
|
|
| 87 |
| 5 | Dashboard upgraded: hero story panel, 4 stacked plots, resources grid with README / blog / checklist links | ✅ |
|
| 88 |
| 6 | Blog post updated (`docs/BLOG_POST.md`) with fixed image paths (raw GitHub URLs) and 0.5B ablation section | ✅ |
|
| 89 |
| 7 | All 21 tests passing on latest commit | ✅ |
|
| 90 |
+
| 8 | Run `openenv validate` remotely against the Space: `./validate-submission.sh <space-url>` | ✅ |
|
| 91 |
+
| 9 | **Submit the Space URL in the hackathon form:** `https://swapnilpatil28-multi-agent-incident-command-center.hf.space` | ✅ |
|
| 92 |
+
| 10 | Do not push commits after the submission deadline; post-deadline commits won't be considered | ✅ |
|
| 93 |
|
| 94 |
---
|
| 95 |
|
|
|
|
| 100 |
curl -fsS https://swapnilpatil28-multi-agent-incident-command-center.hf.space/healthz
|
| 101 |
|
| 102 |
# 2. Env-info endpoint advertises metadata
|
| 103 |
+
curl -s https://swapnilpatil28-multi-agent-incident-command-center.hf.space/env-info
|
| 104 |
|
| 105 |
# 3. OpenEnv validator passes remotely
|
| 106 |
./validate-submission.sh https://swapnilpatil28-multi-agent-incident-command-center.hf.space
|
| 107 |
|
| 108 |
# 4. A remote episode works
|
| 109 |
+
ENV_URL=https://swapnilpatil28-multi-agent-incident-command-center.hf.space python inference.py
|
| 110 |
```
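
The first two checks above (`/healthz` and `/env-info`) can also be scripted. The following is a minimal sketch, not part of the repository: the helper names `http_status` and `smoke_check` are hypothetical, and treating HTTP 200 as "passing" is an assumption about how these endpoints respond.

```python
from typing import Callable, Dict
from urllib.request import urlopen

# Space URL and endpoint paths are taken from the commands above.
SPACE_URL = "https://swapnilpatil28-multi-agent-incident-command-center.hf.space"
ENDPOINTS = ("/healthz", "/env-info")


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET request to `url`."""
    with urlopen(url, timeout=timeout) as response:
        return response.status


def smoke_check(
    base_url: str, fetch: Callable[[str], int] = http_status
) -> Dict[str, bool]:
    """Map each endpoint path to True if it answered with HTTP 200 (assumed success code)."""
    results: Dict[str, bool] = {}
    for path in ENDPOINTS:
        url = base_url.rstrip("/") + path
        try:
            results[path] = fetch(url) == 200
        except OSError:
            # urllib raises URLError/HTTPError (both OSError subclasses)
            # on connection failures and on 4xx/5xx responses.
            results[path] = False
    return results
```

Calling `smoke_check(SPACE_URL)` performs the two live requests. The `fetch` parameter exists only so the logic can be exercised offline with a stub.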
|
| 111 |
|
| 112 |
## Where the judges will find each artefact
|