Update README.md (#16) by munals

README.md:

sdk: docker
pinned: false
---

# HaramGuard – Agentic AI Safety System for Hajj Crowd Management

HaramGuard is a real-time, multi-agent decision-support system that integrates computer vision, risk modeling, reflective bias correction, and LLM-based coordination to assist human operators in preventing crowd crush during Hajj and Umrah.

Manual monitoring of many camera feeds is error-prone and does not scale.

HaramGuard uses a **deterministic, unidirectional pipeline**:

1. **PerceptionAgent** receives each video frame with its detection data and produces a structured `FrameResult` (person count, density score, bounding boxes, track IDs, annotated frame, 3×3 spatial grid for hotspot detection). Uses a YOLOv8 Head Detection model (CrowdHuman) with BoTSORT tracking.
2. **RiskAgent** segments the video into clips (detected via count/density jumps) and computes a dynamic risk score every frame using a sliding K-window (17 frames) of unique track IDs: `density_pct = N_est / 150 × 100`, `risk_score = density_pct / 100`. Level stabilization requires 5 consecutive frames at a new level before confirming. Output is `RiskResult`.
3. **ReflectionAgent** observes `RiskResult` and `FrameResult`, detects four bias patterns (chronic LOW, rising trend ignored, count–risk mismatch, over-estimation), and corrects risk level/score when needed. Level-changed tracking is recalculated after correction so OperationsAgent always sees the effective level. All reflections are logged to the database.
4. **OperationsAgent** emits a `Decision` only when the effective risk level changes (event-driven, post-reflection). Priority (P0/P1/P2) uses density-aligned thresholds (P0: score > 0.80, P1: score > 0.20). The P0 rate limit resets on each pipeline restart. Decisions are stored in the database.
5. **CoordinatorAgent** is invoked for **every** decision (P0, P1, P2) in a background thread so it does not block frame processing. Calls the Groq LLM (`openai/gpt-oss-120b`) via a ReAct loop (reason → act → observe, max 3 iterations) with six guardrails (GR-C1–C6). The pipeline fills the decision's actions, justification, and selected_gates from the plan.

**Single state:** One `state` dictionary is updated each frame and exposed to the FastAPI server; the React dashboard polls it. All numeric thresholds and caps live in `backend/config.py`.
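The per-frame loop over the five agents and the single state dict can be sketched as follows; agent and method names here are illustrative assumptions, not the actual `backend` API:

```python
# Hypothetical sketch of the per-frame pipeline order and the shared state
# dict described above (illustrative names, not the real backend code).
def process_frame(frame, agents, state):
    fr = agents["perception"].run(frame)                 # -> FrameResult
    rr = agents["risk"].run(fr)                          # -> RiskResult
    rr = agents["reflection"].correct(rr, fr)            # bias-corrected RiskResult
    decision = agents["ops"].decide(rr)                  # Decision or None
    if decision is not None:
        agents["coordinator"].plan_async(rr, decision)   # background thread
    state.update(frame_id=fr.frame_id,
                 risk_level=rr.risk_level,
                 latest_decision=decision)
    return state
```

The FastAPI server simply serves this dict; the dashboard polls it rather than pushing events.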

### 3.2 Data Flow

- **Input:** Video frames from a file or camera (path set by `VIDEO_PATH` in config).
- **Output:** Per-frame state (frame_id, person_count, density_score, risk_score, risk_level, trend, latest_decision, coordinator_plan, arabic_alert, reflection_summary, risk_history, decisions_log, etc.) plus persisted records in SQLite: `risk_events`, `op_decisions`, `coordinator_plans`, `reflection_log`.
- **Interfaces:** Backend: FastAPI on port 8000 (configurable via `API_PORT`). Frontend: Vite dev server (e.g. port 5173); the API base URL is configured via `VITE_API_BASE_URL`.

---

## 4. Agentic System Design (Agents Description)

The pipeline runs five agents in order each frame. Each agent is implemented in a single module under `backend/agents/`. Data flows unidirectionally; agents do not call each other directly.

### 4.1 PerceptionAgent (`perception_agent.py`)

- **Role:** Receive each video frame alongside its pre-computed detection data and produce a structured `FrameResult`: person count, density score, average spacing, bounding boxes, annotated frame, track IDs, occupation percentage, and a 3×3 spatial grid (grid_counts, grid_max, hotspot_zone) for downstream hotspot detection. Based on Umm Al-Qura University (UQU) Haram crowd research: local clustering in one cell can indicate risk even when global count is moderate.
- **Model:** YOLOv8 Head Detection trained on the CrowdHuman dataset, paired with the BoTSORT tracker – specifically chosen for dense crowd scenes where full-body detection fails due to occlusion.
- **Design pattern:** Tool use – head detection model for detection; BoTSORT for multi-object tracking across frames. Optional VisionCountAgent (Claude Vision) available but not used in the default pipeline.
- **Guardrails:** GR1 – person count capped at `MAX_PERSONS`. GR2 – density score capped at `MAX_DENSITY` (50.0).
- **Input:** Video frame (numpy array) + detection metadata. **Output:** `FrameResult` (see `backend/core/models.py`).

### 4.2 RiskAgent (`risk_agent.py`)

- **Role:** Segment the video into clips and compute a dynamic risk score per frame using a sliding K-window of unique track IDs. Outputs a scalar risk score in [0, 1], a discrete risk level (LOW / MEDIUM / HIGH), and a density percentage.
- **Clip segmentation:** A new clip is detected when `|persons[t] − persons[t−1]| ≥ 40` OR `|density_score[t] − density_score[t−1]| ≥ 0.4`, sustained for ≥ 10 consecutive frames (glitch filter). All state resets at each clip boundary.
- **Sliding K-window density:** Every frame, the agent unions the unique track IDs seen across the last K=17 frames within the current clip: `N_est = |union(track_ids)|`. Density: `density_pct = min(N_est / 150 × 100, 100)`. Risk score: `risk_score = density_pct / 100`.
- **Risk levels:** density_pct ≤ 20 → LOW; 21–80 → MEDIUM; > 80 → HIGH.
- **Level stabilization:** A new level must hold for 5 consecutive frames before it is confirmed and `level_changed` fires. `level_changed` is suppressed for the first K frames of each clip (warmup period) to prevent false triggers before the window is full.
- **Design pattern:** Clip-aware sliding window. All thresholds in `config.py` (`CLIP_P_JUMP`, `CLIP_D_JUMP`, `CLIP_MIN_LEN`, `CLIP_K_WINDOW`, `RISK_HIGH_THRESHOLD`).
- **Input:** `FrameResult`. **Output:** `RiskResult` (frame_id, risk_score, risk_level, trend, level_changed, window_avg, window_max, density_ema, density_pct).

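The sliding K-window density computation can be sketched as follows; this is a minimal illustration with assumed names, not the actual `risk_agent.py` code:

```python
from collections import deque

K_WINDOW = 17        # sliding window length in frames (CLIP_K_WINDOW)
DENSITY_CAP = 150    # person count that maps to 100% density

class KWindowDensity:
    """Track unique IDs over the last K frames and derive density and risk."""

    def __init__(self, k=K_WINDOW):
        self.window = deque(maxlen=k)            # one set of track IDs per frame

    def update(self, track_ids):
        self.window.append(set(track_ids))
        n_est = len(set().union(*self.window))   # N_est = |union(track_ids)|
        density_pct = min(n_est / DENSITY_CAP * 100, 100.0)
        risk_score = density_pct / 100.0
        if density_pct > 80:
            level = "HIGH"
        elif density_pct > 20:
            level = "MEDIUM"
        else:
            level = "LOW"
        return n_est, density_pct, risk_score, level
```

Using track IDs rather than per-frame counts makes the estimate robust to a person flickering in and out of detection across consecutive frames.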
### 4.3 ReflectionAgent (`reflection_agent.py`)

- **Role:** Critique the current risk assessment and correct it when one of four bias patterns is detected: (1) chronic LOW – N consecutive LOW frames with average person count above threshold → upgrade to MEDIUM; (2) rising trend ignored – trend=rising, risk=LOW, count above threshold → upgrade to MEDIUM; (3) count–risk mismatch – high person count but LOW risk → upgrade to MEDIUM or HIGH; (4) over-estimation – HIGH risk but person count below threshold (< 10) → downgrade to MEDIUM. Score corrections are aligned to the risk thresholds (> 0.20 MEDIUM, > 0.80 HIGH). All reflections are persisted to `reflection_log` by the pipeline.
- **Design pattern:** Reflection (observe → critique → correct → log). History window and thresholds in config (`REFLECTION_BIAS_WINDOW`, `REFLECTION_CROWD_LOW_THRESH`, `REFLECTION_HIGH_CROWD_THRESH`, `REFLECTION_OVER_EST_THRESH`).
- **Input:** `RiskResult`, `FrameResult`. **Output:** Reflection dict; pipeline applies corrections to `RiskResult` before passing to OperationsAgent.

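The four bias checks can be sketched roughly like this; the function signature and the concrete threshold values are illustrative assumptions (the real values live in `backend/config.py`):

```python
# Assumed example values; the real numbers live in backend/config.py.
REFLECTION_BIAS_WINDOW = 5
REFLECTION_CROWD_LOW_THRESH = 20
REFLECTION_HIGH_CROWD_THRESH = 60
REFLECTION_OVER_EST_THRESH = 10

def reflect(level, trend, count, low_streak, avg_count):
    """Return (corrected_level, detected_bias_pattern or None)."""
    if level == "LOW" and low_streak >= REFLECTION_BIAS_WINDOW \
            and avg_count > REFLECTION_CROWD_LOW_THRESH:
        return "MEDIUM", "chronic_low"              # pattern 1 / RF1
    if level == "LOW" and trend == "rising" and count > REFLECTION_CROWD_LOW_THRESH:
        return "MEDIUM", "rising_trend_ignored"     # pattern 2 / RF2
    if level == "LOW" and count > REFLECTION_HIGH_CROWD_THRESH:
        return "HIGH", "count_risk_mismatch"        # pattern 3 / RF3
    if level == "HIGH" and count < REFLECTION_OVER_EST_THRESH:
        return "MEDIUM", "over_estimation"          # pattern 4 / RF4
    return level, None
```

Note that the first three patterns only ever escalate and the fourth only de-escalates, so a single pass suffices.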
### 4.4 OperationsAgent (`operations_agent.py`)

- **Role:** Map the (possibly reflection-corrected) risk level to an operational priority (P0 / P1 / P2) and emit a `Decision` only when the risk level *changes*. Priority thresholds aligned to density-based risk score: `OPS_P0_SCORE = 0.80` (HIGH), `OPS_P1_SCORE = 0.20` (MEDIUM). P0 emission is rate-limited per zone (cooldown 300 s), but the rate limit resets with each pipeline restart so previous runs do not block new decisions. The decision's actions and selected_gates are left empty; the pipeline fills them via CoordinatorAgent and then stores the decision in `op_decisions`.
- **Design pattern:** Event-driven; no decision when level unchanged.
- **Input:** `RiskResult`, context string (e.g. `Mecca_Main_Area`). **Output:** `Decision` or None.

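A minimal sketch of the event-driven decision logic with the per-zone P0 cooldown; names and structure are assumptions, not the actual `operations_agent.py` code:

```python
import time

OPS_P0_SCORE, OPS_P1_SCORE = 0.80, 0.20
P0_COOLDOWN_S = 300

class OperationsAgent:
    """Emit a decision only on level change; rate-limit P0 per zone."""

    def __init__(self):
        self.last_level = None
        self.last_p0 = {}            # zone -> timestamp; empty on each restart

    def decide(self, risk_score, risk_level, zone):
        if risk_level == self.last_level:
            return None                               # no level change, no decision
        self.last_level = risk_level
        priority = ("P0" if risk_score > OPS_P0_SCORE
                    else "P1" if risk_score > OPS_P1_SCORE else "P2")
        now = time.monotonic()
        if priority == "P0":
            if now - self.last_p0.get(zone, -P0_COOLDOWN_S) < P0_COOLDOWN_S:
                return None                           # P0 rate-limited for this zone
            self.last_p0[zone] = now
        # actions / selected_gates are filled later by CoordinatorAgent
        return {"priority": priority, "zone": zone,
                "actions": [], "selected_gates": []}
```

Because `last_p0` is an in-memory dict, the cooldown naturally resets on restart, matching the behavior described above.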
### 4.5 CoordinatorAgent (`coordinator_agent.py`)

- **Role:** For every decision (P0, P1, or P2), produce a structured action plan using the Groq LLM (`openai/gpt-oss-120b`). The plan includes threat_level, executive_summary, selected_gates, immediate_actions, actions_justification, arabic_alert, and confidence_score. Implements a ReAct loop (max 3 iterations): reason (build prompt from RiskResult, Decision, frame buffer; optional feedback from failed validation) → act (LLM call, parse JSON) → observe (run guardrails GR-C1–C6); repeat until valid or max iterations. GR-C4 enforces that threat_level matches actual risk_score thresholds (> 0.80 → HIGH, > 0.20 → MEDIUM). The pipeline fills the decision's actions, justification, and selected_gates from the plan and stores the plan in `coordinator_plans`. The LLM runs in a background thread so it does not block frame processing.
- **Design pattern:** ReAct (reason → act → observe) + output guardrails.
- **Input:** `RiskResult`, `Decision`, list of recent `FrameResult`s. **Output:** Plan dict.
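The observe step of the ReAct loop can be sketched as a validation pass over the parsed LLM output; this covers only GR-C1 through GR-C4 and uses assumed names, not the actual `coordinator_agent.py` code:

```python
REQUIRED = ["threat_level", "executive_summary", "selected_gates",
            "immediate_actions", "actions_justification", "arabic_alert",
            "confidence_score"]
LEVELS = {"CRITICAL", "HIGH", "MEDIUM", "LOW"}

def validate_plan(plan, risk_score):
    """Apply output guardrails GR-C1..GR-C4 to a parsed LLM plan (sketch)."""
    for field in REQUIRED:                       # GR-C1: fill missing fields
        plan.setdefault(field, "")
    if plan["threat_level"] not in LEVELS:       # GR-C2: whitelist
        plan["threat_level"] = "LOW"
    c = plan["confidence_score"]                 # GR-C3: must be a number in [0, 1]
    if not isinstance(c, (int, float)) or not 0 <= c <= 1:
        plan["confidence_score"] = 0.5
    # GR-C4: full-range enforcement against the actual risk score
    plan["threat_level"] = ("HIGH" if risk_score > 0.80
                            else "MEDIUM" if risk_score > 0.20 else "LOW")
    return plan
```

On validation failure the real loop would feed the errors back into the next reason step rather than silently overwriting everything.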

---

Guardrails are hard constraints and validations applied in code to keep outputs safe.

| ID | Agent | Mechanism | Rationale |
| --- | --- | --- | --- |
| **GR1** | PerceptionAgent | Person count capped at MAX_PERSONS (1000 in agent) | Prevents implausibly high counts from YOLO artifacts propagating to risk and alerts. |
| **GR2** | PerceptionAgent | Density score capped at MAX_DENSITY (50.0) | Keeps density in a bounded range for downstream risk formulas. |
| **GR3** | RiskAgent | Risk score clamped to [0.0, 1.0] | Ensures threshold comparisons (> 0.20 MEDIUM, > 0.80 HIGH) remain valid. |
| **GR3b** | RiskAgent | `level_changed` suppressed during K-window warmup (first 17 frames per clip) | Prevents false P0/P1 triggers before enough track data is available. |
| **GR4** | OperationsAgent | P0 rate-limited per zone (cooldown 300 s); resets on pipeline restart | Reduces alert fatigue; the rate limit does not carry over across runs. |
| **GR-C1** | CoordinatorAgent | Required JSON fields enforced; missing fields set to safe defaults | Prevents dashboard or downstream logic from breaking when the LLM omits fields. |
| **GR-C2** | CoordinatorAgent | threat_level whitelist (CRITICAL, HIGH, MEDIUM, LOW) | Avoids invalid or adversarial values that would break UI or logic. |
| **GR-C3** | CoordinatorAgent | confidence_score in [0, 1]; otherwise 0.5 | Normalizes LLM output so confidence is comparable. |
| **GR-C4** | CoordinatorAgent | Full-range enforcement: threat_level overridden to match actual risk_score thresholds (> 0.80 HIGH, > 0.20 MEDIUM, else LOW) | Prevents the LLM from returning HIGH threat during MEDIUM risk or CRITICAL during LOW risk. |
| **GR-C5** | CoordinatorAgent | Arabic alert fallback if empty | Ensures the safety-critical Arabic alert is never empty on the dashboard. |
| **GR-C6** | CoordinatorAgent | selected_gates must be a non-empty list; otherwise fallback | Ensures operators receive concrete gate recommendations. |
| **RF1** | ReflectionAgent | Chronic LOW bias: N consecutive LOW frames with avg count above threshold → MEDIUM | Addresses sliding-window lag during rapid escalation. |
| **RF2** | ReflectionAgent | Rising trend ignored: trend=rising, LOW, count above threshold → MEDIUM | Corrects inconsistent state (rising crowd with LOW risk). |
| **RF3** | ReflectionAgent | Count–risk mismatch: high count but LOW risk → upgrade to MEDIUM/HIGH | Corrects mathematically inconsistent states. |
| **RF4** | ReflectionAgent | Over-estimation: HIGH risk but count < threshold (e.g. 15) → MEDIUM | Reduces false HIGH from empty or near-empty frames. |

Each guardrail is implemented in the corresponding agent file; further justification is documented in `ethics_and_safety_report.txt`.

---

```bash
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install -r requirements.txt
```

Set `GROQ_API_KEY` in the environment or in `backend/config.py`. Set `VIDEO_PATH` to a valid video file path (default: `hajj_multi_video_annotated.mp4` in the backend directory). The system uses a YOLOv8 Head Detection model (CrowdHuman) with BoTSORT tracking.

```bash
python api.py
```

Rationale: in crowd safety, a false alarm is far less costly than a missed stampede.

| Risk | Impact | Mitigation |
| --- | --- | --- |
| System failure during peak crowd | Pipeline crash → operators lose visibility | Fail-safe per-agent isolation; operators trained to treat silence as a trigger for manual monitoring |
| Adversarial prompt injection | Malicious input manipulates LLM output | Structured JSON-only output; GR-C1–C5 guardrails validate every field; threat_level constrained to a whitelist |
| Disproportionate security response | P0 triggers aggressive enforcement harming pilgrims | Playbook actions are crowd-management only (open gates, PA broadcast, crowd guides) – not enforcement; the human operator has final authority |

### 11.7 Fail-Safe Behavior

If any agent fails, the pipeline continues with safe defaults:

- **PerceptionAgent fails** → returns empty FrameResult (count=0)
- **RiskAgent fails** → previous risk level is retained
- **VisionCountAgent fails** → falls back to YOLO count (logged in flags)
- **CoordinatorAgent fails** → P0 decision still issued without the GPT plan
- **DB write fails** → logged to console, pipeline continues

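The per-agent isolation above amounts to wrapping each step in the same guard; a hypothetical helper, not the actual pipeline code:

```python
def safe_step(step_fn, fallback, flags, agent_name):
    """Run one agent step; on failure, record a flag and return a safe default."""
    try:
        return step_fn()
    except Exception as exc:
        flags.append(f"{agent_name}_failed: {exc}")   # logged; pipeline continues
        return fallback
```

For RiskAgent the `fallback` would be the previous `RiskResult`, for PerceptionAgent an empty `FrameResult`, and so on, as listed above.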
---

## 12. Limitations & Future Work

- **Single-camera, single-zone:** The pipeline processes one video stream per instance. Real deployment at the Grand Mosque would require multiple cameras and zones. Future work: multi-zone state and one pipeline per camera with aggregation at the API or dashboard layer.
- **Synthetic-only evaluation:** Quantitative metrics in `evaluation.py` are computed on synthetically generated scenarios with known ground-truth counts. Real aerial footage has occlusion, blur, and lighting conditions not fully captured. Future work: annotate real frames and measure real-world accuracy and recall.
- **No Hajj-specific fine-tuning:** The YOLO model is pre-trained on general datasets. Pilgrims in white ihram can be under-detected (domain shift). VisionCountAgent and ReflectionAgent mitigate this in part when enabled. Future work: fine-tune a detector on Hajj-annotated data to improve recall (estimated +15%).
- **Coordinator output quality not automatically measured:** Evaluation covers risk levels, priorities, and guardrail compliance – not the appropriateness or clarity of generated Arabic plans and alerts. Future work: human-expert rubric and sample-based evaluation of plan quality.
- **Production scaling:** The architecture supports running multiple pipeline instances; the dashboard and API would need to be extended for per-zone or per-camera state and approve/reject controls per zone.

---

**HaramGuard – Capstone Project · Tuwaiq Academy**