Update README.md

#16
by munals - opened
Files changed (1)
  1. README.md +24 -43
README.md CHANGED
@@ -7,7 +7,6 @@ sdk: docker
  pinned: false
  ---

-
  # HaramGuard — Agentic AI Safety System for Hajj Crowd Management

  HaramGuard is a real-time, multi-agent decision-support system that integrates computer vision, risk modeling, reflective bias correction, and LLM-based coordination to assist human operators in preventing crowd crush during Hajj and Umrah.
@@ -70,54 +69,48 @@ Manual monitoring of many camera feeds is error-prone and does not scale. Operat

  HaramGuard uses a **deterministic, unidirectional pipeline**:

- 1. **PerceptionAgent** turns each video frame into a structured `FrameResult` (person count, density, spacing, 3×3 spatial grid for hotspots, annotated frame). It uses a YOLO model (path and size in `backend/config.py`); an optional VisionCountAgent (Claude Vision) is available in code but disabled in the default pipeline.
- 2. **RiskAgent** maintains a sliding window of `FrameResult`s and computes a risk score and level (LOW/MEDIUM/HIGH) using four paths: Fruin-style EMA of count, instant high-count floor, pre-emptive rate-of-change, and spatial clustering from the 3×3 grid. Output is `RiskResult`.
- 3. **ReflectionAgent** observes `RiskResult` and `FrameResult`, detects four bias patterns (chronic LOW, rising trend ignored, count–risk mismatch, over-estimation), and corrects risk level/score when needed. All reflections are logged to the database.
- 4. **OperationsAgent** emits a `Decision` only when the risk level *changes* (event-driven). Priority (P0/P1/P2) is derived from config thresholds aligned with RiskAgent. P0 decisions are rate-limited per zone. Decisions are stored in the database; actions and selected gates are left empty until the Coordinator fills them.
- 5. **CoordinatorAgent** is invoked by the pipeline for **every** decision (P0, P1, P2). It calls an LLM (Groq API) to generate a structured plan (threat level, executive summary, selected gates, immediate actions, Arabic alert, confidence). A ReAct loop (reason → act → observe, max 3 iterations) validates output with six guardrails (GR-C1–C6); the pipeline then fills the decision's actions, justification, and selected_gates from the plan and stores the plan in the database.
-
  **Single state:** One `state` dictionary is updated each frame and exposed to the FastAPI server; the React dashboard polls it. All numeric thresholds and caps live in `backend/config.py`.
-
  ### 3.2 Data Flow
-
  - **Input:** Video frames from a file or camera (path set by `VIDEO_PATH` in config).
  - **Output:** Per-frame state (frame_id, person_count, density_score, risk_score, risk_level, trend, latest_decision, coordinator_plan, arabic_alert, reflection_summary, risk_history, decisions_log, etc.) plus persisted records in SQLite: `risk_events`, `op_decisions`, `coordinator_plans`, `reflection_log`.
  - **Interfaces:** Backend: FastAPI on port 8000 (configurable via `API_PORT`). Frontend: Vite dev server (e.g. port 5173), configurable via `VITE_API_BASE_URL` for the API base URL.
-
  ---
-
  ## 4. Agentic System Design (Agents Description)
-
  The pipeline runs five agents in order each frame. Each agent is implemented in a single module under `backend/agents/`. Data flows unidirectionally; agents do not call each other directly.
-
  ### 4.1 PerceptionAgent (`perception_agent.py`)

- - **Role:** Convert a raw video frame into a `FrameResult`: person count, density score, average spacing, bounding boxes, annotated frame, track IDs, occupation percentage, and a 3×3 spatial grid (grid_counts, grid_max, hotspot_zone) for downstream hotspot detection. Based on Umm Al-Qura University (UQU) Haram crowd research: local clustering in one cell can indicate risk even when global count is moderate.
- - **Design pattern:** Tool use — YOLO for detection and tracking; optional VisionCountAgent (Claude Vision) for an alternative count when an Anthropic key is provided. In the current default pipeline, PerceptionAgent is instantiated without an Anthropic key (YOLO-only).
- - **Guardrails:** GR1 — person count capped at `MAX_PERSONS` (1000 in agent class). GR2 — density score capped at `MAX_DENSITY` (50.0).
- - **Input:** Raw frame (numpy array). **Output:** `FrameResult` (see `backend/core/models.py`).
-
  ### 4.2 RiskAgent (`risk_agent.py`)

- - **Role:** Maintain a sliding window (14 frames) of recent `FrameResult`s and compute a scalar risk score in [0, 1] and a discrete risk level (LOW / MEDIUM / HIGH), plus trend (rising / stable / falling). Final score is the maximum of four paths: (1) Fruin smooth — EMA of current person count normalized to `RISK_HIGH_COUNT` (50), with spacing and trend weights; (2) instant floor — if current count ≥ HIGH_COUNT, score floor 0.70; (3) pre-emptive ROC — 5-frame growth and EMA thresholds; (4) spatial clustering — if any 3×3 grid cell has ≥ `GRID_CELL_HIGH` persons (from FrameResult), floor 0.70. Score is clamped to [0, 1] (GR3).
- - **Design pattern:** Sliding window + multi-path weighted scoring. Window size, thresholds, and weights are in `config.py`.
  - **Input:** `FrameResult`. **Output:** `RiskResult` (frame_id, risk_score, risk_level, trend, level_changed, window_avg, window_max, density_ema, density_pct).
-
  ### 4.3 ReflectionAgent (`reflection_agent.py`)

- - **Role:** Critique the current risk assessment and correct it when one of four bias patterns is detected: (1) chronic LOW — N consecutive LOW frames with average person count above threshold → upgrade to MEDIUM; (2) rising trend ignored — trend=rising, risk=LOW, count above threshold → upgrade to MEDIUM; (3) count–risk mismatch — high person count but LOW risk → upgrade to MEDIUM or HIGH; (4) over-estimation — HIGH risk but person count below threshold (e.g. < 15) → downgrade to MEDIUM. All reflections are persisted to `reflection_log` by the pipeline.
  - **Design pattern:** Reflection (observe → critique → correct → log). History window and thresholds in config (`REFLECTION_BIAS_WINDOW`, `REFLECTION_CROWD_LOW_THRESH`, `REFLECTION_HIGH_CROWD_THRESH`, `REFLECTION_OVER_EST_THRESH`).
  - **Input:** `RiskResult`, `FrameResult`. **Output:** Reflection dict; pipeline applies corrections to `RiskResult` before passing to OperationsAgent.

  ### 4.4 OperationsAgent (`operations_agent.py`)
-
- - **Role:** Map the (possibly reflection-corrected) risk level to an operational priority (P0 / P1 / P2) and emit a `Decision` only when the risk level *changes*. Priority is derived from config (`OPS_P0_SCORE`, `OPS_P1_SCORE`) aligned with RiskAgent thresholds. P0 emission is rate-limited per zone (cooldown 300 s in agent class). The decision's actions and selected_gates are left empty; the pipeline fills them via CoordinatorAgent and then stores the decision in `op_decisions`.
  - **Design pattern:** Event-driven; no decision when level unchanged.
  - **Input:** `RiskResult`, context string (e.g. `Mecca_Main_Area`). **Output:** `Decision` or None.
-
  ### 4.5 CoordinatorAgent (`coordinator_agent.py`)

- - **Role:** For every decision (P0, P1, or P2), produce a structured action plan using the Groq LLM (model in agent, e.g. `openai/gpt-oss-120b`). Plan includes threat_level, executive_summary, selected_gates, immediate_actions, actions_justification, arabic_alert, confidence_score. Implements a ReAct loop (max 3 iterations): reason (build prompt from RiskResult, Decision, frame buffer; optional feedback from failed validation) → act (LLM call, parse JSON) → observe (run guardrails GR-C1–C6); repeat until valid or max iterations. Pipeline fills the decision's actions, justification, and selected_gates from the plan and stores the plan in `coordinator_plans`.
  - **Design pattern:** ReAct (reason → act → observe) + output guardrails.
  - **Input:** `RiskResult`, `Decision`, list of recent `FrameResult`s. **Output:** Plan dict.

@@ -148,19 +141,19 @@ Guardrails are hard constraints and validations applied in code to keep outputs
  | --- | --- | --- | --- |
  | **GR1** | PerceptionAgent | Person count capped at MAX_PERSONS (1000 in agent) | Prevents implausibly high counts caused by YOLO artifacts from propagating to risk and alerts. |
  | **GR2** | PerceptionAgent | Density score capped at MAX_DENSITY (50.0) | Keeps density in a bounded range for downstream risk formulas. |
- | **GR3** | RiskAgent | Risk score clamped to [0.0, 1.0] | Ensures threshold comparisons (e.g. 0.35, 0.65) remain valid. |
- | **GR4** | OperationsAgent | P0 rate-limited per zone (cooldown 300 s in agent) | Reduces alert fatigue; risk is still logged; only decision emission is rate-limited. |
  | **GR-C1** | CoordinatorAgent | Required JSON fields enforced; missing set to safe defaults | Prevents dashboard or downstream logic from breaking when the LLM omits fields. |
  | **GR-C2** | CoordinatorAgent | threat_level whitelist (CRITICAL, HIGH, MEDIUM, LOW) | Avoids invalid or adversarial values that would break UI or logic. |
  | **GR-C3** | CoordinatorAgent | confidence_score in [0, 1]; otherwise 0.5 | Normalizes LLM output so confidence is comparable. |
- | **GR-C4** | CoordinatorAgent | Full range enforcement: threat_level overridden to match actual risk_score thresholds (LOW/MEDIUM/HIGH) | Prevents LLM from returning HIGH threat during MEDIUM risk or CRITICAL during LOW risk. |
  | **GR-C5** | CoordinatorAgent | Arabic alert fallback if empty | Ensures safety-critical Arabic alert is never empty on the dashboard. |
  | **GR-C6** | CoordinatorAgent | selected_gates must be non-empty list; otherwise fallback | Ensures operators receive concrete gate recommendations. |
  | **RF1** | ReflectionAgent | Chronic LOW bias: N consecutive LOW with avg count above threshold → MEDIUM | Addresses sliding-window lag during rapid escalation. |
  | **RF2** | ReflectionAgent | Rising trend ignored: trend=rising, LOW, count above threshold → MEDIUM | Corrects inconsistent state (rising crowd with LOW risk). |
  | **RF3** | ReflectionAgent | Count–risk mismatch: high count but LOW risk → upgrade to MEDIUM/HIGH | Corrects mathematically inconsistent states. |
  | **RF4** | ReflectionAgent | Over-estimation: HIGH risk but count < threshold (e.g. 15) → MEDIUM | Reduces false HIGH from empty or near-empty frames. |
-
  Each guardrail is implemented in the corresponding agent file; further justification is documented in `ethics_and_safety_report.txt`.

  ---
@@ -237,7 +230,7 @@ source venv/bin/activate # Windows: venv\Scripts\activate
  pip install -r requirements.txt
  ```

- Set `GROQ_API_KEY` in the environment or in `backend/config.py`. Set `VIDEO_PATH` to a valid video file path (default: `hajj_real_video.mp4` in the backend directory). Set `MODEL_PATH` if using a different YOLO weight file (default: `yolo11l.pt`).

  ```bash
  python api.py
@@ -419,31 +412,19 @@ Rationale: in crowd safety, a false alarm is far less costly than a missed stamp
  | System failure during peak crowd | Pipeline crash → operators lose visibility | Fail-safe per-agent isolation; operators trained to treat silence as trigger for manual monitoring |
  | Adversarial prompt injection | Malicious input manipulates LLM output | Structured JSON-only output; GR-C1–C6 guardrails validate every field; threat_level constrained to whitelist |
  | Disproportionate security response | P0 triggers aggressive enforcement harming pilgrims | Playbook actions are crowd-management only (open gates, PA broadcast, crowd guides) — not enforcement; human operator has final authority |
-
  ### 11.7 Fail-Safe Behavior
-
  If any agent fails, the pipeline continues with safe defaults:
-
  - **PerceptionAgent fails** → returns empty FrameResult (count=0)
  - **RiskAgent fails** → previous risk level is retained
  - **VisionCountAgent fails** → falls back to YOLO count (logged in flags)
  - **CoordinatorAgent fails** → P0 decision still issued without an LLM plan
  - **DB write fails** → logged to console, pipeline continues
-
  ---
-
  ## 12. Limitations & Future Work
-
  - **Single-camera, single-zone:** The pipeline processes one video stream per instance. Real deployment at the Grand Mosque would require multiple cameras and zones. Future work: multi-zone state and one pipeline per camera with aggregation at the API or dashboard layer.
-
  - **Synthetic-only evaluation:** Quantitative metrics in `evaluation.py` are computed on synthetically generated scenarios with known ground-truth counts. Real aerial footage has occlusion, blur, and lighting conditions not fully captured. Future work: annotate real frames and measure real-world accuracy and recall.
-
  - **No Hajj-specific fine-tuning:** The YOLO model is pre-trained on general datasets. Pilgrims in white ihram can be under-detected (domain shift). VisionCountAgent and ReflectionAgent mitigate this in part when enabled. Future work: fine-tune a detector on Hajj-annotated data to improve recall (estimated +15%).
-
  - **Coordinator output quality not automatically measured:** Evaluation covers risk levels, priorities, and guardrail compliance — not the appropriateness or clarity of generated Arabic plans and alerts. Future work: human-expert rubric and sample-based evaluation of plan quality.
-
  - **Production scaling:** The architecture supports running multiple pipeline instances; the dashboard and API would need to be extended for per-zone or per-camera state and approve/reject controls per zone.
-
  ---
-
  **HaramGuard — Capstone Project · Tuwaiq Academy**
 
 

  HaramGuard uses a **deterministic, unidirectional pipeline**:

+ 1. **PerceptionAgent** receives each video frame with its detection data and produces a structured `FrameResult` (person count, density score, bounding boxes, track IDs, annotated frame, 3×3 spatial grid for hotspot detection). Uses a YOLOv8 Head Detection model (CrowdHuman) with BoTSORT tracking.
+ 2. **RiskAgent** segments the video into clips (detected via count/density jumps) and computes a dynamic risk score every frame using a sliding K-window (17 frames) of unique track IDs: `density_pct = N_est / 150 × 100`, `risk_score = density_pct / 100`. Level stabilization requires 5 consecutive frames at a new level before confirming. Output is `RiskResult`.
+ 3. **ReflectionAgent** observes `RiskResult` and `FrameResult`, detects four bias patterns (chronic LOW, rising trend ignored, count–risk mismatch, over-estimation), and corrects risk level/score when needed. Level-changed tracking is recalculated after correction so OperationsAgent always sees the effective level. All reflections are logged to the database.
+ 4. **OperationsAgent** emits a `Decision` only when the effective risk level changes (event-driven, post-reflection). Priority (P0/P1/P2) uses density-aligned thresholds (P0: score > 0.80, P1: score > 0.20). The P0 rate limit resets on each pipeline restart. Decisions are stored in the database.
+ 5. **CoordinatorAgent** is invoked for **every** decision (P0, P1, P2) in a background thread so it does not block frame processing. It calls the Groq LLM (`openai/gpt-oss-120b`) via a ReAct loop (reason → act → observe, max 3 iterations) with six guardrails (GR-C1–C6). The pipeline fills the decision's actions, justification, and selected_gates from the plan.

  **Single state:** One `state` dictionary is updated each frame and exposed to the FastAPI server; the React dashboard polls it. All numeric thresholds and caps live in `backend/config.py`.
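Put together, the five steps above reduce to one per-frame function. The following is a minimal illustrative sketch, not the actual `backend` API: the agent callables, dict field names, and `state` shape are assumptions.

```python
import threading

def process_frame(frame_id, detections, agents, state):
    """Run the five-agent pipeline for one frame and update the shared state dict."""
    frame_result = agents["perception"](frame_id, detections)      # 1. perceive
    risk_result = agents["risk"](frame_result)                     # 2. score risk
    reflection = agents["reflection"](risk_result, frame_result)   # 3. bias check
    if reflection.get("corrected_level"):
        # apply correction so OperationsAgent sees the effective level
        risk_result["risk_level"] = reflection["corrected_level"]
    decision = agents["operations"](risk_result)                   # 4. event-driven decision
    if decision is not None:
        # 5. coordinator (LLM) runs in a background thread so frames keep flowing
        threading.Thread(target=agents["coordinator"],
                         args=(risk_result, decision), daemon=True).start()
    state.update(frame_id=frame_id, **risk_result)                 # single shared state
    return state
```

The background thread mirrors the design choice stated above: a slow LLM call must never stall frame processing.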
 
  ### 3.2 Data Flow

  - **Input:** Video frames from a file or camera (path set by `VIDEO_PATH` in config).
  - **Output:** Per-frame state (frame_id, person_count, density_score, risk_score, risk_level, trend, latest_decision, coordinator_plan, arabic_alert, reflection_summary, risk_history, decisions_log, etc.) plus persisted records in SQLite: `risk_events`, `op_decisions`, `coordinator_plans`, `reflection_log`.
  - **Interfaces:** Backend: FastAPI on port 8000 (configurable via `API_PORT`). Frontend: Vite dev server (e.g. port 5173), configurable via `VITE_API_BASE_URL` for the API base URL.

  ---

  ## 4. Agentic System Design (Agents Description)

  The pipeline runs five agents in order each frame. Each agent is implemented in a single module under `backend/agents/`. Data flows unidirectionally; agents do not call each other directly.
 
  ### 4.1 PerceptionAgent (`perception_agent.py`)

+ - **Role:** Receive each video frame alongside its pre-computed detection data and produce a structured `FrameResult`: person count, density score, average spacing, bounding boxes, annotated frame, track IDs, occupation percentage, and a 3×3 spatial grid (grid_counts, grid_max, hotspot_zone) for downstream hotspot detection. Based on Umm Al-Qura University (UQU) Haram crowd research: local clustering in one cell can indicate risk even when global count is moderate.
+ - **Model:** YOLOv8 Head Detection trained on the CrowdHuman dataset, paired with BoTSORT tracker — specifically chosen for dense crowd scenes where full-body detection fails due to occlusion.
+ - **Design pattern:** Tool use — head detection model for detection; BoTSORT for multi-object tracking across frames. Optional VisionCountAgent (Claude Vision) available but not used in the default pipeline.
+ - **Guardrails:** GR1 — person count capped at `MAX_PERSONS`. GR2 — density score capped at `MAX_DENSITY` (50.0).
+ - **Input:** Video frame (numpy array) + detection metadata. **Output:** `FrameResult` (see `backend/core/models.py`).
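The 3×3 grid can be illustrated with a small sketch (the function name and return shape are hypothetical, not the actual `perception_agent.py` code): head-box centers are binned into a 3×3 grid over the frame, and the densest cell becomes the hotspot.

```python
def spatial_grid(centers, width, height):
    """Bin detection centers (x, y) into a 3x3 grid; return counts, max, hotspot cell index."""
    grid_counts = [[0] * 3 for _ in range(3)]
    for x, y in centers:
        col = min(int(3 * x / width), 2)    # clamp right/bottom edges into the last cell
        row = min(int(3 * y / height), 2)
        grid_counts[row][col] += 1
    grid_max = max(max(row) for row in grid_counts)
    # hotspot_zone: flat index 0..8 of the most crowded cell
    hotspot_zone = max(range(9), key=lambda i: grid_counts[i // 3][i % 3])
    return grid_counts, grid_max, hotspot_zone
```

This captures the UQU insight quoted above: `grid_max` can flag one saturated cell even when the total count looks moderate.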

  ### 4.2 RiskAgent (`risk_agent.py`)

+ - **Role:** Segment the video into clips and compute a dynamic risk score per frame using a sliding K-window of unique track IDs. Outputs a scalar risk score in [0, 1], a discrete risk level (LOW / MEDIUM / HIGH), and a density percentage.
+ - **Clip segmentation:** A new clip is detected when `|persons[t] − persons[t−1]| ≥ 40` OR `|density_score[t] − density_score[t−1]| ≥ 0.4`, sustained for ≥ 10 consecutive frames (glitch filter). All state resets at each clip boundary.
+ - **Sliding K-window density:** Every frame, the agent unions the unique track IDs seen across the last K=17 frames within the current clip: `N_est = |union(track_ids)|`. Density: `density_pct = min(N_est / 150 × 100, 100)`. Risk score: `risk_score = density_pct / 100`.
+ - **Risk levels:** density_pct ≤ 20 → LOW; 21–80 → MEDIUM; > 80 → HIGH.
+ - **Level stabilization:** A new level must hold for 5 consecutive frames before it is confirmed and `level_changed` fires. `level_changed` is suppressed for the first K frames of each clip (warmup period) to prevent false triggers before the window is full.
+ - **Design pattern:** Clip-aware sliding window. All thresholds in `config.py` (`CLIP_P_JUMP`, `CLIP_D_JUMP`, `CLIP_MIN_LEN`, `CLIP_K_WINDOW`, `RISK_HIGH_THRESHOLD`).
  - **Input:** `FrameResult`. **Output:** `RiskResult` (frame_id, risk_score, risk_level, trend, level_changed, window_avg, window_max, density_ema, density_pct).
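The K-window density computation can be sketched as follows. This is a minimal illustration using the constants quoted above (K=17, capacity normalizer 150, level cut-offs 20/80); the class and method names are assumptions, not the `risk_agent.py` API.

```python
from collections import deque

K_WINDOW = 17    # sliding window length in frames (CLIP_K_WINDOW in the text)
CAPACITY = 150   # normalizer: density_pct = N_est / 150 * 100

class KWindowDensity:
    def __init__(self):
        self.window = deque(maxlen=K_WINDOW)  # one set of track IDs per frame

    def reset(self):
        """Called at each clip boundary: all window state is discarded."""
        self.window.clear()

    def update(self, track_ids):
        self.window.append(set(track_ids))
        n_est = len(set().union(*self.window))           # unique IDs across last K frames
        density_pct = min(n_est / CAPACITY * 100, 100)   # capped at 100
        risk_score = density_pct / 100                   # GR3: already within [0, 1]
        if density_pct <= 20:
            level = "LOW"
        elif density_pct <= 80:
            level = "MEDIUM"
        else:
            level = "HIGH"
        return risk_score, level, density_pct
```

Counting the union of track IDs (rather than per-frame detections) makes the estimate robust to a head being missed in a few frames, which is the point of pairing the window with a tracker.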
 
  ### 4.3 ReflectionAgent (`reflection_agent.py`)

+ - **Role:** Critique the current risk assessment and correct it when one of four bias patterns is detected: (1) chronic LOW — N consecutive LOW frames with average person count above threshold → upgrade to MEDIUM; (2) rising trend ignored — trend=rising, risk=LOW, count above threshold → upgrade to MEDIUM; (3) count–risk mismatch — high person count but LOW risk → upgrade to MEDIUM or HIGH; (4) over-estimation — HIGH risk but person count below threshold (< 10) → downgrade to MEDIUM. Score corrections are aligned to the new risk thresholds (> 0.20 MEDIUM, > 0.80 HIGH). All reflections are persisted to `reflection_log` by the pipeline.
  - **Design pattern:** Reflection (observe → critique → correct → log). History window and thresholds in config (`REFLECTION_BIAS_WINDOW`, `REFLECTION_CROWD_LOW_THRESH`, `REFLECTION_HIGH_CROWD_THRESH`, `REFLECTION_OVER_EST_THRESH`).
  - **Input:** `RiskResult`, `FrameResult`. **Output:** Reflection dict; pipeline applies corrections to `RiskResult` before passing to OperationsAgent.
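The four bias patterns reduce to simple rules. A condensed sketch (the default threshold values and the function shape are illustrative stand-ins for the config names above, not the actual `reflection_agent.py` code):

```python
def reflect(risk_level, trend, person_count, low_streak,
            crowd_low_thresh=25, high_crowd_thresh=60,
            over_est_thresh=10, bias_window=5):
    """Return (corrected_level, reason) or (None, None) when no bias is detected."""
    if risk_level == "LOW" and low_streak >= bias_window and person_count > crowd_low_thresh:
        return "MEDIUM", "chronic_low"            # RF1: stuck at LOW despite a real crowd
    if risk_level == "LOW" and trend == "rising" and person_count > crowd_low_thresh:
        return "MEDIUM", "rising_trend_ignored"   # RF2: rising crowd ignored
    if risk_level == "LOW" and person_count > high_crowd_thresh:
        return "HIGH", "count_risk_mismatch"      # RF3: count and risk disagree
    if risk_level == "HIGH" and person_count < over_est_thresh:
        return "MEDIUM", "over_estimation"        # RF4: HIGH on a near-empty frame
    return None, None
```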

  ### 4.4 OperationsAgent (`operations_agent.py`)
+ - **Role:** Map the (possibly reflection-corrected) risk level to an operational priority (P0 / P1 / P2) and emit a `Decision` only when the risk level *changes*. Priority thresholds are aligned to the density-based risk score: `OPS_P0_SCORE = 0.80` (HIGH), `OPS_P1_SCORE = 0.20` (MEDIUM). P0 emission is rate-limited per zone (cooldown 300 s), but the rate limit resets with each pipeline restart so previous runs do not block new decisions. The decision's actions and selected_gates are left empty; the pipeline fills them via CoordinatorAgent and then stores the decision in `op_decisions`.

  - **Design pattern:** Event-driven; no decision when level unchanged.
  - **Input:** `RiskResult`, context string (e.g. `Mecca_Main_Area`). **Output:** `Decision` or None.
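Priority mapping plus the per-zone cooldown can be sketched as below. The `OPS_P0_SCORE`/`OPS_P1_SCORE` values come from the text above; the class shape and decision dict are assumptions.

```python
import time

OPS_P0_SCORE = 0.80   # > 0.80 -> P0 (HIGH)
OPS_P1_SCORE = 0.20   # > 0.20 -> P1 (MEDIUM)
P0_COOLDOWN_S = 300   # per-zone P0 cooldown, seconds

class OperationsSketch:
    def __init__(self):
        self.last_level = None
        self.last_p0 = {}  # zone -> last P0 timestamp; empty on restart (no carry-over)

    def priority(self, risk_score):
        if risk_score > OPS_P0_SCORE:
            return "P0"
        if risk_score > OPS_P1_SCORE:
            return "P1"
        return "P2"

    def decide(self, risk_level, risk_score, zone, now=None):
        """Emit a decision dict only on a level change; rate-limit P0 per zone."""
        now = time.monotonic() if now is None else now
        if risk_level == self.last_level:
            return None                         # event-driven: no change, no decision
        self.last_level = risk_level
        prio = self.priority(risk_score)
        if prio == "P0":
            if now - self.last_p0.get(zone, -P0_COOLDOWN_S) < P0_COOLDOWN_S:
                return None                     # GR4: cooldown active for this zone
            self.last_p0[zone] = now
        # actions/selected_gates stay empty until the Coordinator fills them
        return {"priority": prio, "zone": zone, "actions": [], "selected_gates": []}
```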
 
  ### 4.5 CoordinatorAgent (`coordinator_agent.py`)

+ - **Role:** For every decision (P0, P1, or P2), produce a structured action plan using the Groq LLM (`openai/gpt-oss-120b`). Plan includes threat_level, executive_summary, selected_gates, immediate_actions, actions_justification, arabic_alert, confidence_score. Implements a ReAct loop (max 3 iterations): reason (build prompt from RiskResult, Decision, frame buffer; optional feedback from failed validation) → act (LLM call, parse JSON) → observe (run guardrails GR-C1–C6); repeat until valid or max iterations. GR-C4 enforces that threat_level matches actual risk_score thresholds (> 0.80 → HIGH, > 0.20 → MEDIUM). The pipeline fills the decision's actions, justification, and selected_gates from the plan and stores the plan in `coordinator_plans`. The LLM call runs in a background thread so it does not block frame processing.
  - **Design pattern:** ReAct (reason → act → observe) + output guardrails.
  - **Input:** `RiskResult`, `Decision`, list of recent `FrameResult`s. **Output:** Plan dict.
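In outline, the ReAct loop looks like this. It is a hedged sketch: `build_prompt`, `call_llm`, and `validate` are hypothetical placeholders for the prompt builder, the Groq call, and the GR-C1–C6 checks.

```python
import json

MAX_ITERATIONS = 3

def react_plan(build_prompt, call_llm, validate):
    """reason -> act -> observe, retrying with validator feedback, up to 3 iterations."""
    feedback = None
    plan = {}
    for _ in range(MAX_ITERATIONS):
        prompt = build_prompt(feedback)   # reason: fold failure feedback into the prompt
        raw = call_llm(prompt)            # act: one LLM call, JSON output expected
        try:
            plan = json.loads(raw)
        except json.JSONDecodeError:
            feedback = "response was not valid JSON"
            continue
        ok, feedback = validate(plan)     # observe: run guardrails GR-C1..C6
        if ok:
            return plan
    return plan                           # last attempt; guardrails supply safe defaults
```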
 
 
  | --- | --- | --- | --- |
  | **GR1** | PerceptionAgent | Person count capped at MAX_PERSONS (1000 in agent) | Prevents implausibly high counts caused by YOLO artifacts from propagating to risk and alerts. |
  | **GR2** | PerceptionAgent | Density score capped at MAX_DENSITY (50.0) | Keeps density in a bounded range for downstream risk formulas. |
+ | **GR3** | RiskAgent | Risk score clamped to [0.0, 1.0] | Ensures threshold comparisons (> 0.20 MEDIUM, > 0.80 HIGH) remain valid. |
+ | **GR3b** | RiskAgent | `level_changed` suppressed during K-window warmup (first 17 frames per clip) | Prevents false P0/P1 triggers before enough track data is available. |
+ | **GR4** | OperationsAgent | P0 rate-limited per zone (cooldown 300 s); resets on pipeline restart | Reduces alert fatigue; the rate limit does not carry over across runs. |
  | **GR-C1** | CoordinatorAgent | Required JSON fields enforced; missing set to safe defaults | Prevents dashboard or downstream logic from breaking when the LLM omits fields. |
  | **GR-C2** | CoordinatorAgent | threat_level whitelist (CRITICAL, HIGH, MEDIUM, LOW) | Avoids invalid or adversarial values that would break UI or logic. |
  | **GR-C3** | CoordinatorAgent | confidence_score in [0, 1]; otherwise 0.5 | Normalizes LLM output so confidence is comparable. |
+ | **GR-C4** | CoordinatorAgent | Full range enforcement: threat_level overridden to match actual risk_score thresholds (> 0.80 HIGH, > 0.20 MEDIUM, else LOW) | Prevents LLM from returning HIGH threat during MEDIUM risk or CRITICAL during LOW risk. |
  | **GR-C5** | CoordinatorAgent | Arabic alert fallback if empty | Ensures safety-critical Arabic alert is never empty on the dashboard. |
  | **GR-C6** | CoordinatorAgent | selected_gates must be non-empty list; otherwise fallback | Ensures operators receive concrete gate recommendations. |
  | **RF1** | ReflectionAgent | Chronic LOW bias: N consecutive LOW with avg count above threshold → MEDIUM | Addresses sliding-window lag during rapid escalation. |
  | **RF2** | ReflectionAgent | Rising trend ignored: trend=rising, LOW, count above threshold → MEDIUM | Corrects inconsistent state (rising crowd with LOW risk). |
  | **RF3** | ReflectionAgent | Count–risk mismatch: high count but LOW risk → upgrade to MEDIUM/HIGH | Corrects mathematically inconsistent states. |
  | **RF4** | ReflectionAgent | Over-estimation: HIGH risk but count below threshold (< 10) → MEDIUM | Reduces false HIGH from empty or near-empty frames. |
 
  Each guardrail is implemented in the corresponding agent file; further justification is documented in `ethics_and_safety_report.txt`.
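For illustration, the coordinator-side guardrails amount to a small validation pass over the LLM's JSON. This is a sketch only: the keys mirror the table, but the fallback values (the Arabic string and `"Gate 1"`) are invented placeholders, not the real defaults.

```python
def apply_coordinator_guardrails(plan, risk_score):
    """Normalize an LLM plan dict in place per GR-C1..C6; returns the safe plan."""
    defaults = {"threat_level": "LOW", "executive_summary": "", "selected_gates": [],
                "immediate_actions": [], "arabic_alert": "", "confidence_score": 0.5}
    for key, value in defaults.items():                 # GR-C1: required fields
        plan.setdefault(key, value)
    if plan["threat_level"] not in {"CRITICAL", "HIGH", "MEDIUM", "LOW"}:
        plan["threat_level"] = "LOW"                    # GR-C2: whitelist
    c = plan["confidence_score"]
    if not isinstance(c, (int, float)) or not 0 <= c <= 1:
        plan["confidence_score"] = 0.5                  # GR-C3: confidence in [0, 1]
    if risk_score > 0.80:                               # GR-C4: match actual thresholds
        plan["threat_level"] = "HIGH"
    elif risk_score > 0.20:
        plan["threat_level"] = "MEDIUM"
    else:
        plan["threat_level"] = "LOW"
    if not plan["arabic_alert"]:                        # GR-C5: illustrative fallback text
        plan["arabic_alert"] = "تنبيه: يرجى اتباع تعليمات المنظمين"
    if not plan["selected_gates"]:                      # GR-C6: illustrative fallback gate
        plan["selected_gates"] = ["Gate 1"]
    return plan
```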

  ---
 
  pip install -r requirements.txt
  ```

+ Set `GROQ_API_KEY` in the environment or in `backend/config.py`. Set `VIDEO_PATH` to a valid video file path (default: `hajj_multi_video_annotated.mp4` in the backend directory). The system uses a YOLOv8 Head Detection model (CrowdHuman) with BoTSORT tracking.

  ```bash
  python api.py
 
  | System failure during peak crowd | Pipeline crash → operators lose visibility | Fail-safe per-agent isolation; operators trained to treat silence as trigger for manual monitoring |
  | Adversarial prompt injection | Malicious input manipulates LLM output | Structured JSON-only output; GR-C1–C6 guardrails validate every field; threat_level constrained to whitelist |
  | Disproportionate security response | P0 triggers aggressive enforcement harming pilgrims | Playbook actions are crowd-management only (open gates, PA broadcast, crowd guides) — not enforcement; human operator has final authority |

  ### 11.7 Fail-Safe Behavior

  If any agent fails, the pipeline continues with safe defaults:
 
  - **PerceptionAgent fails** → returns empty FrameResult (count=0)
  - **RiskAgent fails** → previous risk level is retained
  - **VisionCountAgent fails** → falls back to YOLO count (logged in flags)
  - **CoordinatorAgent fails** → P0 decision still issued without an LLM plan
  - **DB write fails** → logged to console, pipeline continues
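The per-agent isolation listed above boils down to wrapping each stage and substituting a safe default on failure. A minimal sketch (the real pipeline's exception handling may differ):

```python
import logging

def run_safely(stage_name, fn, fallback):
    """Run one pipeline stage; on any exception, log it and return the safe default."""
    try:
        return fn()
    except Exception:
        logging.exception("%s failed; using safe default", stage_name)
        return fallback
```

For example, a hypothetical call site: `risk = run_safely("RiskAgent", lambda: risk_agent.assess(frame_result), previous_risk)` retains the previous risk level on failure, matching the table above.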
 
  ---

  ## 12. Limitations & Future Work

  - **Single-camera, single-zone:** The pipeline processes one video stream per instance. Real deployment at the Grand Mosque would require multiple cameras and zones. Future work: multi-zone state and one pipeline per camera with aggregation at the API or dashboard layer.

  - **Synthetic-only evaluation:** Quantitative metrics in `evaluation.py` are computed on synthetically generated scenarios with known ground-truth counts. Real aerial footage has occlusion, blur, and lighting conditions not fully captured. Future work: annotate real frames and measure real-world accuracy and recall.

  - **No Hajj-specific fine-tuning:** The YOLO model is pre-trained on general datasets. Pilgrims in white ihram can be under-detected (domain shift). VisionCountAgent and ReflectionAgent mitigate this in part when enabled. Future work: fine-tune a detector on Hajj-annotated data to improve recall (estimated +15%).

  - **Coordinator output quality not automatically measured:** Evaluation covers risk levels, priorities, and guardrail compliance — not the appropriateness or clarity of generated Arabic plans and alerts. Future work: human-expert rubric and sample-based evaluation of plan quality.

  - **Production scaling:** The architecture supports running multiple pipeline instances; the dashboard and API would need to be extended for per-zone or per-camera state and approve/reject controls per zone.

  ---

  **HaramGuard — Capstone Project · Tuwaiq Academy**