Update README.md

#16
by munals - opened
Files changed (1)
  1. README.md +24 -43
README.md CHANGED
@@ -7,7 +7,6 @@ sdk: docker
  pinned: false
  ---

-
  # HaramGuard — Agentic AI Safety System for Hajj Crowd Management

  HaramGuard is a real-time, multi-agent decision-support system that integrates computer vision, risk modeling, reflective bias correction, and LLM-based coordination to assist human operators in preventing crowd crush during Hajj and Umrah.
@@ -70,54 +69,48 @@ Manual monitoring of many camera feeds is error-prone and does not scale. Operat

  HaramGuard uses a **deterministic, unidirectional pipeline**:

- 1. **PerceptionAgent** turns each video frame into a structured `FrameResult` (person count, density, spacing, 3×3 spatial grid for hotspots, annotated frame). It uses a YOLO model (path and size in `backend/config.py`); an optional VisionCountAgent (Claude Vision) is available in code but disabled in the default pipeline.
- 2. **RiskAgent** maintains a sliding window of `FrameResult`s and computes a risk score and level (LOW/MEDIUM/HIGH) using four paths: Fruin-style EMA of count, instant high-count floor, pre-emptive rate-of-change, and spatial clustering from the 3×3 grid. Output is `RiskResult`.
- 3. **ReflectionAgent** observes `RiskResult` and `FrameResult`, detects four bias patterns (chronic LOW, rising trend ignored, count–risk mismatch, over-estimation), and corrects risk level/score when needed. All reflections are logged to the database.
- 4. **OperationsAgent** emits a `Decision` only when the risk level *changes* (event-driven). Priority (P0/P1/P2) is derived from config thresholds aligned with RiskAgent. P0 decisions are rate-limited per zone. Decisions are stored in the database; actions and selected gates are left empty until the Coordinator fills them.
- 5. **CoordinatorAgent** is invoked by the pipeline for **every** decision (P0, P1, P2). It calls an LLM (Groq API) to generate a structured plan (threat level, executive summary, selected gates, immediate actions, Arabic alert, confidence). A ReAct loop (reason → act → observe, max 3 iterations) validates output with six guardrails (GR-C1–C6); the pipeline then fills the decision's actions, justification, and selected_gates from the plan and stores the plan in the database.
-
  **Single state:** One `state` dictionary is updated each frame and exposed to the FastAPI server; the React dashboard polls it. All numeric thresholds and caps live in `backend/config.py`.
-
  ### 3.2 Data Flow
-
  - **Input:** Video frames from a file or camera (path set by `VIDEO_PATH` in config).
  - **Output:** Per-frame state (frame_id, person_count, density_score, risk_score, risk_level, trend, latest_decision, coordinator_plan, arabic_alert, reflection_summary, risk_history, decisions_log, etc.) plus persisted records in SQLite: `risk_events`, `op_decisions`, `coordinator_plans`, `reflection_log`.
  - **Interfaces:** Backend: FastAPI on port 8000 (configurable via `API_PORT`). Frontend: Vite dev server (e.g. port 5173), configurable via `VITE_API_BASE_URL` for the API base URL.
-
  ---
-
  ## 4. Agentic System Design (Agents Description)
-
  The pipeline runs five agents in order each frame. Each agent is implemented in a single module under `backend/agents/`. Data flows unidirectionally; agents do not call each other directly.
-
  ### 4.1 PerceptionAgent (`perception_agent.py`)

- - **Role:** Convert a raw video frame into a `FrameResult`: person count, density score, average spacing, bounding boxes, annotated frame, track IDs, occupation percentage, and a 3×3 spatial grid (grid_counts, grid_max, hotspot_zone) for downstream hotspot detection. Based on Umm Al-Qura University (UQU) Haram crowd research: local clustering in one cell can indicate risk even when global count is moderate.
- - **Design pattern:** Tool use — YOLO for detection and tracking; optional VisionCountAgent (Claude Vision) for an alternative count when an Anthropic key is provided. In the current default pipeline, PerceptionAgent is instantiated without an Anthropic key (YOLO-only).
- - **Guardrails:** GR1 — person count capped at `MAX_PERSONS` (1000 in agent class). GR2 — density score capped at `MAX_DENSITY` (50.0).
- - **Input:** Raw frame (numpy array). **Output:** `FrameResult` (see `backend/core/models.py`).
-
  ### 4.2 RiskAgent (`risk_agent.py`)

- - **Role:** Maintain a sliding window (14 frames) of recent `FrameResult`s and compute a scalar risk score in [0, 1] and a discrete risk level (LOW / MEDIUM / HIGH), plus trend (rising / stable / falling). Final score is the maximum of four paths: (1) Fruin smooth — EMA of current person count normalized to `RISK_HIGH_COUNT` (50), with spacing and trend weights; (2) instant floor — if current count ≥ HIGH_COUNT, score floor 0.70; (3) pre-emptive ROC — 5-frame growth and EMA thresholds; (4) spatial clustering — if any 3×3 grid cell has ≥ `GRID_CELL_HIGH` persons (from FrameResult), floor 0.70. Score is clamped to [0, 1] (GR3).
- - **Design pattern:** Sliding window + multi-path weighted scoring. Window size, thresholds, and weights are in `config.py`.
  - **Input:** `FrameResult`. **Output:** `RiskResult` (frame_id, risk_score, risk_level, trend, level_changed, window_avg, window_max, density_ema, density_pct).
-
  ### 4.3 ReflectionAgent (`reflection_agent.py`)

- - **Role:** Critique the current risk assessment and correct it when one of four bias patterns is detected: (1) chronic LOW — N consecutive LOW frames with average person count above threshold → upgrade to MEDIUM; (2) rising trend ignored — trend=rising, risk=LOW, count above threshold → upgrade to MEDIUM; (3) count–risk mismatch — high person count but LOW risk → upgrade to MEDIUM or HIGH; (4) over-estimation — HIGH risk but person count below threshold (e.g. < 15) → downgrade to MEDIUM. All reflections are persisted to `reflection_log` by the pipeline.
  - **Design pattern:** Reflection (observe → critique → correct → log). History window and thresholds in config (`REFLECTION_BIAS_WINDOW`, `REFLECTION_CROWD_LOW_THRESH`, `REFLECTION_HIGH_CROWD_THRESH`, `REFLECTION_OVER_EST_THRESH`).
  - **Input:** `RiskResult`, `FrameResult`. **Output:** Reflection dict; pipeline applies corrections to `RiskResult` before passing to OperationsAgent.

  ### 4.4 OperationsAgent (`operations_agent.py`)
-
- - **Role:** Map the (possibly reflection-corrected) risk level to an operational priority (P0 / P1 / P2) and emit a `Decision` only when the risk level *changes*. Priority is derived from config (`OPS_P0_SCORE`, `OPS_P1_SCORE`) aligned with RiskAgent thresholds. P0 emission is rate-limited per zone (cooldown 300 s in agent class). The decision's actions and selected_gates are left empty; the pipeline fills them via CoordinatorAgent and then stores the decision in `op_decisions`.
  - **Design pattern:** Event-driven; no decision when level unchanged.
  - **Input:** `RiskResult`, context string (e.g. `Mecca_Main_Area`). **Output:** `Decision` or None.
-
  ### 4.5 CoordinatorAgent (`coordinator_agent.py`)

- - **Role:** For every decision (P0, P1, or P2), produce a structured action plan using the Groq LLM (model in agent, e.g. `openai/gpt-oss-120b`). Plan includes threat_level, executive_summary, selected_gates, immediate_actions, actions_justification, arabic_alert, confidence_score. Implements a ReAct loop (max 3 iterations): reason (build prompt from RiskResult, Decision, frame buffer; optional feedback from failed validation) → act (LLM call, parse JSON) → observe (run guardrails GR-C1–C6); repeat until valid or max iterations. Pipeline fills the decision's actions, justification, and selected_gates from the plan and stores the plan in `coordinator_plans`.
  - **Design pattern:** ReAct (reason → act → observe) + output guardrails.
  - **Input:** `RiskResult`, `Decision`, list of recent `FrameResult`s. **Output:** Plan dict.

@@ -148,19 +141,19 @@ Guardrails are hard constraints and validations applied in code to keep outputs
  | --- | --- | --- | --- |
  | **GR1** | PerceptionAgent | Person count capped at MAX_PERSONS (1000 in agent) | Prevents implausibly high counts caused by YOLO artifacts from propagating to risk and alerts. |
  | **GR2** | PerceptionAgent | Density score capped at MAX_DENSITY (50.0) | Keeps density in a bounded range for downstream risk formulas. |
- | **GR3** | RiskAgent | Risk score clamped to [0.0, 1.0] | Ensures threshold comparisons (e.g. 0.35, 0.65) remain valid. |
- | **GR4** | OperationsAgent | P0 rate-limited per zone (cooldown 300 s in agent) | Reduces alert fatigue; risk is still logged; only decision emission is rate-limited. |
  | **GR-C1** | CoordinatorAgent | Required JSON fields enforced; missing set to safe defaults | Prevents dashboard or downstream logic from breaking when the LLM omits fields. |
  | **GR-C2** | CoordinatorAgent | threat_level whitelist (CRITICAL, HIGH, MEDIUM, LOW) | Avoids invalid or adversarial values that would break UI or logic. |
  | **GR-C3** | CoordinatorAgent | confidence_score in [0, 1]; otherwise 0.5 | Normalizes LLM output so confidence is comparable. |
- | **GR-C4** | CoordinatorAgent | Full range enforcement: threat_level overridden to match actual risk_score thresholds (LOW/MEDIUM/HIGH) | Prevents LLM from returning HIGH threat during MEDIUM risk or CRITICAL during LOW risk. |
  | **GR-C5** | CoordinatorAgent | Arabic alert fallback if empty | Ensures safety-critical Arabic alert is never empty on the dashboard. |
  | **GR-C6** | CoordinatorAgent | selected_gates must be non-empty list; otherwise fallback | Ensures operators receive concrete gate recommendations. |
  | **RF1** | ReflectionAgent | Chronic LOW bias: N consecutive LOW with avg count above threshold → MEDIUM | Addresses sliding-window lag during rapid escalation. |
  | **RF2** | ReflectionAgent | Rising trend ignored: trend=rising, LOW, count above threshold → MEDIUM | Corrects inconsistent state (rising crowd with LOW risk). |
  | **RF3** | ReflectionAgent | Count–risk mismatch: high count but LOW risk → upgrade to MEDIUM/HIGH | Corrects mathematically inconsistent states. |
  | **RF4** | ReflectionAgent | Over-estimation: HIGH risk but count < threshold (e.g. 15) → MEDIUM | Reduces false HIGH from empty or near-empty frames. |
-
  Each guardrail is implemented in the corresponding agent file; further justification is documented in `ethics_and_safety_report.txt`.

  ---
@@ -237,7 +230,7 @@ source venv/bin/activate # Windows: venv\Scripts\activate
  pip install -r requirements.txt
  ```

- Set `GROQ_API_KEY` in the environment or in `backend/config.py`. Set `VIDEO_PATH` to a valid video file path (default: `hajj_real_video.mp4` in the backend directory). Set `MODEL_PATH` if using a different YOLO weight file (default: `yolo11l.pt`).

  ```bash
  python api.py
@@ -419,31 +412,19 @@ Rationale: in crowd safety, a false alarm is far less costly than a missed stamp
  | System failure during peak crowd | Pipeline crash → operators lose visibility | Fail-safe per-agent isolation; operators trained to treat silence as trigger for manual monitoring |
  | Adversarial prompt injection | Malicious input manipulates LLM output | Structured JSON-only output; GR-C1–C6 guardrails validate every field; threat_level constrained to whitelist |
  | Disproportionate security response | P0 triggers aggressive enforcement harming pilgrims | Playbook actions are crowd-management only (open gates, PA broadcast, crowd guides) — not enforcement; human operator has final authority |
-
  ### 11.7 Fail-Safe Behavior
-
  If any agent fails, the pipeline continues with safe defaults:
-
  - **PerceptionAgent fails** → returns empty FrameResult (count=0)
  - **RiskAgent fails** → previous risk level is retained
  - **VisionCountAgent fails** → falls back to YOLO count (logged in flags)
  - **CoordinatorAgent fails** → P0 decision still issued without an LLM plan
  - **DB write fails** → logged to console, pipeline continues
-
  ---
-
  ## 12. Limitations & Future Work
-
  - **Single-camera, single-zone:** The pipeline processes one video stream per instance. Real deployment at the Grand Mosque would require multiple cameras and zones. Future work: multi-zone state and one pipeline per camera with aggregation at the API or dashboard layer.
-
  - **Synthetic-only evaluation:** Quantitative metrics in `evaluation.py` are computed on synthetically generated scenarios with known ground-truth counts. Real aerial footage has occlusion, blur, and lighting conditions not fully captured. Future work: annotate real frames and measure real-world accuracy and recall.
-
  - **No Hajj-specific fine-tuning:** The YOLO model is pre-trained on general datasets. Pilgrims in white ihram can be under-detected (domain shift). VisionCountAgent and ReflectionAgent mitigate this in part when enabled. Future work: fine-tune a detector on Hajj-annotated data to improve recall (estimated +15%).
-
  - **Coordinator output quality not automatically measured:** Evaluation covers risk levels, priorities, and guardrail compliance — not the appropriateness or clarity of generated Arabic plans and alerts. Future work: human-expert rubric and sample-based evaluation of plan quality.
-
  - **Production scaling:** The architecture supports running multiple pipeline instances; the dashboard and API would need to be extended for per-zone or per-camera state and approve/reject controls per zone.
-
  ---
-
  **HaramGuard — Capstone Project · Tuwaiq Academy**
 
 

  HaramGuard uses a **deterministic, unidirectional pipeline**:

+ 1. **PerceptionAgent** receives each video frame with its detection data and produces a structured `FrameResult` (person count, density score, bounding boxes, track IDs, annotated frame, 3×3 spatial grid for hotspot detection). Uses a YOLOv8 Head Detection model (CrowdHuman) with BoTSORT tracking.
+ 2. **RiskAgent** segments the video into clips (detected via count/density jumps) and computes a dynamic risk score every frame using a sliding K-window (17 frames) of unique track IDs: `density_pct = N_est / 150 × 100`, `risk_score = density_pct / 100`. Level stabilization requires 5 consecutive frames at a new level before confirming. Output is `RiskResult`.
+ 3. **ReflectionAgent** observes `RiskResult` and `FrameResult`, detects four bias patterns (chronic LOW, rising trend ignored, count–risk mismatch, over-estimation), and corrects risk level/score when needed. Level-changed tracking is recalculated after correction so OperationsAgent always sees the effective level. All reflections are logged to the database.
+ 4. **OperationsAgent** emits a `Decision` only when the effective risk level changes (event-driven, post-reflection). Priority (P0/P1/P2) uses density-aligned thresholds (P0: score > 0.80, P1: score > 0.20). The P0 rate limit resets on each pipeline restart. Decisions are stored in the database.
+ 5. **CoordinatorAgent** is invoked for **every** decision (P0, P1, P2) in a background thread so it does not block frame processing. It calls the Groq LLM (`openai/gpt-oss-120b`) via a ReAct loop (reason → act → observe, max 3 iterations) with six guardrails (GR-C1–C6). The pipeline fills the decision's actions, justification, and selected_gates from the plan.

  **Single state:** One `state` dictionary is updated each frame and exposed to the FastAPI server; the React dashboard polls it. All numeric thresholds and caps live in `backend/config.py`.
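Put together, the five steps above reduce to one per-frame function. The following is a minimal illustrative sketch, not the actual `backend` API: the agent callables, dict field names, and `state` shape are assumptions.

```python
import threading

def process_frame(frame_id, detections, agents, state):
    """Run the five-agent pipeline for one frame and update the shared state dict."""
    frame_result = agents["perception"](frame_id, detections)      # 1. perceive
    risk_result = agents["risk"](frame_result)                     # 2. score risk
    reflection = agents["reflection"](risk_result, frame_result)   # 3. bias check
    if reflection.get("corrected_level"):
        # apply correction so OperationsAgent sees the effective level
        risk_result["risk_level"] = reflection["corrected_level"]
    decision = agents["operations"](risk_result)                   # 4. event-driven decision
    if decision is not None:
        # 5. coordinator (LLM) runs in a background thread so frames keep flowing
        threading.Thread(target=agents["coordinator"],
                         args=(risk_result, decision), daemon=True).start()
    state.update(frame_id=frame_id, **risk_result)                 # single shared state
    return state
```

The background thread mirrors the design choice stated above: a slow LLM call must never stall frame processing.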
 
  ### 3.2 Data Flow

  - **Input:** Video frames from a file or camera (path set by `VIDEO_PATH` in config).
  - **Output:** Per-frame state (frame_id, person_count, density_score, risk_score, risk_level, trend, latest_decision, coordinator_plan, arabic_alert, reflection_summary, risk_history, decisions_log, etc.) plus persisted records in SQLite: `risk_events`, `op_decisions`, `coordinator_plans`, `reflection_log`.
  - **Interfaces:** Backend: FastAPI on port 8000 (configurable via `API_PORT`). Frontend: Vite dev server (e.g. port 5173), configurable via `VITE_API_BASE_URL` for the API base URL.

  ---

  ## 4. Agentic System Design (Agents Description)

  The pipeline runs five agents in order each frame. Each agent is implemented in a single module under `backend/agents/`. Data flows unidirectionally; agents do not call each other directly.
 
  ### 4.1 PerceptionAgent (`perception_agent.py`)

+ - **Role:** Receive each video frame alongside its pre-computed detection data and produce a structured `FrameResult`: person count, density score, average spacing, bounding boxes, annotated frame, track IDs, occupation percentage, and a 3×3 spatial grid (grid_counts, grid_max, hotspot_zone) for downstream hotspot detection. Based on Umm Al-Qura University (UQU) Haram crowd research: local clustering in one cell can indicate risk even when global count is moderate.
+ - **Model:** YOLOv8 Head Detection trained on the CrowdHuman dataset, paired with BoTSORT tracker — specifically chosen for dense crowd scenes where full-body detection fails due to occlusion.
+ - **Design pattern:** Tool use — head detection model for detection; BoTSORT for multi-object tracking across frames. Optional VisionCountAgent (Claude Vision) available but not used in the default pipeline.
+ - **Guardrails:** GR1 — person count capped at `MAX_PERSONS`. GR2 — density score capped at `MAX_DENSITY` (50.0).
+ - **Input:** Video frame (numpy array) + detection metadata. **Output:** `FrameResult` (see `backend/core/models.py`).
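The 3×3 grid can be illustrated with a small sketch (the function name and return shape are hypothetical, not the actual `perception_agent.py` code): head-box centers are binned into a 3×3 grid over the frame, and the densest cell becomes the hotspot.

```python
def spatial_grid(centers, width, height):
    """Bin detection centers (x, y) into a 3x3 grid; return counts, max, hotspot cell index."""
    grid_counts = [[0] * 3 for _ in range(3)]
    for x, y in centers:
        col = min(int(3 * x / width), 2)    # clamp right/bottom edges into the last cell
        row = min(int(3 * y / height), 2)
        grid_counts[row][col] += 1
    grid_max = max(max(row) for row in grid_counts)
    # hotspot_zone: flat index 0..8 of the most crowded cell
    hotspot_zone = max(range(9), key=lambda i: grid_counts[i // 3][i % 3])
    return grid_counts, grid_max, hotspot_zone
```

This captures the UQU insight quoted above: `grid_max` can flag one saturated cell even when the total count looks moderate.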

  ### 4.2 RiskAgent (`risk_agent.py`)

+ - **Role:** Segment the video into clips and compute a dynamic risk score per frame using a sliding K-window of unique track IDs. Outputs a scalar risk score in [0, 1], a discrete risk level (LOW / MEDIUM / HIGH), and a density percentage.
+ - **Clip segmentation:** A new clip is detected when `|persons[t] − persons[t−1]| ≥ 40` OR `|density_score[t] − density_score[t−1]| ≥ 0.4`, sustained for ≥ 10 consecutive frames (glitch filter). All state resets at each clip boundary.
+ - **Sliding K-window density:** Every frame, the agent unions the unique track IDs seen across the last K=17 frames within the current clip: `N_est = |union(track_ids)|`. Density: `density_pct = min(N_est / 150 × 100, 100)`. Risk score: `risk_score = density_pct / 100`.
+ - **Risk levels:** density_pct ≤ 20 → LOW; 21–80 → MEDIUM; > 80 → HIGH.
+ - **Level stabilization:** A new level must hold for 5 consecutive frames before it is confirmed and `level_changed` fires. `level_changed` is suppressed for the first K frames of each clip (warmup period) to prevent false triggers before the window is full.
+ - **Design pattern:** Clip-aware sliding window. All thresholds in `config.py` (`CLIP_P_JUMP`, `CLIP_D_JUMP`, `CLIP_MIN_LEN`, `CLIP_K_WINDOW`, `RISK_HIGH_THRESHOLD`).
  - **Input:** `FrameResult`. **Output:** `RiskResult` (frame_id, risk_score, risk_level, trend, level_changed, window_avg, window_max, density_ema, density_pct).
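The K-window density computation can be sketched as follows. This is a minimal illustration using the constants quoted above (K=17, capacity normalizer 150, level cut-offs 20/80); the class and method names are assumptions, not the `risk_agent.py` API.

```python
from collections import deque

K_WINDOW = 17    # sliding window length in frames (CLIP_K_WINDOW in the text)
CAPACITY = 150   # normalizer: density_pct = N_est / 150 * 100

class KWindowDensity:
    def __init__(self):
        self.window = deque(maxlen=K_WINDOW)  # one set of track IDs per frame

    def reset(self):
        """Called at each clip boundary: all window state is discarded."""
        self.window.clear()

    def update(self, track_ids):
        self.window.append(set(track_ids))
        n_est = len(set().union(*self.window))           # unique IDs across last K frames
        density_pct = min(n_est / CAPACITY * 100, 100)   # capped at 100
        risk_score = density_pct / 100                   # GR3: already within [0, 1]
        if density_pct <= 20:
            level = "LOW"
        elif density_pct <= 80:
            level = "MEDIUM"
        else:
            level = "HIGH"
        return risk_score, level, density_pct
```

Counting the union of track IDs (rather than per-frame detections) makes the estimate robust to a head being missed in a few frames, which is the point of pairing the window with a tracker.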
 
  ### 4.3 ReflectionAgent (`reflection_agent.py`)

+ - **Role:** Critique the current risk assessment and correct it when one of four bias patterns is detected: (1) chronic LOW — N consecutive LOW frames with average person count above threshold → upgrade to MEDIUM; (2) rising trend ignored — trend=rising, risk=LOW, count above threshold → upgrade to MEDIUM; (3) count–risk mismatch — high person count but LOW risk → upgrade to MEDIUM or HIGH; (4) over-estimation — HIGH risk but person count below threshold (< 10) → downgrade to MEDIUM. Score corrections are aligned to the new risk thresholds (> 0.20 MEDIUM, > 0.80 HIGH). All reflections are persisted to `reflection_log` by the pipeline.
  - **Design pattern:** Reflection (observe → critique → correct → log). History window and thresholds in config (`REFLECTION_BIAS_WINDOW`, `REFLECTION_CROWD_LOW_THRESH`, `REFLECTION_HIGH_CROWD_THRESH`, `REFLECTION_OVER_EST_THRESH`).
  - **Input:** `RiskResult`, `FrameResult`. **Output:** Reflection dict; pipeline applies corrections to `RiskResult` before passing to OperationsAgent.
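The four bias patterns reduce to simple rules. A condensed sketch (the default threshold values and the function shape are illustrative stand-ins for the config names above, not the actual `reflection_agent.py` code):

```python
def reflect(risk_level, trend, person_count, low_streak,
            crowd_low_thresh=25, high_crowd_thresh=60,
            over_est_thresh=10, bias_window=5):
    """Return (corrected_level, reason) or (None, None) when no bias is detected."""
    if risk_level == "LOW" and low_streak >= bias_window and person_count > crowd_low_thresh:
        return "MEDIUM", "chronic_low"            # RF1: stuck at LOW despite a real crowd
    if risk_level == "LOW" and trend == "rising" and person_count > crowd_low_thresh:
        return "MEDIUM", "rising_trend_ignored"   # RF2: rising crowd ignored
    if risk_level == "LOW" and person_count > high_crowd_thresh:
        return "HIGH", "count_risk_mismatch"      # RF3: count and risk disagree
    if risk_level == "HIGH" and person_count < over_est_thresh:
        return "MEDIUM", "over_estimation"        # RF4: HIGH on a near-empty frame
    return None, None
```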

  ### 4.4 OperationsAgent (`operations_agent.py`)
+ - **Role:** Map the (possibly reflection-corrected) risk level to an operational priority (P0 / P1 / P2) and emit a `Decision` only when the risk level *changes*. Priority thresholds are aligned to the density-based risk score: `OPS_P0_SCORE = 0.80` (HIGH), `OPS_P1_SCORE = 0.20` (MEDIUM). P0 emission is rate-limited per zone (cooldown 300 s), but the rate limit resets with each pipeline restart so previous runs do not block new decisions. The decision's actions and selected_gates are left empty; the pipeline fills them via CoordinatorAgent and then stores the decision in `op_decisions`.

  - **Design pattern:** Event-driven; no decision when level unchanged.
  - **Input:** `RiskResult`, context string (e.g. `Mecca_Main_Area`). **Output:** `Decision` or None.
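Priority mapping plus the per-zone cooldown can be sketched as below. The `OPS_P0_SCORE`/`OPS_P1_SCORE` values come from the text above; the class shape and decision dict are assumptions.

```python
import time

OPS_P0_SCORE = 0.80   # > 0.80 -> P0 (HIGH)
OPS_P1_SCORE = 0.20   # > 0.20 -> P1 (MEDIUM)
P0_COOLDOWN_S = 300   # per-zone P0 cooldown, seconds

class OperationsSketch:
    def __init__(self):
        self.last_level = None
        self.last_p0 = {}  # zone -> last P0 timestamp; empty on restart (no carry-over)

    def priority(self, risk_score):
        if risk_score > OPS_P0_SCORE:
            return "P0"
        if risk_score > OPS_P1_SCORE:
            return "P1"
        return "P2"

    def decide(self, risk_level, risk_score, zone, now=None):
        """Emit a decision dict only on a level change; rate-limit P0 per zone."""
        now = time.monotonic() if now is None else now
        if risk_level == self.last_level:
            return None                         # event-driven: no change, no decision
        self.last_level = risk_level
        prio = self.priority(risk_score)
        if prio == "P0":
            if now - self.last_p0.get(zone, -P0_COOLDOWN_S) < P0_COOLDOWN_S:
                return None                     # GR4: cooldown active for this zone
            self.last_p0[zone] = now
        # actions/selected_gates stay empty until the Coordinator fills them
        return {"priority": prio, "zone": zone, "actions": [], "selected_gates": []}
```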
 
  ### 4.5 CoordinatorAgent (`coordinator_agent.py`)

+ - **Role:** For every decision (P0, P1, or P2), produce a structured action plan using the Groq LLM (`openai/gpt-oss-120b`). Plan includes threat_level, executive_summary, selected_gates, immediate_actions, actions_justification, arabic_alert, confidence_score. Implements a ReAct loop (max 3 iterations): reason (build prompt from RiskResult, Decision, frame buffer; optional feedback from failed validation) → act (LLM call, parse JSON) → observe (run guardrails GR-C1–C6); repeat until valid or max iterations. GR-C4 enforces that threat_level matches actual risk_score thresholds (> 0.80 → HIGH, > 0.20 → MEDIUM). The pipeline fills the decision's actions, justification, and selected_gates from the plan and stores the plan in `coordinator_plans`. The LLM call runs in a background thread so it does not block frame processing.
  - **Design pattern:** ReAct (reason → act → observe) + output guardrails.
  - **Input:** `RiskResult`, `Decision`, list of recent `FrameResult`s. **Output:** Plan dict.
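In outline, the ReAct loop looks like this. It is a hedged sketch: `build_prompt`, `call_llm`, and `validate` are hypothetical placeholders for the prompt builder, the Groq call, and the GR-C1–C6 checks.

```python
import json

MAX_ITERATIONS = 3

def react_plan(build_prompt, call_llm, validate):
    """reason -> act -> observe, retrying with validator feedback, up to 3 iterations."""
    feedback = None
    plan = {}
    for _ in range(MAX_ITERATIONS):
        prompt = build_prompt(feedback)   # reason: fold failure feedback into the prompt
        raw = call_llm(prompt)            # act: one LLM call, JSON output expected
        try:
            plan = json.loads(raw)
        except json.JSONDecodeError:
            feedback = "response was not valid JSON"
            continue
        ok, feedback = validate(plan)     # observe: run guardrails GR-C1..C6
        if ok:
            return plan
    return plan                           # last attempt; guardrails supply safe defaults
```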
 
 
  | --- | --- | --- | --- |
  | **GR1** | PerceptionAgent | Person count capped at MAX_PERSONS (1000 in agent) | Prevents implausibly high counts caused by YOLO artifacts from propagating to risk and alerts. |
  | **GR2** | PerceptionAgent | Density score capped at MAX_DENSITY (50.0) | Keeps density in a bounded range for downstream risk formulas. |
+ | **GR3** | RiskAgent | Risk score clamped to [0.0, 1.0] | Ensures threshold comparisons (> 0.20 MEDIUM, > 0.80 HIGH) remain valid. |
+ | **GR3b** | RiskAgent | `level_changed` suppressed during K-window warmup (first 17 frames per clip) | Prevents false P0/P1 triggers before enough track data is available. |
+ | **GR4** | OperationsAgent | P0 rate-limited per zone (cooldown 300 s); resets on pipeline restart | Reduces alert fatigue; the rate limit does not carry over across runs. |
  | **GR-C1** | CoordinatorAgent | Required JSON fields enforced; missing set to safe defaults | Prevents dashboard or downstream logic from breaking when the LLM omits fields. |
  | **GR-C2** | CoordinatorAgent | threat_level whitelist (CRITICAL, HIGH, MEDIUM, LOW) | Avoids invalid or adversarial values that would break UI or logic. |
  | **GR-C3** | CoordinatorAgent | confidence_score in [0, 1]; otherwise 0.5 | Normalizes LLM output so confidence is comparable. |
+ | **GR-C4** | CoordinatorAgent | Full range enforcement: threat_level overridden to match actual risk_score thresholds (> 0.80 HIGH, > 0.20 MEDIUM, else LOW) | Prevents LLM from returning HIGH threat during MEDIUM risk or CRITICAL during LOW risk. |
  | **GR-C5** | CoordinatorAgent | Arabic alert fallback if empty | Ensures safety-critical Arabic alert is never empty on the dashboard. |
  | **GR-C6** | CoordinatorAgent | selected_gates must be non-empty list; otherwise fallback | Ensures operators receive concrete gate recommendations. |
  | **RF1** | ReflectionAgent | Chronic LOW bias: N consecutive LOW with avg count above threshold → MEDIUM | Addresses sliding-window lag during rapid escalation. |
  | **RF2** | ReflectionAgent | Rising trend ignored: trend=rising, LOW, count above threshold → MEDIUM | Corrects inconsistent state (rising crowd with LOW risk). |
  | **RF3** | ReflectionAgent | Count–risk mismatch: high count but LOW risk → upgrade to MEDIUM/HIGH | Corrects mathematically inconsistent states. |
  | **RF4** | ReflectionAgent | Over-estimation: HIGH risk but count below threshold (< 10) → MEDIUM | Reduces false HIGH from empty or near-empty frames. |
 
  Each guardrail is implemented in the corresponding agent file; further justification is documented in `ethics_and_safety_report.txt`.
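For illustration, the coordinator-side guardrails amount to a small validation pass over the LLM's JSON. This is a sketch only: the keys mirror the table, but the fallback values (the Arabic string and `"Gate 1"`) are invented placeholders, not the real defaults.

```python
def apply_coordinator_guardrails(plan, risk_score):
    """Normalize an LLM plan dict in place per GR-C1..C6; returns the safe plan."""
    defaults = {"threat_level": "LOW", "executive_summary": "", "selected_gates": [],
                "immediate_actions": [], "arabic_alert": "", "confidence_score": 0.5}
    for key, value in defaults.items():                 # GR-C1: required fields
        plan.setdefault(key, value)
    if plan["threat_level"] not in {"CRITICAL", "HIGH", "MEDIUM", "LOW"}:
        plan["threat_level"] = "LOW"                    # GR-C2: whitelist
    c = plan["confidence_score"]
    if not isinstance(c, (int, float)) or not 0 <= c <= 1:
        plan["confidence_score"] = 0.5                  # GR-C3: confidence in [0, 1]
    if risk_score > 0.80:                               # GR-C4: match actual thresholds
        plan["threat_level"] = "HIGH"
    elif risk_score > 0.20:
        plan["threat_level"] = "MEDIUM"
    else:
        plan["threat_level"] = "LOW"
    if not plan["arabic_alert"]:                        # GR-C5: illustrative fallback text
        plan["arabic_alert"] = "تنبيه: يرجى اتباع تعليمات المنظمين"
    if not plan["selected_gates"]:                      # GR-C6: illustrative fallback gate
        plan["selected_gates"] = ["Gate 1"]
    return plan
```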

  ---
 
  pip install -r requirements.txt
  ```

+ Set `GROQ_API_KEY` in the environment or in `backend/config.py`. Set `VIDEO_PATH` to a valid video file path (default: `hajj_multi_video_annotated.mp4` in the backend directory). The system uses a YOLOv8 Head Detection model (CrowdHuman) with BoTSORT tracking.

  ```bash
  python api.py
 
  | System failure during peak crowd | Pipeline crash → operators lose visibility | Fail-safe per-agent isolation; operators trained to treat silence as trigger for manual monitoring |
  | Adversarial prompt injection | Malicious input manipulates LLM output | Structured JSON-only output; GR-C1–C6 guardrails validate every field; threat_level constrained to whitelist |
  | Disproportionate security response | P0 triggers aggressive enforcement harming pilgrims | Playbook actions are crowd-management only (open gates, PA broadcast, crowd guides) — not enforcement; human operator has final authority |

  ### 11.7 Fail-Safe Behavior

  If any agent fails, the pipeline continues with safe defaults:
 
  - **PerceptionAgent fails** → returns empty FrameResult (count=0)
  - **RiskAgent fails** → previous risk level is retained
  - **VisionCountAgent fails** → falls back to YOLO count (logged in flags)
  - **CoordinatorAgent fails** → P0 decision still issued without an LLM plan
  - **DB write fails** → logged to console, pipeline continues
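The per-agent isolation listed above boils down to wrapping each stage and substituting a safe default on failure. A minimal sketch (the real pipeline's exception handling may differ):

```python
import logging

def run_safely(stage_name, fn, fallback):
    """Run one pipeline stage; on any exception, log it and return the safe default."""
    try:
        return fn()
    except Exception:
        logging.exception("%s failed; using safe default", stage_name)
        return fallback
```

For example, a hypothetical call site: `risk = run_safely("RiskAgent", lambda: risk_agent.assess(frame_result), previous_risk)` retains the previous risk level on failure, matching the table above.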
 
  ---

  ## 12. Limitations & Future Work

  - **Single-camera, single-zone:** The pipeline processes one video stream per instance. Real deployment at the Grand Mosque would require multiple cameras and zones. Future work: multi-zone state and one pipeline per camera with aggregation at the API or dashboard layer.

  - **Synthetic-only evaluation:** Quantitative metrics in `evaluation.py` are computed on synthetically generated scenarios with known ground-truth counts. Real aerial footage has occlusion, blur, and lighting conditions not fully captured. Future work: annotate real frames and measure real-world accuracy and recall.

  - **No Hajj-specific fine-tuning:** The YOLO model is pre-trained on general datasets. Pilgrims in white ihram can be under-detected (domain shift). VisionCountAgent and ReflectionAgent mitigate this in part when enabled. Future work: fine-tune a detector on Hajj-annotated data to improve recall (estimated +15%).

  - **Coordinator output quality not automatically measured:** Evaluation covers risk levels, priorities, and guardrail compliance — not the appropriateness or clarity of generated Arabic plans and alerts. Future work: human-expert rubric and sample-based evaluation of plan quality.

  - **Production scaling:** The architecture supports running multiple pipeline instances; the dashboard and API would need to be extended for per-zone or per-camera state and approve/reject controls per zone.

  ---

  **HaramGuard — Capstone Project · Tuwaiq Academy**