---
title: HaramGuard
emoji: ๐
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
---
# HaramGuard: Agentic AI Safety System for Hajj Crowd Management
HaramGuard is a real-time, multi-agent decision-support system that integrates computer vision, risk modeling, reflective bias correction, and LLM-based coordination to assist human operators in preventing crowd crush during Hajj and Umrah.
Designed for deployment at the Grand Mosque (Masjid al-Haram), the system analyzes video feeds, estimates crowd risk levels, and generates structured operational recommendations, while maintaining strict human-in-the-loop governance, safety guardrails, and full auditability.
Capstone Project · Tuwaiq Academy
Developed by: Adeem Alotaibi, Reem Alamoudi, Munirah Alsubaie, Nourah Alhumaid · Supervised by: Eng. Omer Nacar
## Table of Contents
- Overview
- Problem Definition
- Solution / System Architecture
- Agentic System Design (Agents Description)
- Human-in-the-Loop Design
- Guardrails
- System Architecture Diagram
- Installation & Running
- Repository Structure
- Iterative Improvements
- Ethics & Safety
- Limitations & Future Work
## 1. Overview
HaramGuard is implemented as a single-pipeline, multi-agent system. One orchestration layer (`RealTimePipeline` in `backend/pipeline.py`) runs a fixed sequence of five agents per video frame and maintains a single shared state. The pipeline does not replace the operator; it produces a stream of recommendations and alerts that the operator may approve, reject, or ignore via a React dashboard.
- Backend: Python (FastAPI), Ultralytics YOLO, OpenCV, NumPy, SciPy; SQLite persistence via `HajjFlowDB` (`backend/core/database.py`). Optional Streamlit dashboard entry point in `dashboard.py`.
- Frontend: React 18, Vite, Tailwind CSS, Lucide icons; polls the backend REST API for real-time state and displays KPIs, a risk gauge, proposed actions, and a decisions log with approve/reject/delete controls.
- Data flow: Video frames → PerceptionAgent → RiskAgent → ReflectionAgent → OperationsAgent → CoordinatorAgent (when a decision exists). State is updated each frame and persisted (risk events every 30 frames; reflection every frame; decisions and coordinator plans when emitted). The FastAPI server exposes `/api/realtime/state`, `/api/frames/buffer`, `/api/actions/{id}/approve`, `/api/actions/{id}/reject`, `/api/reset`, and `/health`.
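As a sketch of how a client might consume this API (the endpoint and field names are taken from the state description in this README; the exact response schema may differ):

```python
# Minimal polling client sketch for the HaramGuard REST API.
# API_BASE and the state fields below are assumptions drawn from this README.
import json
import urllib.request

API_BASE = "http://localhost:8000"  # adjust to match API_PORT

def fetch_state(base=API_BASE):
    """GET /api/realtime/state and return the parsed JSON dict."""
    with urllib.request.urlopen(f"{base}/api/realtime/state", timeout=5) as resp:
        return json.load(resp)

def summarize(state):
    """One-line operator summary built from a state dict."""
    return (f"frame {state['frame_id']}: {state['person_count']} persons, "
            f"risk {state['risk_score']:.2f} ({state['risk_level']}, {state['trend']})")
```

A caller would poll `fetch_state()` on an interval and render `summarize(...)`, which is essentially what the React dashboard does against the same endpoint.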
## 2. Problem Definition
### 2.1 Real-World Problem
Mass gatherings during Hajj and Umrah create extreme crowd densities in and around the Grand Mosque. Crowd crush and stampede events have occurred in the past, with serious consequences. Effective crowd management depends on timely detection of rising density and flow bottlenecks, and on recommending proportionate interventions (e.g. opening gates, directing flow, broadcasting guidance) before conditions become critical.
Manual monitoring of many camera feeds is error-prone and does not scale. Operators need a system that (1) continuously estimates crowd density and risk from video, (2) explains why a risk level was assigned, and (3) suggests concrete actions while leaving all execution decisions to humans.
### 2.2 Significance
- Safety: Reducing the likelihood of crowd crush by earlier, evidence-based recommendations.
- Scale: Supporting operators who cannot watch every feed; the system aggregates perception and risk into a single, interpretable state.
- Accountability: Every recommendation is logged with reasoning (risk events, reflection log, decisions, coordinator plans), supporting post-incident review and governance.
- Context: The system is designed to respect the religious and social context of Hajj (no individual identification, human authority over actions, alignment with consultation-based decision-making).
## 3. Solution / System Architecture
### 3.1 High-Level Design
HaramGuard uses a deterministic, unidirectional pipeline:
- PerceptionAgent turns each video frame into a structured `FrameResult` (person count, density, spacing, 3×3 spatial grid for hotspots, annotated frame). It uses a YOLO model (path and size in `backend/config.py`); an optional VisionCountAgent (Claude Vision) is available in code but disabled in the default pipeline.
- RiskAgent maintains a sliding window of `FrameResult`s and computes a risk score and level (LOW/MEDIUM/HIGH) using four paths: Fruin-style EMA of count, instant high-count floor, pre-emptive rate-of-change, and spatial clustering from the 3×3 grid. Output is a `RiskResult`.
- ReflectionAgent observes `RiskResult` and `FrameResult`, detects four bias patterns (chronic LOW, rising trend ignored, count–risk mismatch, over-estimation), and corrects the risk level/score when needed. All reflections are logged to the database.
- OperationsAgent emits a `Decision` only when the risk level changes (event-driven). Priority (P0/P1/P2) is derived from config thresholds aligned with RiskAgent. P0 decisions are rate-limited per zone. Decisions are stored in the database; actions and selected gates are left empty until the Coordinator fills them.
- CoordinatorAgent is invoked by the pipeline for every decision (P0, P1, P2). It calls an LLM (Groq API) to generate a structured plan (threat level, executive summary, selected gates, immediate actions, Arabic alert, confidence). A ReAct loop (reason → act → observe, max 3 iterations) validates output with six guardrails (GR-C1–C6); the pipeline then fills the decision's actions, justification, and selected_gates from the plan and stores the plan in the database.
Single state: One state dictionary is updated each frame and exposed to the FastAPI server; the React dashboard polls it. All numeric thresholds and caps live in backend/config.py.
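The per-frame orchestration described above can be sketched as follows. The agent callables here are stand-ins for the real modules, not the actual `RealTimePipeline` implementation:

```python
# Sketch of the fixed five-agent sequence with a single shared state dict.
# Agent interfaces are simplified stand-ins assumed from this README.
class Pipeline:
    def __init__(self, perception, risk, reflection, operations, coordinator):
        self.agents = (perception, risk, reflection, operations, coordinator)
        self.state = {"frame_id": 0, "risk_level": "LOW", "latest_decision": None}

    def process_frame(self, frame):
        perception, risk, reflection, operations, coordinator = self.agents
        fr = perception(frame)               # frame -> FrameResult
        rr = risk(fr)                        # FrameResult -> RiskResult
        rr = reflection(rr, fr)              # bias-corrected RiskResult
        decision = operations(rr)            # Decision, or None if level unchanged
        if decision is not None:
            decision["plan"] = coordinator(rr, decision)  # plan for every decision
        self.state.update(frame_id=self.state["frame_id"] + 1,
                          risk_level=rr["risk_level"], latest_decision=decision)
        return self.state
```

The single mutable `state` dict mirrors what the FastAPI server exposes to the dashboard poller.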
### 3.2 Data Flow
- Input: Video frames from a file or camera (path set by `VIDEO_PATH` in config).
- Output: Per-frame state (frame_id, person_count, density_score, risk_score, risk_level, trend, latest_decision, coordinator_plan, arabic_alert, reflection_summary, risk_history, decisions_log, etc.) plus persisted records in SQLite: `risk_events`, `op_decisions`, `coordinator_plans`, `reflection_log`.
- Interfaces: Backend: FastAPI on port 8000 (configurable via `API_PORT`). Frontend: Vite dev server (e.g. port 5173), with `VITE_API_BASE_URL` configuring the API base URL.
## 4. Agentic System Design (Agents Description)
The pipeline runs five agents in order on each frame. Each agent is implemented in a single module under `backend/agents/`. Data flows unidirectionally; agents do not call each other directly.
### 4.1 PerceptionAgent (`perception_agent.py`)
- Role: Convert a raw video frame into a `FrameResult`: person count, density score, average spacing, bounding boxes, annotated frame, track IDs, occupation percentage, and a 3×3 spatial grid (grid_counts, grid_max, hotspot_zone) for downstream hotspot detection. Based on Umm Al-Qura University (UQU) Haram crowd research: local clustering in one cell can indicate risk even when the global count is moderate.
- Design pattern: Tool use: YOLO for detection and tracking; optional VisionCountAgent (Claude Vision) for an alternative count when an Anthropic key is provided. In the current default pipeline, PerceptionAgent is instantiated without an Anthropic key (YOLO-only).
- Guardrails: GR1: person count capped at `MAX_PERSONS` (1000 in the agent class). GR2: density score capped at `MAX_DENSITY` (50.0).
- Input: Raw frame (NumPy array). Output: `FrameResult` (see `backend/core/models.py`).
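The 3×3 hotspot grid described above can be sketched from bounding-box centers like this. The cell indexing and box format are assumptions for illustration, not the actual `perception_agent.py` code:

```python
# Build a 3x3 per-cell person count from bounding-box centers, then find
# the densest cell (the "hotspot"). Box format (x1, y1, x2, y2) is assumed.
def grid_counts(boxes, width, height):
    """boxes: [(x1, y1, x2, y2), ...] -> 3x3 nested list of person counts."""
    grid = [[0] * 3 for _ in range(3)]
    for x1, y1, x2, y2 in boxes:
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2    # box center
        col = min(int(cx / width * 3), 2)        # clamp to last column at the edge
        row = min(int(cy / height * 3), 2)
        grid[row][col] += 1
    return grid

def hotspot(grid):
    """Return (row, col, count) of the densest cell."""
    r, c = max(((r, c) for r in range(3) for c in range(3)),
               key=lambda rc: grid[rc[0]][rc[1]])
    return r, c, grid[r][c]
```

This is the mechanism that lets local clustering in one cell trigger risk even when the global count is moderate.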
### 4.2 RiskAgent (`risk_agent.py`)
- Role: Maintain a sliding window (14 frames) of recent `FrameResult`s and compute a scalar risk score in [0, 1] and a discrete risk level (LOW / MEDIUM / HIGH), plus a trend (rising / stable / falling). The final score is the maximum of four paths: (1) Fruin smooth: EMA of the current person count normalized to `RISK_HIGH_COUNT` (50), with spacing and trend weights; (2) instant floor: if the current count ≥ HIGH_COUNT, score floor 0.70; (3) pre-emptive ROC: 5-frame growth and EMA thresholds; (4) spatial clustering: if any 3×3 grid cell has ≥ `GRID_CELL_HIGH` persons (from the FrameResult), floor 0.70. The score is clamped to [0, 1] (GR3).
- Design pattern: Sliding window + multi-path weighted scoring. Window size, thresholds, and weights are in `config.py`.
- Input: `FrameResult`. Output: `RiskResult` (frame_id, risk_score, risk_level, trend, level_changed, window_avg, window_max, density_ema, density_pct).
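A minimal sketch of the max-of-four-paths scoring. Threshold names mirror the description above, but the ROC rule, the `GRID_CELL_HIGH` value, and the omitted spacing/trend weights are illustrative assumptions, not the real `risk_agent.py` logic:

```python
# Max-of-four-paths risk score, simplified. GRID_CELL_HIGH and the 5-frame
# growth cutoff are assumed values for illustration.
RISK_HIGH_COUNT = 50     # count at which the smoothed path saturates
GRID_CELL_HIGH = 15      # assumed per-cell clustering threshold

def risk_score(ema_count, current_count, growth_5f, grid_max):
    fruin = min(ema_count / RISK_HIGH_COUNT, 1.0)                 # path 1: smoothed count
    instant = 0.70 if current_count >= RISK_HIGH_COUNT else 0.0   # path 2: instant floor
    roc = 0.70 if growth_5f >= 10 else 0.0                        # path 3: pre-emptive rate
    spatial = 0.70 if grid_max >= GRID_CELL_HIGH else 0.0         # path 4: clustering floor
    return max(0.0, min(max(fruin, instant, roc, spatial), 1.0))  # GR3 clamp to [0, 1]

def risk_level(score):
    """Discrete level using the 0.65/0.35 thresholds cited in this README."""
    return "HIGH" if score >= 0.65 else "MEDIUM" if score >= 0.35 else "LOW"
```

Taking the maximum means any single path can raise the alarm; no path can suppress another.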
### 4.3 ReflectionAgent (`reflection_agent.py`)
- Role: Critique the current risk assessment and correct it when one of four bias patterns is detected: (1) chronic LOW: N consecutive LOW frames with average person count above threshold → upgrade to MEDIUM; (2) rising trend ignored: trend=rising, risk=LOW, count above threshold → upgrade to MEDIUM; (3) count–risk mismatch: high person count but LOW risk → upgrade to MEDIUM or HIGH; (4) over-estimation: HIGH risk but person count below threshold (e.g. < 15) → downgrade to MEDIUM. All reflections are persisted to `reflection_log` by the pipeline.
- Design pattern: Reflection (observe → critique → correct → log). History window and thresholds in config (`REFLECTION_BIAS_WINDOW`, `REFLECTION_CROWD_LOW_THRESH`, `REFLECTION_HIGH_CROWD_THRESH`, `REFLECTION_OVER_EST_THRESH`).
- Input: `RiskResult`, `FrameResult`. Output: Reflection dict; the pipeline applies corrections to the `RiskResult` before passing it to OperationsAgent.
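The chronic-LOW detector (pattern 1 above) can be sketched as follows. The window size and count threshold are illustrative stand-ins for the `REFLECTION_*` config values:

```python
# Chronic-LOW bias detector: N consecutive LOW frames with a high average
# person count get upgraded to MEDIUM. Parameter values are illustrative.
from collections import deque

class ChronicLowDetector:
    def __init__(self, window=10, crowd_thresh=20):
        self.history = deque(maxlen=window)   # (level, count) per recent frame
        self.crowd_thresh = crowd_thresh

    def check(self, risk_level, person_count):
        """Return the (possibly corrected) risk level for this frame."""
        self.history.append((risk_level, person_count))
        if len(self.history) < self.history.maxlen:
            return risk_level                 # not enough history yet
        all_low = all(lvl == "LOW" for lvl, _ in self.history)
        avg = sum(c for _, c in self.history) / len(self.history)
        if all_low and avg > self.crowd_thresh:
            return "MEDIUM"                   # chronic LOW with a busy scene: upgrade
        return risk_level
```

The other three patterns follow the same observe-critique-correct shape over the same history window.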
### 4.4 OperationsAgent (`operations_agent.py`)
- Role: Map the (possibly reflection-corrected) risk level to an operational priority (P0 / P1 / P2) and emit a `Decision` only when the risk level changes. Priority is derived from config (`OPS_P0_SCORE`, `OPS_P1_SCORE`) aligned with RiskAgent thresholds. P0 emission is rate-limited per zone (300 s cooldown in the agent class). The decision's actions and selected_gates are left empty; the pipeline fills them via CoordinatorAgent and then stores the decision in `op_decisions`.
- Design pattern: Event-driven; no decision when the level is unchanged.
- Input: `RiskResult`, context string (e.g. `Mecca_Main_Area`). Output: `Decision` or None.
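A sketch of the event-driven emission with per-zone P0 rate limiting described above. The priority cutoffs echo the 0.65/0.35 thresholds cited elsewhere in this README; the real agent reads them from config:

```python
# Event-driven decision emission: emit only on level change, and rate-limit
# P0 per zone with a 300 s cooldown (as stated above).
import time

class OperationsSketch:
    P0_COOLDOWN = 300.0  # seconds, per zone

    def __init__(self):
        self.last_level = None
        self.last_p0 = {}          # zone -> timestamp of last emitted P0

    def maybe_decide(self, risk_level, risk_score, zone, now=None):
        now = time.time() if now is None else now
        if risk_level == self.last_level:
            return None            # no decision when the level is unchanged
        self.last_level = risk_level
        priority = "P0" if risk_score >= 0.65 else "P1" if risk_score >= 0.35 else "P2"
        if priority == "P0":
            if now - self.last_p0.get(zone, float("-inf")) < self.P0_COOLDOWN:
                return None        # rate-limited; risk itself is still logged upstream
            self.last_p0[zone] = now
        return {"priority": priority, "zone": zone,
                "actions": [], "selected_gates": []}   # Coordinator fills these later
```

Note that the cooldown suppresses only the decision record; the risk stream itself is unaffected.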
### 4.5 CoordinatorAgent (`coordinator_agent.py`)
- Role: For every decision (P0, P1, or P2), produce a structured action plan using the Groq LLM (model set in the agent, e.g. `openai/gpt-oss-120b`). The plan includes threat_level, executive_summary, selected_gates, immediate_actions, actions_justification, arabic_alert, and confidence_score. Implements a ReAct loop (max 3 iterations): reason (build a prompt from the RiskResult, Decision, and frame buffer, plus optional feedback from failed validation) → act (LLM call, parse JSON) → observe (run guardrails GR-C1–C6); repeat until valid or max iterations. The pipeline fills the decision's actions, justification, and selected_gates from the plan and stores the plan in `coordinator_plans`.
- Design pattern: ReAct (reason → act → observe) + output guardrails.
- Input: `RiskResult`, `Decision`, list of recent `FrameResult`s. Output: Plan dict.
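The ReAct loop can be sketched generically as below. `call_llm`, `build_prompt`, and `validate` are hypothetical stand-ins for the Groq call, the prompt builder, and the GR-C1–C6 guardrail pass:

```python
# Generic reason -> act -> observe loop with guardrail feedback, capped at
# three iterations as described above. All three callables are stand-ins.
def react_plan(call_llm, build_prompt, validate, max_iters=3):
    """Return (plan, ok). `validate` must return (corrected_plan, violations)."""
    feedback = None
    plan = {}
    for _ in range(max_iters):
        prompt = build_prompt(feedback)     # reason: fold prior violations into the prompt
        plan = call_llm(prompt)             # act: LLM call, parsed JSON
        plan, violations = validate(plan)   # observe: run the guardrail checks
        if not violations:
            return plan, True
        feedback = violations               # retry with explicit failure feedback
    return plan, False                      # give up with the last corrected plan
```

Because `validate` also returns a corrected plan, the loop degrades gracefully: even on failure the caller receives a guardrail-sanitized plan rather than raw LLM output.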
### 4.6 VisionCountAgent (`vision_count_agent.py`, optional)
- Role: Provide an alternative person count by sending a subset of frames to a vision API (e.g. Claude Vision). Designed to be called from PerceptionAgent in hybrid mode to improve counting in dense or occluded scenes. Not used in the default pipeline (PerceptionAgent is instantiated with `anthropic_key=None`).
- Design pattern: Tool use; sampling and rate limiting are internal to avoid API overload.
## 5. Human-in-the-Loop Design
HaramGuard is a decision-support system, not an autonomous enforcement system. Every action that affects the physical world (opening gates, dispatching security, broadcasting alerts) is a recommendation to a human operator. No such action is executed by the system alone.
- OperationsAgent emits prioritized decisions (P0/P1/P2) and stores them; CoordinatorAgent produces Arabic alert text and action plans (gates, immediate actions) for each decision. These are shown on the React dashboard as "proposed actions."
- The human operator approves or rejects each proposed action via the dashboard. The API records approve/reject via `/api/actions/{id}/approve` and `/api/actions/{id}/reject`; the system does not execute any action itself.
- Operator responsibilities: treat system silence (e.g. API down or pipeline stopped) as a trigger to switch to manual monitoring; review P0 recommendations and Arabic alerts before any broadcast or deployment; use the dashboard as one input among others (e.g. direct camera views, on-ground reports).
This design aligns with the principle of consultation (Shura) and with due diligence in decisions that affect lives: the machine informs, the human decides.
## 6. Guardrails
Guardrails are hard constraints and validations applied in code to keep outputs within safe and interpretable bounds. The following are implemented in the current repository.
| ID | Agent | Guardrail | Justification |
|---|---|---|---|
| GR1 | PerceptionAgent | Person count capped at MAX_PERSONS (1000 in agent) | Prevents implausibly high counts from YOLO artifacts from propagating to risk and alerts. |
| GR2 | PerceptionAgent | Density score capped at MAX_DENSITY (50.0) | Keeps density in a bounded range for downstream risk formulas. |
| GR3 | RiskAgent | Risk score clamped to [0.0, 1.0] | Ensures threshold comparisons (e.g. 0.35, 0.65) remain valid. |
| GR4 | OperationsAgent | P0 rate-limited per zone (cooldown 300 s in agent) | Reduces alert fatigue; risk is still logged; only decision emission is rate-limited. |
| GR-C1 | CoordinatorAgent | Required JSON fields enforced; missing set to safe defaults | Prevents dashboard or downstream logic from breaking when the LLM omits fields. |
| GR-C2 | CoordinatorAgent | threat_level whitelist (CRITICAL, HIGH, MEDIUM, LOW) | Avoids invalid or adversarial values that would break UI or logic. |
| GR-C3 | CoordinatorAgent | confidence_score in [0, 1]; otherwise 0.5 | Normalizes LLM output so confidence is comparable. |
| GR-C4 | CoordinatorAgent | Full range enforcement: threat_level overridden to match actual risk_score thresholds (LOW/MEDIUM/HIGH) | Prevents LLM from returning HIGH threat during MEDIUM risk or CRITICAL during LOW risk. |
| GR-C5 | CoordinatorAgent | Arabic alert fallback if empty | Ensures safety-critical Arabic alert is never empty on the dashboard. |
| GR-C6 | CoordinatorAgent | selected_gates must be non-empty list; otherwise fallback | Ensures operators receive concrete gate recommendations. |
| RF1 | ReflectionAgent | Chronic LOW bias: N consecutive LOW with avg count above threshold → MEDIUM | Addresses sliding-window lag during rapid escalation. |
| RF2 | ReflectionAgent | Rising trend ignored: trend=rising, LOW, count above threshold → MEDIUM | Corrects an inconsistent state (rising crowd with LOW risk). |
| RF3 | ReflectionAgent | Count–risk mismatch: high count but LOW risk → upgrade to MEDIUM/HIGH | Corrects mathematically inconsistent states. |
| RF4 | ReflectionAgent | Over-estimation: HIGH risk but count < threshold (e.g. 15) → MEDIUM | Reduces false HIGH from empty or near-empty frames. |
Each guardrail is implemented in the corresponding agent file; further justification is documented in `ethics_and_safety_report.txt`.
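As an illustration, GR-C1 through GR-C3 from the table above could be implemented roughly like this. The safe-default values are assumptions; see `coordinator_agent.py` for the actual code:

```python
# Coordinator output guardrails GR-C1..GR-C3, sketched. Field names come
# from the plan schema in this README; the defaults are assumed.
REQUIRED = {"threat_level": "LOW", "executive_summary": "", "selected_gates": [],
            "immediate_actions": [], "arabic_alert": "", "confidence_score": 0.5}
LEVELS = {"CRITICAL", "HIGH", "MEDIUM", "LOW"}

def validate_plan(plan):
    out = dict(plan)
    for key, default in REQUIRED.items():      # GR-C1: missing fields -> safe defaults
        out.setdefault(key, default)
    if out["threat_level"] not in LEVELS:      # GR-C2: whitelist the threat level
        out["threat_level"] = "LOW"
    c = out["confidence_score"]                # GR-C3: normalize confidence to [0, 1]
    if not isinstance(c, (int, float)) or not 0.0 <= c <= 1.0:
        out["confidence_score"] = 0.5
    return out
```

Each check rewrites the field to a safe value instead of rejecting the whole plan, so downstream consumers (dashboard, database) always receive a well-formed object.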
## 7. System Architecture Diagram

```
┌───────────────┐
│  Video Frame  │
└───────┬───────┘
        │
        ▼
┌───────────────┐
│PerceptionAgent│
└───────┬───────┘
        │
        ▼
┌───────────────┐
│   RiskAgent   │
└───────┬───────┘
        │
        ▼
┌───────────────┐
│ReflectionAgent│
└───────┬───────┘
        │
        ▼
┌───────────────┐
│OperationsAgent│
└───────┬───────┘
        │
        ▼
┌────────────────┐
│CoordinatorAgent│
└───────┬────────┘
        │
        ▼
┌───────────────┐
│  HajjFlowDB   │
│   (SQLite)    │
└───────┬───────┘
        │
        ▼
┌───────────────┐
│    FastAPI    │
│   REST API    │
└───────┬───────┘
        │
        ▼
┌───────────────┐
│     React     │
│   Dashboard   │
│(Human-in-the- │
│     Loop)     │
└───────────────┘
```
## 8. Installation & Running
### 8.1 Prerequisites
- Python 3.9+ (backend)
- Node.js 18+ (frontend)
- Groq API key (required for CoordinatorAgent). Anthropic API key optional (only if enabling VisionCountAgent in the pipeline).
### 8.2 Backend

```bash
cd backend
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install -r requirements.txt
```

Set `GROQ_API_KEY` in the environment or in `backend/config.py`. Set `VIDEO_PATH` to a valid video file path (default: `hajj_real_video.mp4` in the backend directory). Set `MODEL_PATH` if using a different YOLO weight file (default: `yolo11l.pt`).

```bash
python api.py
```

The API listens on http://0.0.0.0:8000 by default (port configurable via `API_PORT` in config).
### 8.3 Frontend

```bash
cd frontend
npm install
npm run dev
```

The dashboard is at http://localhost:5173 (or the port Vite reports). If the API is not at http://localhost:8000, set `VITE_API_BASE_URL` in `frontend/.env` or the environment.
### 8.4 Evaluation

From the backend directory:

```bash
python evaluation.py
```

Outputs are written to `backend/outputs/eval/`.
## 9. Repository Structure

```
Haramguard/
├── README.md
├── ethics_and_safety_report.txt
├── backend/
│   ├── config.py
│   ├── requirements.txt
│   ├── api.py
│   ├── pipeline.py
│   ├── evaluation.py
│   ├── dashboard.py
│   ├── core/
│   │   ├── __init__.py
│   │   ├── models.py
│   │   └── database.py
│   ├── agents/
│   │   ├── __init__.py
│   │   ├── perception_agent.py
│   │   ├── risk_agent.py
│   │   ├── reflection_agent.py
│   │   ├── operations_agent.py
│   │   ├── coordinator_agent.py
│   │   └── vision_count_agent.py
│   └── outputs/
│       └── eval/
│           ├── summary.json
│           └── full_results.json
└── frontend/
    ├── package.json
    ├── package-lock.json
    ├── index.html
    ├── vite.config.js
    ├── tailwind.config.js
    ├── postcss.config.js
    ├── .env
    ├── STATE_REFERENCE.md
    ├── src/
    │   ├── main.jsx
    │   ├── App.jsx
    │   ├── index.css
    │   ├── pages/
    │   │   └── Dashboard.jsx
    │   └── Fin.svg
    └── dist/
        ├── index.html
        └── assets/
```
## 10. Iterative Improvements

HaramGuard was developed through 14 documented iterations, each addressing a measured problem with a verifiable before/after result. The first 10 iterations are documented in `ITERATIVE_IMPROVEMENT2.md`; the remaining 4 in `changes.md`.

### Summary Table
| # | Title | Problem | Before | After |
|---|---|---|---|---|
| 1 | YOLO Model Upgrade | Nano model detected 3–4 persons on 30+ person frames | ~10% recall | |
| 2 | Count-Based Risk Scoring | Density-based formula: HIGH risk mathematically unreachable on aerial cameras | Scene C accuracy: 0% | Scene C accuracy: 100% |
| 3 | ReflectionAgent Added | 30-frame sliding window caused a 20+ frame blind spot during rapid escalation | Uncorrected bias | 5/5 bias detectors passing |
| 4 | Risk–Priority Threshold Alignment | HIGH risk (score 0.66) incorrectly received P1 instead of P0 | Risk–priority alignment: ~75% | 100% alignment |
| 5 | Hybrid PerceptionAgent (YOLO + Claude Vision) | Dense-crowd under-count due to white ihram occlusion | 3–4 persons detected | Matches ground truth |
| 6 | Modular Architecture | Entire system in one notebook: untestable, unconfigurable | 0 isolated tests | 6 independent agent modules |
| 7 | SQLite Audit Trail | Reflection corrections lost after the session: no auditability | Console logs only | Full SQLite history |
| 8 | Evaluation Framework | No systematic metrics: manual testing only | Manual testing | 8 quantified metrics, 4 scenarios |
| 9 | Condition-Based Risk Factors | High compression + clustering still reported LOW risk | Compression undetected | Compression/clustering detected |
| 10 | Weight Recalibration | Condition factors weakened the primary count signal | System accuracy: 50% (2/4) | System accuracy: 75% (3/4) |
| 11 | Risk Index Direction Fix | 17 persons + shrinking crowd → 82% risk (peak window bug) | window_peak = max(counts) | current_count EMA: risk falls with the crowd |
| 12 | Trend Score Bidirectionality | t_score always ≥ 0.4 even during rapid crowd decrease | Decreasing crowd added constant risk | t_score = 0.0 when the crowd shrinks fast |
| 13 | Arabic UI & Decision Log | English-only labels; decisions replaced on each poll | English labels, lost history | Arabic labels (منخفض/متوسط/عالي), cumulative log |
| 14 | Clean Dashboard State | FALLBACK_STATE showed a fake HIGH emergency on load | Fake P0 alert on startup | ZERO_STATE: clean until real data arrives |
### Key Iterations in Detail

#### Iteration 2: Count-Based Scoring (most critical architectural fix)
The original formula computed pixel density: persons / (frame_pixels / 10,000). For a 1920×1080 aerial frame (~2M pixels), even 100 persons yields density = 0.5, far below the HIGH threshold of 20. The system was architecturally incapable of ever reporting HIGH risk. Replacing the primary signal with absolute person count normalized to a Hajj-calibrated threshold of 50 persons brought Scene C accuracy from 0% to 100%.
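The arithmetic behind this fix, spelled out with the frame size and thresholds stated above:

```python
# Why the old pixel-density formula could never reach HIGH on a full-HD
# aerial frame, and why the count-normalized score can.
frame_pixels = 1920 * 1080                     # ~2.07M pixels

def old_density(persons):
    return persons / (frame_pixels / 10_000)   # pre-fix pixel-density formula

def new_score(persons):
    return min(persons / 50, 1.0)              # count normalized to the 50-person threshold

# old_density(100) is about 0.48, far below the old HIGH threshold of 20;
# new_score(100) saturates at 1.0, correctly flagging HIGH.
```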
#### Iteration 4: Threshold Alignment (critical safety fix)
RiskAgent labeled scores ≥ 0.65 as HIGH, but OperationsAgent only issued P0 for scores ≥ 0.70. A score of 0.66, a genuine HIGH emergency, would receive P1 (routine monitoring) instead of P0 (immediate response). Aligning both thresholds to 0.65/0.35 in `config.py` fixed this safety gap and brought risk–priority alignment to 100%.
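The fix, reduced to code: one shared pair of thresholds drives both mappings, so level and priority cannot diverge:

```python
# One source of truth for the risk thresholds, as in the post-fix config.
RISK_HIGH, RISK_MED = 0.65, 0.35

def level(score):
    """Risk level as RiskAgent labels it."""
    return "HIGH" if score >= RISK_HIGH else "MEDIUM" if score >= RISK_MED else "LOW"

def priority(score):
    """Operational priority as OperationsAgent assigns it, from the same thresholds."""
    return "P0" if score >= RISK_HIGH else "P1" if score >= RISK_MED else "P2"

# Pre-fix, priority used a separate 0.70 cutoff, so 0.66 was HIGH yet only P1.
```

With the shared constants, a HIGH level implies a P0 priority by construction.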
#### Iteration 11: Risk Index Direction Fix
The EMA was computed using max(counts) over the last window, causing the risk to remain inflated long after the crowd had dispersed. Example: 70 persons 15 frames ago but only 17 now → 82% risk. Switching to current_count as the EMA input allows risk to decrease in proportion to the actual crowd, while the EMA still smooths out frame-to-frame noise.
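A small numeric illustration of the fix; the smoothing factor here is illustrative, not the project's actual value:

```python
# An EMA over the *current* count decays toward the real crowd size, while
# max() over the window stays pinned at the historical peak.
def ema(values, alpha=0.3):
    """Exponential moving average over a sequence (alpha is illustrative)."""
    s = values[0]
    for v in values[1:]:
        s = alpha * v + (1 - alpha) * s
    return s

counts = [70] * 8 + [17] * 10      # crowd peaks at 70, then disperses to 17
peak_based = max(counts[-14:])     # old input over a 14-frame window: stuck at 70
ema_based = ema(counts)            # new input: decays toward 17 as the crowd shrinks
```

Here `peak_based` still reports 70 even though only 17 people remain, while `ema_based` has already fallen into the low twenties and keeps falling.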
#### Iteration 14: Clean Dashboard State
FALLBACK_STATE was a hardcoded demo object showing a fake P0 HIGH emergency, designed for UI screenshots. It was left in production code and flashed on screen before the backend connected, showing operators a false emergency every time the dashboard loaded. Replacing it with ZERO_STATE (all zeros, LOW level, empty arrays) ensures the dashboard starts clean.
## 11. Ethics & Safety
### 11.1 Human-in-the-Loop Design
HaramGuard is a decision-support system, not an autonomous enforcement system. Every output is a recommendation to a human operator: no gate opens, no security is dispatched, no PA broadcast is made without a human approving the action. This design aligns with the Islamic principle of consultation (Shura) and with due diligence in decisions that affect the lives of millions of pilgrims.
### 11.2 Privacy & Surveillance
HaramGuard processes crowd count data only, not individual identities.
- No facial recognition is performed
- No biometric data is stored
- YOLO detects person bounding boxes (anonymous blobs only)
- Claude Vision counts persons without identification
- SQLite stores risk scores, counts, and timestamps; no personal data
None of the database tables (risk_events, op_decisions, coordinator_plans, reflection_log) contain personally identifiable information (PII). Bounding box data is discarded after spacing calculation and tracking IDs are not persisted.
### 11.3 Fairness & Bias
YOLO models trained on general datasets may under-detect pilgrims in white ihram clothing (domain shift). Two mitigations are implemented:
- VisionCountAgent (Claude Vision) provides a context-aware second counting layer that understands "pilgrims in white garments"
- ReflectionAgent detects chronic under-counting (CHRONIC_LOW_BIAS) and corrects upward, preventing model bias from causing under-response
Residual risk: under-count in extreme occlusion remains possible and is documented in evaluation.py Section 6 as a known limitation.
### 11.4 Transparency & Explainability
Every decision in HaramGuard is logged with human-readable reasoning:
- RiskAgent: logs risk_score, trend, window_avg
- ReflectionAgent: logs critique text explaining why bias was detected (e.g. "RISING_TREND_IGNORED: trend=rising, persons=25, but risk=LOW. Upgraded to MEDIUM.")
- OperationsAgent: logs which playbook actions were triggered and why
- CoordinatorAgent: logs GPT confidence score and any guardrail corrections applied
Operators can audit every decision post-incident by querying the SQLite database.
### 11.5 Conservative Bias by Design
The system is deliberately tuned to err toward higher risk:
- `HIGH_COUNT = 50` (not 100): triggers a HIGH alert at moderate crowd sizes
- ReflectionAgent corrections are biased upward; the only downgrade (RF4) removes false HIGH readings on near-empty frames
- Missing confidence score defaults to 0.5 (not 0)
Rationale: in crowd safety, a false alarm is far less costly than a missed stampede.
### 11.6 Potential Misuse Scenarios
| Scenario | Risk | Mitigation |
|---|---|---|
| Surveillance creep | System extended to track individuals | Bounding boxes discarded after use; no tracking IDs in DB; Vision prompt states "count only, do not identify" |
| False positive causing panic | Incorrect HIGH alert triggers overreaction | HITL design; ReflectionAgent monitors for oscillation; 30-frame window smooths spikes; P0 rate limiting |
| System failure during peak crowd | Pipeline crash: operators lose visibility | Fail-safe per-agent isolation; operators trained to treat silence as a trigger for manual monitoring |
| Adversarial prompt injection | Malicious input manipulates LLM output | Structured JSON-only output; guardrails GR-C1–C5 validate every field; threat_level constrained to a whitelist |
| Disproportionate security response | P0 triggers aggressive enforcement harming pilgrims | Playbook actions are crowd-management only (open gates, PA broadcast, crowd guides), not enforcement; the human operator has final authority |
### 11.7 Fail-Safe Behavior
If any agent fails, the pipeline continues with safe defaults:
- PerceptionAgent fails → returns an empty FrameResult (count=0)
- RiskAgent fails → the previous risk level is retained
- VisionCountAgent fails → falls back to the YOLO count (logged in flags)
- CoordinatorAgent fails → the P0 decision is still issued without a GPT plan
- DB write fails → logged to console; the pipeline continues
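This per-agent isolation can be sketched as a wrapper around each agent call; a simplified illustration, not the pipeline's actual error-handling code:

```python
# Wrap each agent call so one failure degrades to a safe default instead of
# crashing the whole pipeline; failures are recorded in a flags list.
def safe_call(agent_fn, fallback, flags, name, *args):
    """Run one agent; on any exception, log a flag and return its fallback."""
    try:
        return agent_fn(*args)
    except Exception as exc:
        flags.append(f"{name}_failed: {exc}")   # logged; the pipeline continues
        return fallback

flags = []

def broken_perception(frame):
    raise RuntimeError("decoder error")          # simulated agent failure

empty_frame_result = {"person_count": 0}         # safe default: count=0
result = safe_call(broken_perception, empty_frame_result, flags, "perception", None)
```

Each agent gets its own fallback (previous risk level, YOLO count, decision without a plan), so a single faulty stage never silences the whole system.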
## 12. Limitations & Future Work
Single-camera, single-zone: The pipeline processes one video stream per instance. Real deployment at the Grand Mosque would require multiple cameras and zones. Future work: multi-zone state and one pipeline per camera with aggregation at the API or dashboard layer.
Synthetic-only evaluation: Quantitative metrics in `evaluation.py` are computed on synthetically generated scenarios with known ground-truth counts. Real aerial footage has occlusion, blur, and lighting conditions not fully captured. Future work: annotate real frames and measure real-world accuracy and recall.

No Hajj-specific fine-tuning: The YOLO model is pre-trained on general datasets. Pilgrims in white ihram can be under-detected (domain shift). VisionCountAgent and ReflectionAgent mitigate this in part when enabled. Future work: fine-tune a detector on Hajj-annotated data to improve recall (estimated +15%).
Coordinator output quality not automatically measured: Evaluation covers risk levels, priorities, and guardrail compliance, but not the appropriateness or clarity of the generated Arabic plans and alerts. Future work: a human-expert rubric and sample-based evaluation of plan quality.
Production scaling: The architecture supports running multiple pipeline instances; the dashboard and API would need to be extended for per-zone or per-camera state and approve/reject controls per zone.
HaramGuard: Capstone Project · Tuwaiq Academy