HaramGuard

Agentic AI Safety System for Real-Time Hajj Crowd Management

Developed by: Adeem Alotaibi, Reem Alamoudi, Munirah Alsubaie, Nourah Alhumaid
Supervised by: Eng. Omer Nacar

2M+

Hajj pilgrims annually

5

Agents in pipeline

12

Guardrails implemented

Multi-Agent · ReAct Pattern · Reflection Pattern · YOLOv8 · Computer Vision · FastAPI

System Architecture

HaramGuard Architecture — AISA Framework

INPUT

Aerial Camera Video Feed

OUTPUT

Arabic Emergency Alert — P0 / P1 / P2

Tool & Environment Layer

Perception Agent

YOLOv8 + BoTSORT
Person Count
Density & Spacing

→

Cognitive Agent Layer

Risk Agent

17-frame Sliding Window
Risk Score (0–1)
LOW / MEDIUM / HIGH

→

Cognitive Agent Layer

Reflection Agent

Bias Detection
Observe → Critique
Corrected Risk

→

Bias Detected?

No ↓

Level Changed?

Yes →

Correct + Log

Critique Recorded

Agentic Infrastructure Layer

Operations Agent

Event Classification
P0 / P1 / P2
Arabic Action Output

→

Governance, Ethics & Policy Layer

Coordinator Agent (ReAct)

Governance
Policy Enforcement
Ethical Check (GR-C1..5)

→

SQLite — Audit Trail

All decisions logged

Runs on every frame

Fires only when risk level changes (~90% skipped)

Fires only on P0 alert

Every decision logged to SQLite

Tool & Environment — Perception

Cognitive Agent — Risk + Reflection

Agentic Infrastructure — Operations

Governance, Ethics & Policy — Coordinator
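The gating described in the diagram (work on every frame, Operations only when the risk level changes, Coordinator only on a P0 alert, everything logged) can be sketched roughly as below. This is a minimal illustration, not the project's code: the `GatedPipeline` class and its field names are hypothetical, the audit log is an in-memory list standing in for SQLite, and only the HIGH to P0 mapping comes from the evaluation results; the MEDIUM to P1 and LOW to P2 mappings are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Only HIGH -> P0 is stated in the source; the other two entries are assumed.
PRIORITY_FOR_LEVEL = {"HIGH": "P0", "MEDIUM": "P1", "LOW": "P2"}


@dataclass
class GatedPipeline:
    """Hypothetical orchestrator showing when each downstream agent fires."""
    last_level: Optional[str] = None
    ops_calls: int = 0
    coordinator_calls: int = 0
    audit_log: List[dict] = field(default_factory=list)  # stand-in for SQLite

    def step(self, frame_idx: int, risk_level: str) -> Optional[str]:
        """Handle one frame's risk level; return a priority if Operations fired."""
        priority = None
        if risk_level != self.last_level:
            # Operations fires only when the risk level changes.
            self.ops_calls += 1
            priority = PRIORITY_FOR_LEVEL[risk_level]
            if priority == "P0":
                # Coordinator fires only on a P0 alert.
                self.coordinator_calls += 1
            self.last_level = risk_level
        # Every frame's decision is recorded, whether or not anything fired.
        self.audit_log.append(
            {"frame": frame_idx, "level": risk_level, "priority": priority}
        )
        return priority
```

Feeding a sequence that stays at one level for long stretches shows why most frames skip the Operations step: the expensive agents run only at the transitions.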

Evaluation Results

Quantified Performance — 4 Synthetic Ground-Truth Scenes

100%

System Accuracy

4 / 4 scenes correct end-to-end

100%

Risk → Priority Alignment

727 test cases — HIGH always triggers P0

13×

Faster

Pipeline Speed vs. Real-Time · 387 fps

Scene-by-Scene Accuracy

Scene	Crowd Size	Expected	Result	Convergence
A — Sparse	5–15 persons	LOW	PASS ✓	frame 1
B — Medium	25–45 persons	MEDIUM	PASS ✓	frame 1
C — Dense	60–90 persons	HIGH	PASS ✓	frame 1
D — Escalating	5–90 persons	HIGH	PASS ✓	frame 30

Component Metrics

Detection Rate

100%

Alignment

100%

Reflection Tests

5/5

System Accuracy

100%

False Positive Rate

0.4%

Ops Skip Rate

90%

Development Process

14 Iterative Improvements

Improvement	Before	After
YOLO Model Upgrade	nano → 3–4 detections	YOLOv8 → 31 detections
Count-Based Risk Scoring	Scene C: 8%	Scene C: 100%
Reflection Agent Design	20-frame blind spot	5/5 bias tests ✓
Risk-Priority Alignment	HIGH-P1 (bug)	100% alignment
Modular Architecture	1 notebook file	6 independent modules
SQLite Audit Trail	Console logs only	Full audit history
Evaluation Framework	Manual testing	8 quantified metrics
Condition-Based Risk Factors	Count only	+ Compression / Flow
Coordinator ReAct Pattern	Single LLM call	Self-correcting 3 iters
Weight Recalibration	W_DENSITY=0.35 → 50%	W=0.50 → 100%
Risk Index Direction Fix	17 persons → 82% risk	Current count EMA
Trend Score Bidirectionality	t_score always ≥ 0.4	Decreasing → 0.0
Arabic UI & Decision Log	English labels, lost history	Arabic + cumulative log
Clean Dashboard State	Fake HIGH alert on load	ZERO_STATE on startup

Safety & Ethics

Guardrails — 12 Implemented Across 5 Agents

Every guardrail is implemented in code and justified architecturally. Human-in-the-Loop (HITL) design: all outputs are recommendations — humans decide.

GR1

PerceptionAgent

Person Count Cap (MAX=1000)

Prevents YOLO hallucinations on busy textures from propagating to risk scoring.

GR2

PerceptionAgent

Density Score Cap (MAX=50)

Prevents density formula overflow on small frames; keeps score interpretable.

GR3

RiskAgent

Risk Score Clamp [0.0, 1.0]

The weighted sum can drift slightly outside [0.0, 1.0] through floating-point error. Clamping keeps the threshold comparisons valid.
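GR1 to GR3 are all range clamps, so a minimal sketch is short. The limits (1000, 50, and [0.0, 1.0]) come from the cards above; the function and constant names are hypothetical, and flooring the count and density at zero is an assumption of this sketch.

```python
MAX_PERSON_COUNT = 1000    # GR1 limit, from the card above
MAX_DENSITY_SCORE = 50.0   # GR2 limit, from the card above


def cap_person_count(raw_count: int) -> int:
    """GR1: keep an implausible detection count out of risk scoring."""
    return max(0, min(int(raw_count), MAX_PERSON_COUNT))


def cap_density(raw_density: float) -> float:
    """GR2: keep the density score bounded on small frames."""
    return max(0.0, min(float(raw_density), MAX_DENSITY_SCORE))


def clamp_risk(score: float) -> float:
    """GR3: force the weighted sum back into [0.0, 1.0]."""
    return max(0.0, min(float(score), 1.0))
```

Applying the clamps at each agent's boundary means downstream thresholds never see an out-of-range value.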

GR4

OperationsAgent

P0 Rate Limit (1 per 5 min)

Prevents alert fatigue: operators who see 20 P0 alerts per hour begin to ignore them.
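A rate limit of one P0 per five minutes could be sketched as follows. The `P0RateLimiter` class and its injectable `clock` parameter are hypothetical, and what the real system does with a suppressed alert (for example downgrading or only logging it) is not stated in the source, so the sketch only reports whether the alert may be raised.

```python
import time
from typing import Callable, Optional


class P0RateLimiter:
    """Allow at most one P0 alert per interval (300 s = 5 min, as in GR4)."""

    def __init__(
        self,
        min_interval_s: float = 300.0,
        clock: Callable[[], float] = time.monotonic,  # injectable for testing
    ) -> None:
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._last_p0: Optional[float] = None

    def allow(self) -> bool:
        """Return True and record the time if a P0 may be raised now."""
        now = self._clock()
        if self._last_p0 is not None and now - self._last_p0 < self.min_interval_s:
            return False
        self._last_p0 = now
        return True
```

Using a monotonic clock avoids surprises when the wall clock is adjusted while the system is running.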

GR-C1

CoordinatorAgent

Required JSON Fields Enforced

LLMs occasionally omit fields. A missing arabic_alert or threat_level breaks the dashboard.

GR-C2

CoordinatorAgent

threat_level Whitelist

Prevents GPT from returning values such as "EXTREME" or "UNKNOWN" that break downstream logic.

GR-C3

CoordinatorAgent

Confidence Score [0,1] Validated

LLMs sometimes return confidence as a percentage (85 instead of 0.85); such values are normalized to [0, 1].

GR-C4

CoordinatorAgent

Threat Level ↔ Risk Score Consistency

Full range enforcement: threat_level is overridden to match the level implied by the actual risk_score thresholds (LOW/MEDIUM/HIGH). This prevents the LLM from reporting a HIGH threat when the risk score is MEDIUM.

GR-C5

CoordinatorAgent

Arabic Alert Fallback

The Arabic alert is safety-critical: an empty string on the dashboard during a P0 event is unacceptable, so a fallback message is substituted.
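The five coordinator guardrails amount to a validation pass over the LLM's JSON output. The sketch below shows one way that pass might be written. The field names `arabic_alert` and `threat_level` appear in the cards above; the `confidence` key, the two risk thresholds, the fallback text, and the `validate_plan` function itself are hypothetical placeholders.

```python
from typing import List, Tuple

ALLOWED_LEVELS = ("LOW", "MEDIUM", "HIGH")
MEDIUM_THRESHOLD, HIGH_THRESHOLD = 0.35, 0.65  # assumed values, not from the source
FALLBACK_ALERT = "تنبيه: ازدحام مرتفع، يرجى المراجعة الفورية"  # placeholder text


def level_for(risk_score: float) -> str:
    """Map a risk score to the level its thresholds imply."""
    if risk_score >= HIGH_THRESHOLD:
        return "HIGH"
    return "MEDIUM" if risk_score >= MEDIUM_THRESHOLD else "LOW"


def validate_plan(plan: dict, risk_score: float) -> Tuple[dict, List[str]]:
    """Return a repaired copy of the plan and the guardrails that fired."""
    out, fired = dict(plan), []
    expected = level_for(risk_score)

    # GR-C1: required fields must exist.
    defaults = {"threat_level": expected, "arabic_alert": "", "confidence": 0.0}
    missing = [k for k in defaults if k not in out]
    if missing:
        fired.append("GR-C1")
        for key in missing:
            out[key] = defaults[key]

    # GR-C2: threat_level must be one of the whitelisted values.
    if out["threat_level"] not in ALLOWED_LEVELS:
        fired.append("GR-C2")
        out["threat_level"] = expected

    # GR-C3: confidence must be in [0, 1]; treat larger values as percentages.
    try:
        conf = float(out["confidence"])
    except (TypeError, ValueError):
        conf = 0.0
    normalized = max(0.0, min(conf / 100.0 if conf > 1.0 else conf, 1.0))
    if normalized != out["confidence"]:
        fired.append("GR-C3")
    out["confidence"] = normalized

    # GR-C4: threat_level must agree with the risk score.
    if out["threat_level"] != expected:
        fired.append("GR-C4")
        out["threat_level"] = expected

    # GR-C5: never show an empty Arabic alert.
    alert = out["arabic_alert"]
    if not isinstance(alert, str) or not alert.strip():
        fired.append("GR-C5")
        out["arabic_alert"] = FALLBACK_ALERT

    return out, fired
```

Returning the list of guardrails that fired alongside the repaired plan makes each correction easy to record in the audit trail.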

RF1

ReflectionAgent

Chronic LOW Bias Detection

Sliding-window lag can leave the risk level at LOW for 20+ frames during an escalation. This guardrail prevents missed emergencies.

RF2

ReflectionAgent

Rising Trend + LOW → MEDIUM

A rising crowd with LOW risk is a contradictory state that indicates a calibration failure.

RF3

ReflectionAgent

Count-Risk Mismatch Correction

A LOW rating with 80+ persons present is treated as a contradiction, so an absolute count override is applied.
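The three reflection rules follow an observe-then-critique shape: look at the Risk Agent's output, record a critique when a rule is violated, and emit a corrected level. The sketch below is a rough illustration only. The 80-person override and the LOW to MEDIUM correction for a rising trend come from the cards above; the `reflect` function, the 20-frame chronic window, the choice to correct RF1 to MEDIUM, and the choice to correct RF3 to HIGH are assumptions.

```python
from typing import List, Sequence, Tuple


def reflect(
    level: str,
    trend: str,
    person_count: int,
    recent_levels: Sequence[str] = (),
    recent_counts: Sequence[int] = (),
    chronic_window: int = 20,   # assumed window length
    count_override: int = 80,   # from RF3
) -> Tuple[str, List[str]]:
    """Return (corrected_level, critiques) for one risk assessment."""
    corrected, critiques = level, []

    # RF3: a large absolute count cannot be LOW (correcting to HIGH is assumed).
    if corrected == "LOW" and person_count >= count_override:
        corrected = "HIGH"
        critiques.append(f"RF3: {person_count} persons rated LOW")

    # RF2: a rising crowd rated LOW is raised to MEDIUM.
    if corrected == "LOW" and trend == "rising":
        corrected = "MEDIUM"
        critiques.append("RF2: rising trend rated LOW")

    # RF1: a long run of LOW while the count has grown suggests window lag.
    levels = list(recent_levels)[-chronic_window:]
    counts = list(recent_counts)[-chronic_window:]
    if (
        corrected == "LOW"
        and len(levels) >= chronic_window
        and all(lvl == "LOW" for lvl in levels)
        and len(counts) >= 2
        and counts[-1] > counts[0]
    ):
        corrected = "MEDIUM"
        critiques.append("RF1: chronic LOW during escalation")

    return corrected, critiques
```

An empty critique list means the assessment passed reflection unchanged; a non-empty list is the "Critique Recorded" outcome that would be written to the log.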

Total Guardrails

4 Perception, Risk & Operations Guardrails

5 Coordinator (LLM) Guardrails

3 Reflection Agent Guardrails

HITL Design: All outputs are recommendations. No action is executed without human approval.

Features

What HaramGuard Offers | ماذا يقدم حارس الحرم

Real-Time Crowd Perception

YOLO-powered person detection and tracking on live video feeds — estimates count, density, spacing, and flow velocity every frame.
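As a rough illustration of the per-frame estimates, the sketch below derives count, density, and spacing from a list of person bounding boxes. It does not call YOLO or a tracker: the boxes are assumed to be supplied by the detector, and the metric definitions (persons per megapixel for density, mean nearest-neighbour centroid distance for spacing) and the `crowd_metrics` name are assumptions of this sketch. Flow velocity is omitted because it needs track identities across frames.

```python
import math
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def crowd_metrics(boxes: List[Box], frame_w: int, frame_h: int) -> dict:
    """Derive simple crowd statistics from one frame's person boxes."""
    centroids = [((x1 + x2) / 2.0, (y1 + y2) / 2.0) for x1, y1, x2, y2 in boxes]
    count = len(centroids)

    # Density as persons per megapixel of frame area (assumed definition).
    density = count * 1_000_000 / (frame_w * frame_h)

    # Spacing as the mean distance from each person to their nearest neighbour.
    spacing: Optional[float] = None
    if count >= 2:
        nearest = [
            min(math.dist(c, other) for j, other in enumerate(centroids) if j != i)
            for i, c in enumerate(centroids)
        ]
        spacing = sum(nearest) / count

    return {"count": count, "density": density, "spacing": spacing}
```

The nearest-neighbour search here is quadratic in the number of detections, which is acceptable for a sketch but would likely need a spatial index at Hajj-scale counts.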

Risk Scoring & Level Detection

Sliding-window risk model classifies crowd state as LOW / MEDIUM / HIGH with a rising/stable/falling trend — calibrated for Hajj-scale densities.
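A sliding-window risk model of this kind could be sketched as below. The 17-frame window and the density weight of 0.50 appear elsewhere on this page. Everything else is an assumption chosen for illustration: the trend weight, the saturation count of 60, the LOW/MEDIUM/HIGH thresholds (picked so that the crowd sizes of evaluation scenes A to C land in their expected levels), and the dead band used to label the trend.

```python
from collections import deque
from typing import Deque, Tuple

WINDOW = 17               # from the architecture diagram
W_DENSITY = 0.50          # from the weight recalibration note
W_TREND = 0.50            # assumed
SATURATION_COUNT = 60.0   # assumed: counts at or above this saturate the term
MEDIUM_T, HIGH_T = 0.20, 0.50  # assumed thresholds
TREND_DEADBAND = 2        # assumed: smaller changes count as "stable"


class SlidingRiskModel:
    """Keep the last WINDOW person counts and score them on each update."""

    def __init__(self) -> None:
        self.counts: Deque[int] = deque(maxlen=WINDOW)

    def update(self, person_count: int) -> Tuple[float, str, str]:
        """Add one frame's count; return (score, level, trend)."""
        self.counts.append(person_count)
        mean_count = sum(self.counts) / len(self.counts)
        delta = self.counts[-1] - self.counts[0]

        density_term = min(mean_count / SATURATION_COUNT, 1.0)
        trend_term = min(max(delta, 0) / SATURATION_COUNT, 1.0)  # shrinking adds 0
        score = W_DENSITY * density_term + W_TREND * trend_term
        score = max(0.0, min(score, 1.0))  # clamp to [0, 1]

        if delta > TREND_DEADBAND:
            trend = "rising"
        elif delta < -TREND_DEADBAND:
            trend = "falling"
        else:
            trend = "stable"

        if score >= HIGH_T:
            level = "HIGH"
        elif score >= MEDIUM_T:
            level = "MEDIUM"
        else:
            level = "LOW"
        return score, level, trend
```

Because the score averages over the window, a sudden surge takes several frames to move the level, which is the lag the Reflection Agent's guardrails are meant to catch.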

Human-in-the-Loop Dashboard

React dashboard streams live risk state, proposed actions, and coordinator plans. Operators approve or reject every recommendation — the system never acts autonomously.

Full Audit Trail

Every perception result, risk decision, reflection log, and operator action is persisted to SQLite — supporting post-incident review, governance, and compliance.
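An append-only audit table along these lines can be built with Python's standard `sqlite3` module. The schema, table name, and helper functions below are hypothetical; the source states only that every decision is persisted to SQLite.

```python
import json
import sqlite3
import time
from typing import List, Optional, Tuple


def open_audit_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the audit database; the schema is illustrative."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS audit_log (
               id      INTEGER PRIMARY KEY AUTOINCREMENT,
               ts      REAL    NOT NULL,
               frame   INTEGER NOT NULL,
               agent   TEXT    NOT NULL,
               payload TEXT    NOT NULL
           )"""
    )
    return conn


def log_decision(conn: sqlite3.Connection, frame: int, agent: str, payload: dict) -> int:
    """Append one decision as a JSON payload; return the new row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO audit_log (ts, frame, agent, payload) VALUES (?, ?, ?, ?)",
            (time.time(), frame, agent, json.dumps(payload, ensure_ascii=False)),
        )
    return cur.lastrowid


def history(conn: sqlite3.Connection, agent: Optional[str] = None) -> List[Tuple[int, str, dict]]:
    """Return (frame, agent, payload) rows in insertion order."""
    if agent is None:
        rows = conn.execute("SELECT frame, agent, payload FROM audit_log ORDER BY id")
    else:
        rows = conn.execute(
            "SELECT frame, agent, payload FROM audit_log WHERE agent = ? ORDER BY id",
            (agent,),
        )
    return [(frame, name, json.loads(payload)) for frame, name, payload in rows]
```

Storing the payload as JSON with `ensure_ascii=False` keeps Arabic alert text readable when the database is inspected during a post-incident review.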

Beneficiaries

Stakeholders | الجهات المستفيدة

General Authority for the Care of the Two Holy Mosques

الهيئة العامة للعناية بشؤون الحرمين

Primary operator of the system: receives intervention plans, security deployment orders, and gate-opening orders directly from HaramGuard.

Nusuk

نُسك

The Hajj and Umrah organization platform: uses congestion data to reschedule pilgrim groups before they reach high-risk areas.

Ministry of Hajj and Umrah

وزارة الحج والعمرة

Holds top-level policy authority for Hajj management: uses the analytical reports to improve annual management plans.

Pilgrims

ضيوف الرحمن

The ultimate beneficiary: their safety is the core objective of the system. More than 2 million pilgrims annually.