Sync loitering-detection from metro-analytics-catalog
Files changed:

- README.md (+29, -13)
- export_and_quantize.sh (+16, -1)
README.md

````diff
@@ -1,6 +1,6 @@
 # Loitering Detection
 
-> **Validated with:** OpenVINO 2026.
+> **Validated with:** OpenVINO 2026.1.0, NNCF 3.0.0, DLStreamer 2026.0, Ultralytics 8.4.46, Python 3.11+
 
 | Property | Value |
 |---|---|
````
````diff
@@ -34,7 +34,7 @@ Smaller variants (`yolo26n`, `yolo26s`) are recommended for high-FPS edge deploy
 ## Prerequisites
 
 - Python 3.11+
-- [Install OpenVINO
+- [Install OpenVINO](https://docs.openvino.ai/2026/get-started/install-openvino.html) (latest version)
 - [Install Intel DLStreamer](https://docs.openedgeplatform.intel.com/2026.0/edge-ai-libraries/dlstreamer/get_started/install/install_guide_ubuntu.html)
 
 Create and activate a Python virtual environment before running the scripts:
````
````diff
@@ -54,9 +54,17 @@ Run the provided script to download, export to OpenVINO IR, and optionally quant
 
 ```bash
 chmod +x export_and_quantize.sh
-./export_and_quantize.sh
+./export_and_quantize.sh
+```
+
+This exports the default **yolo26n** model in **FP16** precision.
+
+#### Optional: Select a Different Variant or Precision
+
+```bash
 ./export_and_quantize.sh yolo26n FP32 # full-precision
 ./export_and_quantize.sh yolo26n INT8 # quantized
+./export_and_quantize.sh yolo26s # larger variant, default FP16
 ```
 
 Replace `yolo26n` with any variant (`yolo26s`, `yolo26m`, `yolo26l`, `yolo26x`).
````
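To sanity-check an export before wiring it into a pipeline, a minimal loader sketch (standard OpenVINO Python API; the path follows the `${MODEL_NAME}_openvino_model/` convention used by the script, shown here for the default `yolo26n`):

```python
# Smoke test for the exported IR (sketch; the path assumes the default
# yolo26n export produced by export_and_quantize.sh).
import openvino as ov

core = ov.Core()
model = core.read_model("yolo26n_openvino_model/yolo26n.xml")
compiled = core.compile_model(model, "CPU")
print(compiled.input(0).shape)  # a 640x640 export reports [1,3,640,640]
```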
````diff
@@ -82,9 +90,9 @@ Output files:
 | FP16 | Yes | Yes | Yes |
 | INT8 | Yes | Yes | Yes |
 
-> **Note:**
->
-> target deployment site.
+> **Note:** The INT8 calibration uses frames from the bundled sample video.
+> For production accuracy, replace it with a representative set of frames from
+> the target deployment site.
 
 ### Defining the Region of Interest
 
````
````diff
@@ -97,9 +105,14 @@ A typical surveillance-zone configuration on a 1280x720 source might be:
 
 ```text
 roi=400,200,1100,650 # ROI for gvaattachroi (x_min,y_min,x_max,y_max)
-LOITERING_SECONDS = 5.0 # dwell threshold, in seconds
+LOITERING_SECONDS = 5.0 # dwell threshold, in seconds (demo value)
 ```
 
+> **Note:** The sample uses a 5-second threshold so that loitering events are
+> triggered quickly on the short demo video. For production deployments,
+> increase this to 10--30 seconds depending on the site's operational
+> requirements.
+
 Per-person dwell time is measured at the bottom-center of the bounding box
 (the foot anchor), which most closely approximates the person's ground position.
 
````
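The foot-anchor rule is straightforward to express in code; a small sketch (helper names are illustrative, not from the repository; the ROI values match the sample configuration above):

```python
# Sketch of the foot-anchor / ROI test described above.
ROI = (400, 200, 1100, 650)  # x_min, y_min, x_max, y_max

def foot_anchor(x_min: int, y_min: int, x_max: int, y_max: int) -> tuple[int, int]:
    # Bottom-center of the bounding box approximates the ground position.
    return ((x_min + x_max) // 2, y_max)

def in_roi(point: tuple[int, int], roi: tuple[int, int, int, int] = ROI) -> bool:
    x, y = point
    return roi[0] <= x <= roi[2] and roi[1] <= y <= roi[3]
```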
````diff
@@ -142,11 +155,12 @@ pipeline_str = (
     f"gvametaconvert add-empty-results=true ! queue ! "
     f"gvafpscounter ! "
     f"gvawatermark ! videoconvert ! video/x-raw,format=I420 ! "
-    f"
+    f"x264enc ! h264parse ! "
     f"mp4mux ! filesink name=sink location=output.mp4"
 )
 pipeline = Gst.parse_launch(pipeline_str)
 
+STALE_TIMEOUT = 2.0 # seconds of absence before clearing dwell state
 dwell_state: dict[int, float] = defaultdict(float)
 last_seen: dict[int, float] = {}
 flagged: set[int] = set()
````
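For orientation, the probe that drives `on_buffer` (next hunk) has to sit upstream of `x264enc`, where the GVA metadata still exists. A sketch of the wiring, assuming the `gvawatermark` element were given a name such as `wm` (a hypothetical addition, not part of this diff):

```python
# Attach on_buffer as a buffer probe on the watermark element's source pad
# (standard GStreamer Python bindings; "wm" is an assumed element name).
wm = pipeline.get_by_name("wm")
wm.get_static_pad("src").add_probe(Gst.PadProbeType.BUFFER, on_buffer)
pipeline.set_state(Gst.State.PLAYING)
```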
````diff
@@ -191,11 +205,15 @@ def on_buffer(pad, info):
             flush=True,
         )
 
+    # Clean up stale tracks after STALE_TIMEOUT seconds of absence.
+    # Keep flagged entries to prevent duplicate alerts when a person
+    # briefly disappears (occlusion / tracker jitter) and reappears.
     for stale in list(dwell_state):
         if stale not in seen_ids:
-
-
-
+            elapsed_since = now - last_seen.get(stale, now)
+            if elapsed_since > STALE_TIMEOUT:
+                dwell_state.pop(stale, None)
+                last_seen.pop(stale, None)
 
     return Gst.PadProbeReturn.OK
 
````
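The detection loop that fills these structures falls outside the hunk's context; the bookkeeping implied by the names above (`seen_ids`, `now`, `flagged`) would look roughly like this sketch, where `current_detections` is a hypothetical list of `(track_id, foot_anchor)` pairs and `in_roi` is the helper sketched earlier:

```python
import time

# Per-frame dwell bookkeeping implied by the on_buffer hunk above (sketch).
def update_dwell(current_detections, dwell_state, last_seen, flagged,
                 loitering_seconds=5.0):
    now = time.monotonic()
    seen_ids: set[int] = set()
    for tid, anchor in current_detections:
        seen_ids.add(tid)
        if in_roi(anchor):
            # Accumulate dwell only while the foot anchor is inside the ROI.
            dwell_state[tid] += now - last_seen.get(tid, now)
        last_seen[tid] = now
        if dwell_state[tid] >= loitering_seconds and tid not in flagged:
            flagged.add(tid)  # alert once per track
    return seen_ids
```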
````diff
@@ -223,8 +241,6 @@ LOITERING id=9 dwell=1.6s anchor=(527,250)
 The annotated video is saved to `output.mp4` with green bounding boxes and
 track IDs drawn by `gvawatermark` around every detected person.
 
-Increasing `LOITERING_SECONDS` back to its operational default (around 10 s) suppresses the events on this short walking clip; reproduce a real loitering scenario with a stationary subject in your own footage.
-
 ---
 
 ## License
````
export_and_quantize.sh

````diff
@@ -68,12 +68,27 @@ if [[ "${PRECISION}" == "INT8" ]]; then
 import nncf
 import openvino as ov
 import numpy as np
+import cv2
 
 core = ov.Core()
 model = core.read_model('${MODEL_NAME}_openvino_model/${MODEL_NAME}.xml')
 
+# Extract frames from the sample video for calibration.
+cap = cv2.VideoCapture('VIRAT_S_000101.mp4')
+frames = []
+while len(frames) < 300:
+    ret, frame = cap.read()
+    if not ret:
+        cap.set(cv2.CAP_PROP_POS_FRAMES, 0)
+        continue
+    img = cv2.resize(frame, (640, 640))
+    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
+    img = img.transpose(2, 0, 1)[np.newaxis, ...]
+    frames.append(img)
+cap.release()
+
 def transform_fn(data_item):
-    return
+    return frames[data_item % len(frames)]
 
 calibration_dataset = nncf.Dataset(list(range(300)), transform_fn)
 
````
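The hunk ends at the diff context, but the dataset's consumer is the standard NNCF post-training quantization entry point; the call that would follow looks roughly like this sketch (the INT8 output path is an assumed name, not from the script):

```python
# Post-training quantization step that consumes calibration_dataset
# (standard NNCF API; output filename is illustrative).
quantized = nncf.quantize(model, calibration_dataset)
ov.save_model(quantized, '${MODEL_NAME}_openvino_model/${MODEL_NAME}_int8.xml')
```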