# Loitering Detection
| Property | Value |
|---|---|
| Category | Object Detection + Tracking + Zone Analytics |
| Source Framework | PyTorch (Ultralytics) |
| Supported Precisions | FP32, FP16, INT8 (mixed-precision) |
| Inference Engine | OpenVINO |
| Hardware | CPU, GPU, NPU |
| Detected Class | person (COCO class 0) |
## Overview
Loitering Detection is a Metro Analytics use case that flags people who remain inside a configurable region of interest for longer than a dwell-time threshold. It is built on YOLO26 for person detection, paired with a multi-object tracker that assigns persistent IDs across frames. A polygon zone defines the area to monitor; for each tracked person whose bounding-box anchor falls inside the zone, the application accumulates dwell time and raises a loitering event when the threshold is exceeded.
Typical Metro deployments include:
- Restricted-Area Monitoring -- raise alerts when a person lingers near tracks, equipment rooms, or after-hours zones.
- Platform Edge Safety -- detect prolonged presence inside a yellow-line buffer.
- ATM and Ticketing Security -- identify suspicious dwell at unattended kiosks.
- Crowd-Free Zone Enforcement -- monitor emergency exits and corridors that must remain clear.
Available variants: `yolo26n`, `yolo26s`, `yolo26m`, `yolo26l`, `yolo26x`.
Smaller variants (`yolo26n`, `yolo26s`) are recommended for high-FPS edge deployment.
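The accumulate-and-flag behavior described in the overview can be sketched as a small state machine. This is an illustrative sketch only; the class and method names are invented here and are not the application's actual implementation:

```python
class DwellTracker:
    """Accumulate per-track dwell time inside a zone and flag loitering
    once a configurable threshold is exceeded (illustrative sketch)."""

    def __init__(self, threshold_s: float):
        self.threshold_s = threshold_s
        self.dwell: dict[int, float] = {}      # track id -> seconds in zone
        self.last_seen: dict[int, float] = {}  # track id -> last timestamp
        self.flagged: set[int] = set()         # ids already alerted

    def update(self, track_id: int, timestamp: float, in_zone: bool) -> bool:
        """Feed one observation; return True exactly once per loitering event."""
        if not in_zone:
            # Leaving the zone resets the accumulated dwell time.
            self.dwell.pop(track_id, None)
            self.last_seen.pop(track_id, None)
            return False
        prev = self.last_seen.get(track_id, timestamp)
        self.dwell[track_id] = self.dwell.get(track_id, 0.0) + (timestamp - prev)
        self.last_seen[track_id] = timestamp
        if self.dwell[track_id] >= self.threshold_s and track_id not in self.flagged:
            self.flagged.add(track_id)
            return True
        return False
```

Feeding one observation per frame, the first `True` fires once the accumulated dwell reaches the threshold, and the same track is never flagged twice.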
## Prerequisites
- Python 3.11+
- Install Intel DLStreamer
Create and activate a Python virtual environment before running the scripts:

```shell
python3 -m venv .venv --system-site-packages
source .venv/bin/activate
```

Note: The `--system-site-packages` flag is required so the virtual environment can access the system-installed OpenVINO and DLStreamer Python packages.
## Getting Started
### Download and Quantize Model
Run the provided script to download, export to OpenVINO IR, and optionally quantize:

```shell
chmod +x export_and_quantize.sh
./export_and_quantize.sh
```
This exports the default yolo26n model in FP16 precision.
### Optional: Select a Different Variant or Precision
```shell
./export_and_quantize.sh yolo26n FP32   # full-precision
./export_and_quantize.sh yolo26n INT8   # quantized
./export_and_quantize.sh yolo26s        # larger variant, default FP16
```
Replace `yolo26n` with any variant (`yolo26s`, `yolo26m`, `yolo26l`, `yolo26x`).
The second argument selects the precision (`FP32`, `FP16`, `INT8`); the default is `FP16`.
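The argument handling described above can be mirrored in Python for validation. A sketch; the function name and error handling are illustrative, not the script's actual code:

```python
VARIANTS = {"yolo26n", "yolo26s", "yolo26m", "yolo26l", "yolo26x"}
PRECISIONS = {"FP32", "FP16", "INT8"}

def parse_args(argv: list[str]) -> tuple[str, str]:
    """Resolve (variant, precision) with the same defaults the script uses:
    yolo26n and FP16 when an argument is omitted."""
    variant = argv[0] if len(argv) >= 1 else "yolo26n"
    precision = argv[1].upper() if len(argv) >= 2 else "FP16"
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant}")
    if precision not in PRECISIONS:
        raise ValueError(f"unknown precision: {precision}")
    return variant, precision
```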
The script performs the following steps:

- Installs dependencies (`openvino`, `ultralytics`; adds `nncf` for INT8).
- Downloads the sample surveillance video (`VIRAT_S_000101.mp4`) from the Intel Metro AI Suite project into the current directory.
- Downloads the PyTorch weights and exports them to OpenVINO IR.
- (INT8 only) Quantizes the model using NNCF post-training quantization.
Output files:

- `yolo26n_openvino_model/` -- FP32 or FP16 OpenVINO IR model directory.
- `yolo26n_loitering_int8.xml` / `yolo26n_loitering_int8.bin` -- INT8 quantized model (only when `INT8` is selected).
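The artifact names above follow the variant name. A small helper illustrating the convention, assuming the other variants substitute into the same pattern shown for `yolo26n` (that substitution is an assumption here, not confirmed by the script):

```python
def output_paths(variant: str, precision: str) -> list[str]:
    """Map (variant, precision) to the artifacts the export script writes,
    assuming every variant follows the yolo26n naming shown above."""
    if precision in ("FP32", "FP16"):
        return [f"{variant}_openvino_model/"]
    if precision == "INT8":
        return [
            f"{variant}_loitering_int8.xml",
            f"{variant}_loitering_int8.bin",
        ]
    raise ValueError(f"unknown precision: {precision}")
```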
### Precision / Device Compatibility
| Precision | CPU | GPU | NPU |
|---|---|---|---|
| FP32 | Yes | Yes | No |
| FP16 | Yes | Yes | Yes |
| INT8 | Yes | Yes | Yes |
Note: The INT8 calibration uses frames from the bundled sample video. For production accuracy, replace it with a representative set of frames from the target deployment site.
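To build that representative calibration set from site footage, frames can be sampled evenly across the recording so the set covers varying lighting and scene content. A sketch; this helper is illustrative and not part of the export script:

```python
def calibration_indices(total_frames: int, n_samples: int) -> list[int]:
    """Return n_samples frame indices spread evenly across a video,
    clamped to valid range, for use as an INT8 calibration subset."""
    if n_samples <= 0 or total_frames <= 0:
        return []
    n = min(n_samples, total_frames)
    step = total_frames / n
    return [min(int(i * step), total_frames - 1) for i in range(n)]
```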
## Defining the Region of Interest
The zone is a rectangular ROI expressed as `x_min,y_min,x_max,y_max` in the original input frame coordinates (not the 640x640 model input). DLStreamer's `gvaattachroi` element attaches the ROI to every buffer, and `gvadetect inference-region=1` (roi-list) restricts inference to that ROI only -- no Python polygon math is required.
A typical surveillance-zone configuration on a 1280x720 source might be:

```
roi=400,200,1100,650      # ROI for gvaattachroi (x_min,y_min,x_max,y_max)
LOITERING_SECONDS = 5.0   # dwell threshold, in seconds (demo value)
```
Note: The sample uses a 5-second threshold so that loitering events are triggered quickly on the short demo video. For production deployments, increase this to 10--30 seconds depending on the site's operational requirements.
Per-person dwell time is measured at the bottom-center of the bounding box (the foot anchor), which most closely approximates the person's ground position.
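The foot-anchor and point-in-zone math can be written standalone for reference; in the DLStreamer sample below the zone test is instead delegated to `gvadetect inference-region=1`, so these helpers are a sketch, not part of the pipeline:

```python
def foot_anchor(x: int, y: int, w: int, h: int) -> tuple[int, int]:
    """Bottom-center of a bounding box: the point that most closely
    approximates the person's ground position."""
    return x + w // 2, y + h

def in_roi(point: tuple[int, int], roi: str) -> bool:
    """Test a point against an ROI given as 'x_min,y_min,x_max,y_max'."""
    x_min, y_min, x_max, y_max = map(int, roi.split(","))
    px, py = point
    return x_min <= px <= x_max and y_min <= py <= y_max
```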
## DLStreamer Sample
- The DLStreamer Python module is not on `sys.path` by default. Export `PYTHONPATH` before running:

```shell
source /opt/intel/openvino_2026/setupvars.sh
source /opt/intel/dlstreamer/scripts/setup_dls_env.sh
export PYTHONPATH=/opt/intel/dlstreamer/python:\
/opt/intel/dlstreamer/gstreamer/lib/python3/dist-packages:${PYTHONPATH:-}
```
Video-based loitering detection (requires video for dwell-time tracking):
```python
from collections import defaultdict

import gi

gi.require_version("Gst", "1.0")
gi.require_version("GstVideo", "1.0")
from gi.repository import Gst
from gstgva import VideoFrame

Gst.init(None)

MODEL_XML = "yolo26n_openvino_model/yolo26n.xml"
INPUT_VIDEO = "VIRAT_S_000101.mp4"
ROI = "0,200,300,400"  # x_min,y_min,x_max,y_max
LOITERING_SECONDS = 5.0

pipeline_str = (
    f"filesrc location={INPUT_VIDEO} ! decodebin3 ! "
    f"videoconvert ! "
    f"gvaattachroi roi={ROI} ! "
    f"gvadetect inference-region=1 model={MODEL_XML} device=GPU "
    f"threshold=0.5 ! queue ! "
    f"gvatrack tracking-type=short-term-imageless ! queue ! "
    f"gvametaconvert add-empty-results=true ! queue ! "
    f"gvafpscounter ! "
    f"gvawatermark ! videoconvert ! video/x-raw,format=I420 ! "
    f"openh264enc ! h264parse ! "
    f"mp4mux ! filesink name=sink location=output_dlstreamer.mp4"
)
pipeline = Gst.parse_launch(pipeline_str)

STALE_TIMEOUT = 2.0  # seconds of absence before clearing dwell state
dwell_state: dict[int, float] = defaultdict(float)
last_seen: dict[int, float] = {}
flagged: set[int] = set()


def on_buffer(pad, info):
    buf = info.get_buffer()
    caps = pad.get_current_caps()
    frame = VideoFrame(buf, caps=caps)
    now = buf.pts / Gst.SECOND if buf.pts != Gst.CLOCK_TIME_NONE else 0.0
    seen_ids: set[int] = set()
    for region in frame.regions():
        # gvaattachroi attaches a frame-level ROI region; skip it.
        if region.label() != "person":
            continue
        object_id = region.object_id()
        if object_id <= 0:
            continue
        rect = region.rect()
        foot_x = int(rect.x + rect.w / 2)
        foot_y = int(rect.y + rect.h)
        seen_ids.add(object_id)
        # gvadetect inference-region=1 already constrains detections to the
        # gvaattachroi zone, so every tracked person here is "in zone".
        prev = last_seen.get(object_id, now)
        dwell_state[object_id] += now - prev
        last_seen[object_id] = now
        if (
            dwell_state[object_id] >= LOITERING_SECONDS
            and object_id not in flagged
        ):
            flagged.add(object_id)
            print(
                f"LOITERING id={object_id} "
                f"dwell={dwell_state[object_id]:.1f}s "
                f"anchor=({foot_x},{foot_y})",
                flush=True,
            )
    # Clean up stale tracks after STALE_TIMEOUT seconds of absence.
    # Keep flagged entries to prevent duplicate alerts when a person
    # briefly disappears (occlusion / tracker jitter) and reappears.
    for stale in list(dwell_state):
        if stale not in seen_ids:
            elapsed_since = now - last_seen.get(stale, now)
            if elapsed_since > STALE_TIMEOUT:
                dwell_state.pop(stale, None)
                last_seen.pop(stale, None)
    return Gst.PadProbeReturn.OK


sink = pipeline.get_by_name("sink")
sink_pad = sink.get_static_pad("sink")
sink_pad.add_probe(Gst.PadProbeType.BUFFER, on_buffer)

pipeline.set_state(Gst.State.PLAYING)
bus = pipeline.get_bus()
bus.timed_pop_filtered(
    Gst.CLOCK_TIME_NONE,
    Gst.MessageType.EOS | Gst.MessageType.ERROR,
)
pipeline.set_state(Gst.State.NULL)
```
## Expected Output

Expected output with the sample video and the zone/threshold above (exact track IDs and anchor coordinates may vary between runs due to tracker non-determinism):

```
LOITERING id=26 dwell=5.0s anchor=(147,341)
LOITERING id=27 dwell=5.0s anchor=(122,337)
LOITERING id=29 dwell=5.0s anchor=(90,322)
...
```
Approximately 10–12 loitering events are expected over the full video.
The annotated video is saved to `output_dlstreamer.mp4` with green bounding boxes and track IDs drawn by `gvawatermark` around every detected person.
Known warning: The `openh264enc` element prints `[OpenH264] this = 0x..., Error:CWelsH264SVCEncoder::EncodeFrame(), cmInitParaError.` on the first frame. This is a benign initialization message; the output video is encoded correctly. The warning comes from the OpenH264 library's internal logging and does not indicate a real error.
## Device Targets

- `device=GPU` -- default in the sample code.
- `device=CPU` -- change `device=GPU` to `device=CPU`.
- `device=NPU` -- change `device=GPU` to `device=NPU`; use `batch-size=1` and `nireq=4` for best NPU utilization.
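Since the device is just a property inside the pipeline string, retargeting can also be done programmatically. A sketch; the helper name is invented here:

```python
import re

def set_device(pipeline_str: str, device: str) -> str:
    """Rewrite the first device= property in a pipeline string so the
    gvadetect element targets CPU, GPU, or NPU."""
    if device not in {"CPU", "GPU", "NPU"}:
        raise ValueError(f"unsupported device: {device}")
    return re.sub(r"device=\w+", f"device={device}", pipeline_str, count=1)
```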
## License
Copyright (C) Intel Corporation. All rights reserved. Licensed under the MIT License. See LICENSE for details.
