Loitering Detection

Property              Value
Category              Object Detection + Tracking + Zone Analytics
Source Framework      PyTorch (Ultralytics)
Supported Precisions  FP32, FP16, INT8 (mixed-precision)
Inference Engine      OpenVINO
Hardware              CPU, GPU, NPU
Detected Class        person (COCO class 0)

Overview

Loitering Detection is a Metro Analytics use case that flags people who remain inside a configurable region of interest for longer than a dwell-time threshold. It is built on YOLO26 for person detection, paired with a multi-object tracker that assigns persistent IDs across frames. A polygon zone defines the area to monitor; for each tracked person whose bounding-box anchor falls inside the zone, the application accumulates dwell time and raises a loitering event when the threshold is exceeded.
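Conceptually, the per-track dwell logic described above can be sketched as a small standalone helper. This is an illustrative sketch only (the names `point_in_polygon` and `DwellTracker` are not part of the application, and the shipped DLStreamer sample uses a rectangular ROI with gvadetect's inference-region rather than explicit polygon math):

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is point (x, y) inside the polygon (list of (x, y) vertices)?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Edge straddles the horizontal ray through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


class DwellTracker:
    """Accumulates in-zone dwell time per track ID and flags loitering once."""

    def __init__(self, zone, threshold_s):
        self.zone = zone                # polygon vertices in frame coordinates
        self.threshold_s = threshold_s  # dwell threshold in seconds
        self.dwell = {}                 # track_id -> accumulated seconds in zone
        self.last_seen = {}             # track_id -> timestamp of last update
        self.flagged = set()            # track IDs already alerted

    def update(self, track_id, anchor, now):
        """Returns True exactly once, when a track first exceeds the threshold."""
        if not point_in_polygon(anchor[0], anchor[1], self.zone):
            return False
        prev = self.last_seen.get(track_id, now)
        self.dwell[track_id] = self.dwell.get(track_id, 0.0) + (now - prev)
        self.last_seen[track_id] = now
        if self.dwell[track_id] >= self.threshold_s and track_id not in self.flagged:
            self.flagged.add(track_id)
            return True
        return False
```

The `flagged` set ensures each person raises at most one event, mirroring the de-duplication behavior of the full sample further below.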

Typical Metro deployments include:

  • Restricted-Area Monitoring -- raise alerts when a person lingers near tracks, equipment rooms, or after-hours zones.
  • Platform Edge Safety -- detect prolonged presence inside a yellow-line buffer.
  • ATM and Ticketing Security -- identify suspicious dwell at unattended kiosks.
  • Crowd-Free Zone Enforcement -- monitor emergency exits and corridors that must remain clear.

Available variants: yolo26n, yolo26s, yolo26m, yolo26l, yolo26x. Smaller variants (yolo26n, yolo26s) are recommended for high-FPS edge deployment.


Prerequisites

Create and activate a Python virtual environment before running the scripts:

python3 -m venv .venv --system-site-packages
source .venv/bin/activate

Note: The --system-site-packages flag is required so the virtual environment can access the system-installed OpenVINO and DLStreamer Python packages.


Getting Started

Download and Quantize Model

Run the provided script to download, export to OpenVINO IR, and optionally quantize:

chmod +x export_and_quantize.sh
./export_and_quantize.sh

This exports the default yolo26n model in FP16 precision.

Optional: Select a Different Variant or Precision

./export_and_quantize.sh yolo26n FP32   # full-precision
./export_and_quantize.sh yolo26n INT8   # quantized
./export_and_quantize.sh yolo26s        # larger variant, default FP16

Replace yolo26n with any variant (yolo26s, yolo26m, yolo26l, yolo26x). The second argument selects the precision (FP32, FP16, INT8); the default is FP16.

The script performs the following steps:

  1. Installs dependencies (openvino, ultralytics; adds nncf for INT8).
  2. Downloads the sample surveillance video (VIRAT_S_000101.mp4) from the Intel Metro AI Suite project into the current directory.
  3. Downloads the PyTorch weights and exports to OpenVINO IR.
  4. (INT8 only) Quantizes the model using NNCF post-training quantization.

Output files:

  • yolo26n_openvino_model/ -- FP32 or FP16 OpenVINO IR model directory.
  • yolo26n_loitering_int8.xml / yolo26n_loitering_int8.bin -- INT8 quantized model (only when INT8 is selected).

Precision / Device Compatibility

Precision   CPU   GPU   NPU
FP32        Yes   Yes   No
FP16        Yes   Yes   Yes
INT8        Yes   Yes   Yes

Note: The INT8 calibration uses frames from the bundled sample video. For production accuracy, replace it with a representative set of frames from the target deployment site.
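When supplying your own calibration frames, each frame must be preprocessed into the model's input layout before NNCF sees it. A minimal sketch, assuming a 640x640 model input and uint8 HWC frames (function names are illustrative, and nearest-neighbor resizing is used here only to keep the sketch dependency-free beyond numpy):

```python
import numpy as np


def letterbox(frame, size=640):
    """Resize preserving aspect ratio, then pad to size x size with gray (114)."""
    h, w = frame.shape[:2]
    scale = size / max(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbor index maps (avoids a cv2 dependency in this sketch).
    ys = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = frame[ys][:, xs]
    canvas = np.full((size, size, 3), 114, dtype=frame.dtype)
    top, left = (size - nh) // 2, (size - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    return canvas


def transform_fn(frame):
    """uint8 HWC frame -> float32 NCHW tensor in [0, 1], as YOLO IR exports expect."""
    img = letterbox(frame).astype(np.float32) / 255.0
    return img.transpose(2, 0, 1)[None]
```

A list of such frames can then be wrapped as `nncf.Dataset(frames, transform_fn)` and passed to `nncf.quantize`; the export script performs the equivalent steps internally using the sample video.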

Defining the Region of Interest

The zone is a rectangular ROI expressed as x_min,y_min,x_max,y_max in the original input frame coordinates (not the 640x640 model input). DLStreamer's gvaattachroi element attaches the ROI to every buffer, and gvadetect inference-region=1 (roi-list) restricts inference to that ROI only -- no Python polygon math required. A typical surveillance-zone configuration on a 1280x720 source might be:

roi=400,200,1100,650          # ROI for gvaattachroi (x_min,y_min,x_max,y_max)
LOITERING_SECONDS = 5.0       # dwell threshold, in seconds (demo value)

Note: The sample uses a 5-second threshold so that loitering events are triggered quickly on the short demo video. For production deployments, increase this to 10--30 seconds depending on the site's operational requirements.

Per-person dwell time is measured at the bottom-center of the bounding box (the foot anchor), which most closely approximates the person's ground position.
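In code, the foot anchor and an explicit in-zone test look like this (helper names are illustrative; in the DLStreamer sample below the zone test is unnecessary, since gvadetect already restricts inference to the attached ROI):

```python
def foot_anchor(x, y, w, h):
    """Bottom-center of the bounding box, approximating the person's ground position."""
    return x + w // 2, y + h


def in_roi(point, roi):
    """roi = (x_min, y_min, x_max, y_max) in original source-frame coordinates."""
    px, py = point
    x_min, y_min, x_max, y_max = roi
    return x_min <= px <= x_max and y_min <= py <= y_max
```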

DLStreamer Sample

The DLStreamer Python module is not on sys.path by default. Source the environment scripts and export PYTHONPATH before running:

source /opt/intel/openvino_2026/setupvars.sh
source /opt/intel/dlstreamer/scripts/setup_dls_env.sh
export PYTHONPATH=/opt/intel/dlstreamer/python:\
/opt/intel/dlstreamer/gstreamer/lib/python3/dist-packages:${PYTHONPATH:-}

Video-based loitering detection (requires video for dwell-time tracking):

from collections import defaultdict

import gi

gi.require_version("Gst", "1.0")
gi.require_version("GstVideo", "1.0")
from gi.repository import Gst
from gstgva import VideoFrame

Gst.init(None)

MODEL_XML = "yolo26n_openvino_model/yolo26n.xml"
INPUT_VIDEO = "VIRAT_S_000101.mp4"
ROI = "0,200,300,400"  # x_min,y_min,x_max,y_max
LOITERING_SECONDS = 5.0

pipeline_str = (
    f"filesrc location={INPUT_VIDEO} ! decodebin3 ! "
    f"videoconvert ! "
    f"gvaattachroi roi={ROI} ! "
    f"gvadetect inference-region=1 model={MODEL_XML} device=GPU "
    f"threshold=0.5 ! queue ! "
    f"gvatrack tracking-type=short-term-imageless ! queue ! "
    f"gvametaconvert add-empty-results=true ! queue ! "
    f"gvafpscounter ! "
    f"gvawatermark ! videoconvert ! video/x-raw,format=I420 ! "
    f"openh264enc ! h264parse ! "
    f"mp4mux ! filesink name=sink location=output_dlstreamer.mp4"
)
pipeline = Gst.parse_launch(pipeline_str)

STALE_TIMEOUT = 2.0  # seconds of absence before clearing dwell state
dwell_state: dict[int, float] = defaultdict(float)
last_seen: dict[int, float] = {}
flagged: set[int] = set()


def on_buffer(pad, info):
    buf = info.get_buffer()
    caps = pad.get_current_caps()
    frame = VideoFrame(buf, caps=caps)

    now = buf.pts / Gst.SECOND if buf.pts != Gst.CLOCK_TIME_NONE else 0.0
    seen_ids: set[int] = set()

    for region in frame.regions():
        # gvaattachroi attaches a frame-level ROI region; skip it.
        if region.label() != "person":
            continue
        object_id = region.object_id()
        if object_id <= 0:
            continue

        rect = region.rect()
        foot_x = int(rect.x + rect.w / 2)
        foot_y = int(rect.y + rect.h)
        seen_ids.add(object_id)

        # gvadetect inference-region=1 already constrains detections to the
        # gvaattachroi zone, so every tracked person here is "in zone".
        prev = last_seen.get(object_id, now)
        dwell_state[object_id] += now - prev
        last_seen[object_id] = now

        if (
            dwell_state[object_id] >= LOITERING_SECONDS
            and object_id not in flagged
        ):
            flagged.add(object_id)
            print(
                f"LOITERING id={object_id} "
                f"dwell={dwell_state[object_id]:.1f}s "
                f"anchor=({foot_x},{foot_y})",
                flush=True,
            )

    # Clean up stale tracks after STALE_TIMEOUT seconds of absence.
    # Keep flagged entries to prevent duplicate alerts when a person
    # briefly disappears (occlusion / tracker jitter) and reappears.
    for stale in list(dwell_state):
        if stale not in seen_ids:
            elapsed_since = now - last_seen.get(stale, now)
            if elapsed_since > STALE_TIMEOUT:
                dwell_state.pop(stale, None)
                last_seen.pop(stale, None)

    return Gst.PadProbeReturn.OK


sink = pipeline.get_by_name("sink")
sink_pad = sink.get_static_pad("sink")
sink_pad.add_probe(Gst.PadProbeType.BUFFER, on_buffer)

pipeline.set_state(Gst.State.PLAYING)
bus = pipeline.get_bus()
bus.timed_pop_filtered(
    Gst.CLOCK_TIME_NONE,
    Gst.MessageType.EOS | Gst.MessageType.ERROR,
)
pipeline.set_state(Gst.State.NULL)

Expected output with the sample video and the zone/threshold above (exact track IDs and anchor coordinates may vary between runs due to tracker non-determinism):

LOITERING id=26 dwell=5.0s anchor=(147,341)
LOITERING id=27 dwell=5.0s anchor=(122,337)
LOITERING id=29 dwell=5.0s anchor=(90,322)
...

Approximately 10--12 loitering events are expected over the full video.

The annotated video is saved to output_dlstreamer.mp4 with green bounding boxes and track IDs drawn by gvawatermark around every detected person.

Known warning: The openh264enc element prints [OpenH264] this = 0x..., Error:CWelsH264SVCEncoder::EncodeFrame(), cmInitParaError. on the first frame. This is a benign message from the OpenH264 library's internal logging; the output video is encoded correctly.

Expected Output

[Image: DLStreamer expected output]

Device targets:

  • device=GPU -- default in the sample code.
  • device=CPU -- change device=GPU to device=CPU.
  • device=NPU -- change device=GPU to device=NPU; use batch-size=1 and nireq=4 for best NPU utilization.
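Switching devices amounts to editing the gvadetect element string. A small hypothetical helper that also applies the NPU tuning above:

```python
def detect_element(model_xml, device="GPU", threshold=0.5):
    """Build the gvadetect element string for a target device.

    For NPU, explicit batch-size/nireq are added per the note above;
    this helper is illustrative and not part of the sample application.
    """
    props = f"inference-region=1 model={model_xml} device={device} threshold={threshold}"
    if device == "NPU":
        props += " batch-size=1 nireq=4"
    return f"gvadetect {props}"
```

The returned string drops into the pipeline in place of the hard-coded `gvadetect ... device=GPU ...` segment shown in the sample.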

License

Copyright (C) Intel Corporation. All rights reserved. Licensed under the MIT License. See LICENSE for details.
