---
license: other
license_name: intel-custom
license_link: LICENSE
library_name: openvino
pipeline_tag: object-detection
tags:
- openvino
- intel
- yolo
- yolo26
- loitering-detection
- zone-analytics
- tracking
- edge-ai
- metro
- dlstreamer
datasets:
- detection-datasets/coco
language:
- en
---

# Loitering Detection

| Property | Value |
|---|---|
| **Category** | Object Detection + Tracking + Zone Analytics |
| **Source Framework** | PyTorch (Ultralytics) |
| **Supported Precisions** | FP32, FP16, INT8 (mixed-precision) |
| **Inference Engine** | OpenVINO |
| **Hardware** | CPU, GPU, NPU |
| **Detected Class** | `person` (COCO class 0) |

---

## Overview

Loitering Detection is a Metro Analytics use case that flags people who remain inside a configurable region of interest for longer than a dwell-time threshold. It is built on [YOLO26](https://docs.ultralytics.com/models/yolo26/) for person detection, paired with a multi-object tracker that assigns persistent IDs across frames. A configurable zone (a rectangular ROI in this sample) defines the area to monitor; for each tracked person whose bounding-box anchor falls inside the zone, the application accumulates dwell time and raises a loitering event when the threshold is exceeded.

Typical Metro deployments include:

- **Restricted-Area Monitoring** -- raise alerts when a person lingers near tracks, equipment rooms, or after-hours zones.
- **Platform Edge Safety** -- detect prolonged presence inside a yellow-line buffer.
- **ATM and Ticketing Security** -- identify suspicious dwell at unattended kiosks.
- **Crowd-Free Zone Enforcement** -- monitor emergency exits and corridors that must remain clear.

Available variants: `yolo26n`, `yolo26s`, `yolo26m`, `yolo26l`, `yolo26x`. Smaller variants (`yolo26n`, `yolo26s`) are recommended for high-FPS edge deployment.
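The core dwell-time logic can be sketched independently of the video pipeline. The following is a minimal illustration of the accumulate-and-flag pattern described above; the `DwellTracker` class and its method names are illustrative, not part of the application:

```python
from dataclasses import dataclass, field


@dataclass
class DwellTracker:
    """Accumulate per-track dwell time inside a rectangular zone and flag
    each track at most once when it exceeds the loitering threshold.
    Illustrative sketch only -- the real sample keeps this state in a
    GStreamer pad probe."""

    zone: tuple[int, int, int, int]  # x_min, y_min, x_max, y_max
    threshold_s: float = 5.0
    dwell: dict[int, float] = field(default_factory=dict)
    last_seen: dict[int, float] = field(default_factory=dict)
    flagged: set[int] = field(default_factory=set)

    def in_zone(self, foot_x: float, foot_y: float) -> bool:
        x0, y0, x1, y1 = self.zone
        return x0 <= foot_x <= x1 and y0 <= foot_y <= y1

    def update(self, track_id: int, foot_x: float, foot_y: float,
               now: float) -> bool:
        """Return True exactly once: when a track first crosses the threshold."""
        if not self.in_zone(foot_x, foot_y):
            # Outside the zone: stop accumulating for this track.
            self.last_seen.pop(track_id, None)
            return False
        prev = self.last_seen.get(track_id, now)
        self.dwell[track_id] = self.dwell.get(track_id, 0.0) + (now - prev)
        self.last_seen[track_id] = now
        if self.dwell[track_id] >= self.threshold_s and track_id not in self.flagged:
            self.flagged.add(track_id)
            return True
        return False
```

A track that leaves the zone stops accumulating, and a flagged track never fires twice, which mirrors the single-alert behavior of the DLStreamer sample below.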
---

## Prerequisites

- Python 3.11+
- [Install Intel DLStreamer](https://docs.openedgeplatform.intel.com/2026.0/edge-ai-libraries/dlstreamer/get_started/install/install_guide_ubuntu.html)

Create and activate a Python virtual environment before running the scripts:

```bash
python3 -m venv .venv --system-site-packages
source .venv/bin/activate
```

> **Note:** The `--system-site-packages` flag is required so the virtual
> environment can access the system-installed OpenVINO and DLStreamer Python
> packages.

---

## Getting Started

### Download and Quantize Model

Run the provided script to download, export to OpenVINO IR, and optionally quantize:

```bash
chmod +x export_and_quantize.sh
./export_and_quantize.sh
```

This exports the default **yolo26n** model in **FP16** precision.

#### Optional: Select a Different Variant or Precision

```bash
./export_and_quantize.sh yolo26n FP32   # full-precision
./export_and_quantize.sh yolo26n INT8   # quantized
./export_and_quantize.sh yolo26s        # larger variant, default FP16
```

Replace `yolo26n` with any variant (`yolo26s`, `yolo26m`, `yolo26l`, `yolo26x`). The second argument selects the precision (`FP32`, `FP16`, `INT8`); the default is **FP16**.

The script performs the following steps:

1. Installs dependencies (`openvino`, `ultralytics`; adds `nncf` for INT8).
2. Downloads the sample surveillance video (`VIRAT_S_000101.mp4`) from the Intel Metro AI Suite project into the current directory.
3. Downloads the PyTorch weights and exports to OpenVINO IR.
4. *(INT8 only)* Quantizes the model using NNCF post-training quantization.

Output files:

- `yolo26n_openvino_model/` -- FP32 or FP16 OpenVINO IR model directory.
- `yolo26n_loitering_int8.xml` / `yolo26n_loitering_int8.bin` -- INT8 quantized model *(only when `INT8` is selected)*.
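For orientation, the precision argument roughly corresponds to Ultralytics export flags. This is a sketch under the assumption that the script drives `model.export(format="openvino", half=...)`; the `export_kwargs` helper is illustrative and not part of `export_and_quantize.sh`:

```python
def export_kwargs(precision: str) -> dict:
    """Map the script's precision argument to Ultralytics OpenVINO export
    keyword arguments (illustrative; the shell script may differ in detail)."""
    precision = precision.upper()
    if precision == "FP32":
        return {"format": "openvino", "half": False}
    if precision == "FP16":
        # half=True requests an FP16 IR from the Ultralytics exporter.
        return {"format": "openvino", "half": True}
    if precision == "INT8":
        # The script exports a full-precision IR first, then quantizes
        # it separately with NNCF post-training quantization.
        return {"format": "openvino", "half": False}
    raise ValueError(f"unsupported precision: {precision}")
```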
#### Precision / Device Compatibility

| Precision | CPU | GPU | NPU |
|---|---|---|---|
| FP32 | Yes | Yes | No |
| FP16 | Yes | Yes | Yes |
| INT8 | Yes | Yes | Yes |

> **Note:** The INT8 calibration uses frames from the bundled sample video.
> For production accuracy, replace it with a representative set of frames from
> the target deployment site.

### Defining the Region of Interest

The zone is a rectangular ROI expressed as `x_min,y_min,x_max,y_max` in the original input frame coordinates (not the 640x640 model input). DLStreamer's `gvaattachroi` element attaches the ROI to every buffer, and `gvadetect inference-region=1` (`roi-list`) restricts inference to that ROI only -- no Python polygon math required.

A typical surveillance-zone configuration on a 1280x720 source might be:

```text
roi=400,200,1100,650     # ROI for gvaattachroi (x_min,y_min,x_max,y_max)
LOITERING_SECONDS = 5.0  # dwell threshold, in seconds (demo value)
```

> **Note:** The sample uses a 5-second threshold so that loitering events are
> triggered quickly on the short demo video. For production deployments,
> increase this to 10--30 seconds depending on the site's operational
> requirements.

Per-person dwell time is measured at the bottom-center of the bounding box (the foot anchor), which most closely approximates the person's ground position.

### DLStreamer Sample

The DLStreamer Python module is not on `sys.path` by default.
Export `PYTHONPATH` before running:

```bash
source /opt/intel/openvino_2026/setupvars.sh
source /opt/intel/dlstreamer/scripts/setup_dls_env.sh
export PYTHONPATH=/opt/intel/dlstreamer/python:\
/opt/intel/dlstreamer/gstreamer/lib/python3/dist-packages:${PYTHONPATH:-}
```

**Video-based loitering detection** (requires video for dwell-time tracking):

```python
from collections import defaultdict

import gi

gi.require_version("Gst", "1.0")
gi.require_version("GstVideo", "1.0")
from gi.repository import Gst
from gstgva import VideoFrame

Gst.init(None)

MODEL_XML = "yolo26n_openvino_model/yolo26n.xml"
INPUT_VIDEO = "VIRAT_S_000101.mp4"
ROI = "0,200,300,400"  # x_min,y_min,x_max,y_max
LOITERING_SECONDS = 5.0

pipeline_str = (
    f"filesrc location={INPUT_VIDEO} ! decodebin3 ! "
    f"videoconvert ! "
    f"gvaattachroi roi={ROI} ! "
    f"gvadetect inference-region=1 model={MODEL_XML} device=GPU "
    f"threshold=0.5 ! queue ! "
    f"gvatrack tracking-type=short-term-imageless ! queue ! "
    f"gvametaconvert add-empty-results=true ! queue ! "
    f"gvafpscounter ! "
    f"gvawatermark ! videoconvert ! video/x-raw,format=I420 ! "
    f"openh264enc ! h264parse ! "
    f"mp4mux ! filesink name=sink location=output_dlstreamer.mp4"
)
pipeline = Gst.parse_launch(pipeline_str)

STALE_TIMEOUT = 2.0  # seconds of absence before clearing dwell state
dwell_state: dict[int, float] = defaultdict(float)
last_seen: dict[int, float] = {}
flagged: set[int] = set()


def on_buffer(pad, info):
    buf = info.get_buffer()
    caps = pad.get_current_caps()
    frame = VideoFrame(buf, caps=caps)
    now = buf.pts / Gst.SECOND if buf.pts != Gst.CLOCK_TIME_NONE else 0.0

    seen_ids: set[int] = set()
    for region in frame.regions():
        # gvaattachroi attaches a frame-level ROI region; skip it.
        if region.label() != "person":
            continue
        object_id = region.object_id()
        if object_id <= 0:
            continue
        rect = region.rect()
        foot_x = int(rect.x + rect.w / 2)
        foot_y = int(rect.y + rect.h)
        seen_ids.add(object_id)

        # gvadetect inference-region=1 already constrains detections to the
        # gvaattachroi zone, so every tracked person here is "in zone".
        prev = last_seen.get(object_id, now)
        dwell_state[object_id] += now - prev
        last_seen[object_id] = now

        if (
            dwell_state[object_id] >= LOITERING_SECONDS
            and object_id not in flagged
        ):
            flagged.add(object_id)
            print(
                f"LOITERING id={object_id} "
                f"dwell={dwell_state[object_id]:.1f}s "
                f"anchor=({foot_x},{foot_y})",
                flush=True,
            )

    # Clean up stale tracks after STALE_TIMEOUT seconds of absence.
    # Keep flagged entries to prevent duplicate alerts when a person
    # briefly disappears (occlusion / tracker jitter) and reappears.
    for stale in list(dwell_state):
        if stale not in seen_ids:
            elapsed_since = now - last_seen.get(stale, now)
            if elapsed_since > STALE_TIMEOUT:
                dwell_state.pop(stale, None)
                last_seen.pop(stale, None)

    return Gst.PadProbeReturn.OK


sink = pipeline.get_by_name("sink")
sink_pad = sink.get_static_pad("sink")
sink_pad.add_probe(Gst.PadProbeType.BUFFER, on_buffer)

pipeline.set_state(Gst.State.PLAYING)
bus = pipeline.get_bus()
bus.timed_pop_filtered(
    Gst.CLOCK_TIME_NONE,
    Gst.MessageType.EOS | Gst.MessageType.ERROR,
)
pipeline.set_state(Gst.State.NULL)
```

Expected output with the sample video and the zone/threshold above (exact track IDs and anchor coordinates may vary between runs due to tracker non-determinism):

```text
LOITERING id=26 dwell=5.0s anchor=(147,341)
LOITERING id=27 dwell=5.0s anchor=(122,337)
LOITERING id=29 dwell=5.0s anchor=(90,322)
...
```

Approximately 10–12 loitering events are expected over the full video. The annotated video is saved to `output_dlstreamer.mp4` with green bounding boxes and track IDs drawn by `gvawatermark` around every detected person.
> **Known warning:** The `openh264enc` element prints
> `[OpenH264] this = 0x..., Error:CWelsH264SVCEncoder::EncodeFrame(), cmInitParaError.`
> on the first frame. This is a benign initialization message -- the output
> video is encoded correctly. The warning comes from the OpenH264 library's
> internal logging and does not indicate a real error.

#### Expected Output

![DLStreamer expected output](expected_output_dlstreamer.gif)

**Device targets:**

- `device=GPU` -- default in the sample code.
- `device=CPU` -- change `device=GPU` to `device=CPU`.
- `device=NPU` -- change `device=GPU` to `device=NPU`; use `batch-size=1` and `nireq=4` for best NPU utilization.

---

## License

Copyright (C) Intel Corporation. All rights reserved. Licensed under the MIT License. See [LICENSE](LICENSE) for details.

## References

- [YOLO26 Documentation](https://docs.ultralytics.com/models/yolo26/)
- [OpenVINO YOLO26 Notebook](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/yolov26-optimization/yolov26-object-detection.ipynb)
- [Intel DLStreamer Object Tracking](https://docs.openedgeplatform.intel.com/2026.0/edge-ai-libraries/dlstreamer/elements/gvatrack.html)
- [OpenVINO Documentation](https://docs.openvino.ai/)
- [NNCF Post-Training Quantization](https://docs.openvino.ai/latest/nncf_ptq_introduction.html)
- [COCO Dataset](https://cocodataset.org/)