---
license: other
license_name: intel-custom
license_link: LICENSE
library_name: openvino
pipeline_tag: object-detection
tags:
- openvino
- intel
- yolo
- yolo26
- loitering-detection
- zone-analytics
- tracking
- edge-ai
- metro
- dlstreamer
datasets:
- detection-datasets/coco
language:
- en
---

# Loitering Detection

| Property | Value |
|---|---|
| **Category** | Object Detection + Tracking + Zone Analytics |
| **Source Framework** | PyTorch (Ultralytics) |
| **Supported Precisions** | FP32, FP16, INT8 (mixed-precision) |
| **Inference Engine** | OpenVINO |
| **Hardware** | CPU, GPU, NPU |
| **Detected Class** | `person` (COCO class 0) |

---

## Overview

Loitering Detection is a Metro Analytics use case that flags people who remain inside a configurable region of interest for longer than a dwell-time threshold.
It is built on [YOLO26](https://docs.ultralytics.com/models/yolo26/) for person detection, paired with a multi-object tracker that assigns persistent IDs across frames.
A configurable zone (a rectangular ROI in this sample) defines the area to monitor; for each tracked person whose bounding-box anchor falls inside the zone, the application accumulates dwell time and raises a loitering event when the threshold is exceeded.

Typical Metro deployments include:

- **Restricted-Area Monitoring** -- raise alerts when a person lingers near tracks, equipment rooms, or after-hours zones.
- **Platform Edge Safety** -- detect prolonged presence inside a yellow-line buffer.
- **ATM and Ticketing Security** -- identify suspicious dwell at unattended kiosks.
- **Crowd-Free Zone Enforcement** -- monitor emergency exits and corridors that must remain clear.

Available variants: `yolo26n`, `yolo26s`, `yolo26m`, `yolo26l`, `yolo26x`.
Smaller variants (`yolo26n`, `yolo26s`) are recommended for high-FPS edge deployment.
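
The dwell-time logic described above can be sketched independently of the pipeline. This is a minimal, framework-free illustration; the function name `update_dwell` and the per-frame `dt` argument are assumptions for this sketch, not part of the sample:

```python
LOITERING_SECONDS = 5.0  # demo threshold used throughout this sample


def update_dwell(dwell, flagged, track_id, dt, in_zone):
    """Accumulate dwell time for one tracked ID; return True on a new loitering event.

    dwell:   dict mapping track ID -> accumulated seconds inside the zone
    flagged: set of IDs that already raised an event (prevents duplicate alerts)
    dt:      seconds elapsed since the previous frame
    """
    if not in_zone:
        return False
    dwell[track_id] = dwell.get(track_id, 0.0) + dt
    if dwell[track_id] >= LOITERING_SECONDS and track_id not in flagged:
        flagged.add(track_id)
        return True
    return False
```

Feeding roughly five seconds of in-zone frames for one ID yields exactly one event; later frames for the same ID do not re-trigger it.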

---

## Prerequisites

- Python 3.11+
- [Install Intel DLStreamer](https://docs.openedgeplatform.intel.com/2026.0/edge-ai-libraries/dlstreamer/get_started/install/install_guide_ubuntu.html)

Create and activate a Python virtual environment before running the scripts:

```bash
python3 -m venv .venv --system-site-packages
source .venv/bin/activate
```

> **Note:** The `--system-site-packages` flag is required so the virtual
> environment can access the system-installed OpenVINO and DLStreamer Python
> packages.

---

## Getting Started

### Download and Quantize Model

Run the provided script to download the model, export it to OpenVINO IR, and optionally quantize it:

```bash
chmod +x export_and_quantize.sh
./export_and_quantize.sh
```

This exports the default **yolo26n** model in **FP16** precision.

#### Optional: Select a Different Variant or Precision

```bash
./export_and_quantize.sh yolo26n FP32   # full-precision
./export_and_quantize.sh yolo26n INT8   # quantized
./export_and_quantize.sh yolo26s        # larger variant, default FP16
```

Replace `yolo26n` with any variant (`yolo26s`, `yolo26m`, `yolo26l`, `yolo26x`).
The second argument selects the precision (`FP32`, `FP16`, `INT8`); the default is **FP16**.

The script performs the following steps:

1. Installs dependencies (`openvino`, `ultralytics`; adds `nncf` for INT8).
2. Downloads the sample surveillance video (`VIRAT_S_000101.mp4`) from the Intel Metro AI Suite project into the current directory.
3. Downloads the PyTorch weights and exports them to OpenVINO IR.
4. *(INT8 only)* Quantizes the model using NNCF post-training quantization.

Output files:

- `yolo26n_openvino_model/` -- FP32 or FP16 OpenVINO IR model directory.
- `yolo26n_loitering_int8.xml` / `yolo26n_loitering_int8.bin` -- INT8 quantized model *(only when `INT8` is selected)*.
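
The output naming follows a simple convention per variant and precision. A small helper makes it explicit; `expected_outputs` is an illustrative addition, not part of the export script:

```python
def expected_outputs(variant: str = "yolo26n", precision: str = "FP16") -> list[str]:
    """Return the files/directories the export script is documented to produce."""
    if precision == "INT8":
        # INT8 produces a standalone quantized IR pair next to the script.
        return [f"{variant}_loitering_int8.xml", f"{variant}_loitering_int8.bin"]
    # FP32 and FP16 both land in the exported OpenVINO IR model directory.
    return [f"{variant}_openvino_model/"]
```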

#### Precision / Device Compatibility

| Precision | CPU | GPU | NPU |
|---|---|---|---|
| FP32 | Yes | Yes | No |
| FP16 | Yes | Yes | Yes |
| INT8 | Yes | Yes | Yes |

> **Note:** The INT8 calibration uses frames from the bundled sample video.
> For production accuracy, replace it with a representative set of frames from
> the target deployment site.
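
When scripting deployments, the compatibility matrix above can be enforced before launching a pipeline. A minimal sketch; `check_device` and `SUPPORTED_DEVICES` are illustrative names, not part of the sample:

```python
# Precision -> devices, mirroring the compatibility table above.
SUPPORTED_DEVICES = {
    "FP32": {"CPU", "GPU"},          # FP32 is not supported on NPU
    "FP16": {"CPU", "GPU", "NPU"},
    "INT8": {"CPU", "GPU", "NPU"},
}


def check_device(precision: str, device: str) -> bool:
    """Return True if the precision/device pair is supported."""
    return device in SUPPORTED_DEVICES.get(precision, set())
```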

### Defining the Region of Interest

The zone is a rectangular ROI expressed as `x_min,y_min,x_max,y_max` in the
original input frame coordinates (not the 640x640 model input).
DLStreamer's `gvaattachroi` element attaches the ROI to every buffer, and
`gvadetect inference-region=1` (`roi-list`) restricts inference to that ROI
only -- no Python polygon math required.
A typical surveillance-zone configuration on a 1280x720 source might be:

```text
roi=400,200,1100,650     # ROI for gvaattachroi (x_min,y_min,x_max,y_max)
LOITERING_SECONDS = 5.0  # dwell threshold, in seconds (demo value)
```

> **Note:** The sample uses a 5-second threshold so that loitering events are
> triggered quickly on the short demo video. For production deployments,
> increase this to 10--30 seconds depending on the site's operational
> requirements.

Per-person dwell time is measured at the bottom-center of the bounding box
(the foot anchor), which most closely approximates the person's ground position.
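
The anchor and containment test are simple to state in code. A framework-free sketch (function names are illustrative; in the DLStreamer sample the equivalent filtering comes from `gvaattachroi` plus `inference-region=1`):

```python
def foot_anchor(x: float, y: float, w: float, h: float) -> tuple[float, float]:
    """Bottom-center of a bounding box -- approximates the person's ground position."""
    return (x + w / 2.0, y + h)


def in_roi(point: tuple[float, float], roi: tuple[float, float, float, float]) -> bool:
    """roi is (x_min, y_min, x_max, y_max) in source-frame coordinates."""
    px, py = point
    x_min, y_min, x_max, y_max = roi
    return x_min <= px <= x_max and y_min <= py <= y_max
```

With the 1280x720 example zone above, a box at `(600, 300, 80, 200)` has its foot anchor at `(640.0, 500.0)`, which lies inside `roi=(400, 200, 1100, 650)`.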

### DLStreamer Sample

The DLStreamer Python module is not on `sys.path` by default. Export `PYTHONPATH` before running:

```bash
source /opt/intel/openvino_2026/setupvars.sh
source /opt/intel/dlstreamer/scripts/setup_dls_env.sh
export PYTHONPATH=/opt/intel/dlstreamer/python:\
/opt/intel/dlstreamer/gstreamer/lib/python3/dist-packages:${PYTHONPATH:-}
```
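
Before launching the sample, it can help to confirm that the environment is wired up. This check script is an illustrative addition, not part of the sample:

```python
import importlib.util


def module_status(names=("gi", "gstgva")):
    """Map each required module name to whether Python can currently find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in module_status().items():
        print(f"{name}: {'OK' if found else 'MISSING -- check PYTHONPATH'}")
```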

**Video-based loitering detection** (requires video for dwell-time tracking):

```python
from collections import defaultdict

import gi

gi.require_version("Gst", "1.0")
gi.require_version("GstVideo", "1.0")
from gi.repository import Gst
from gstgva import VideoFrame

Gst.init(None)

MODEL_XML = "yolo26n_openvino_model/yolo26n.xml"
INPUT_VIDEO = "VIRAT_S_000101.mp4"
ROI = "0,200,300,400"  # x_min,y_min,x_max,y_max
LOITERING_SECONDS = 5.0

pipeline_str = (
    f"filesrc location={INPUT_VIDEO} ! decodebin3 ! "
    f"videoconvert ! "
    f"gvaattachroi roi={ROI} ! "
    f"gvadetect inference-region=1 model={MODEL_XML} device=GPU "
    f"threshold=0.5 ! queue ! "
    f"gvatrack tracking-type=short-term-imageless ! queue ! "
    f"gvametaconvert add-empty-results=true ! queue ! "
    f"gvafpscounter ! "
    f"gvawatermark name=watermark ! videoconvert ! video/x-raw,format=I420 ! "
    f"openh264enc ! h264parse ! "
    f"mp4mux ! filesink location=output_dlstreamer.mp4"
)
pipeline = Gst.parse_launch(pipeline_str)

STALE_TIMEOUT = 2.0  # seconds of absence before clearing dwell state
dwell_state: dict[int, float] = defaultdict(float)
last_seen: dict[int, float] = {}
flagged: set[int] = set()


def on_buffer(pad, info):
    buf = info.get_buffer()
    caps = pad.get_current_caps()
    frame = VideoFrame(buf, caps=caps)

    now = buf.pts / Gst.SECOND if buf.pts != Gst.CLOCK_TIME_NONE else 0.0
    seen_ids: set[int] = set()

    for region in frame.regions():
        # gvaattachroi attaches a frame-level ROI region; skip it.
        if region.label() != "person":
            continue
        object_id = region.object_id()
        if object_id <= 0:
            continue

        rect = region.rect()
        foot_x = int(rect.x + rect.w / 2)
        foot_y = int(rect.y + rect.h)
        seen_ids.add(object_id)

        # gvadetect inference-region=1 already constrains detections to the
        # gvaattachroi zone, so every tracked person here is "in zone".
        prev = last_seen.get(object_id, now)
        dwell_state[object_id] += now - prev
        last_seen[object_id] = now

        if (
            dwell_state[object_id] >= LOITERING_SECONDS
            and object_id not in flagged
        ):
            flagged.add(object_id)
            print(
                f"LOITERING id={object_id} "
                f"dwell={dwell_state[object_id]:.1f}s "
                f"anchor=({foot_x},{foot_y})",
                flush=True,
            )

    # Clean up stale tracks after STALE_TIMEOUT seconds of absence.
    # Keep flagged entries to prevent duplicate alerts when a person
    # briefly disappears (occlusion / tracker jitter) and reappears.
    for stale in list(dwell_state):
        if stale not in seen_ids:
            elapsed_since = now - last_seen.get(stale, now)
            if elapsed_since > STALE_TIMEOUT:
                dwell_state.pop(stale, None)
                last_seen.pop(stale, None)

    return Gst.PadProbeReturn.OK


# Probe the gvawatermark sink pad: buffers there are still raw video and carry
# the detection/tracking metadata, which is lost after H.264 encoding, so the
# probe must sit upstream of openh264enc rather than on the filesink.
watermark = pipeline.get_by_name("watermark")
watermark.get_static_pad("sink").add_probe(Gst.PadProbeType.BUFFER, on_buffer)

pipeline.set_state(Gst.State.PLAYING)
bus = pipeline.get_bus()
bus.timed_pop_filtered(
    Gst.CLOCK_TIME_NONE,
    Gst.MessageType.EOS | Gst.MessageType.ERROR,
)
pipeline.set_state(Gst.State.NULL)
```

Expected output with the sample video and the zone/threshold above
(exact track IDs and anchor coordinates may vary between runs due to
tracker non-determinism):

```text
LOITERING id=26 dwell=5.0s anchor=(147,341)
LOITERING id=27 dwell=5.0s anchor=(122,337)
LOITERING id=29 dwell=5.0s anchor=(90,322)
...
```

Approximately 10--12 loitering events are expected over the full video.

The annotated video is saved to `output_dlstreamer.mp4` with green bounding boxes and
track IDs drawn by `gvawatermark` around every detected person.

> **Known warning:** The `openh264enc` element prints
> `[OpenH264] this = 0x..., Error:CWelsH264SVCEncoder::EncodeFrame(), cmInitParaError.`
> on the first frame. This is a benign initialization message from the OpenH264
> library's internal logging; the output video is encoded correctly, and the
> message does not indicate a real error.

#### Expected Output

![Loitering detection expected output](images/expected_output.jpg)

**Device targets:**

- `device=GPU` -- default in the sample code.
- `device=CPU` -- change `device=GPU` to `device=CPU`.
- `device=NPU` -- change `device=GPU` to `device=NPU`; use `batch-size=1` and `nireq=4` for best NPU utilization.
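
Since the device is embedded in the pipeline string, switching targets reduces to a substitution. An illustrative helper (`set_device` is an assumption of this sketch, not part of the sample) that also applies the NPU tuning parameters mentioned above:

```python
def set_device(pipeline_str: str, device: str) -> str:
    """Swap the gvadetect device target; append the suggested NPU tuning options."""
    out = pipeline_str.replace("device=GPU", f"device={device}")
    if device == "NPU":
        # batch-size=1 and nireq=4 give the best NPU utilization per the notes above.
        out = out.replace("device=NPU", "device=NPU batch-size=1 nireq=4")
    return out
```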

---

## License

Copyright (C) Intel Corporation. All rights reserved.
See [LICENSE](LICENSE) for the full license terms.


## References

- [YOLO26 Documentation](https://docs.ultralytics.com/models/yolo26/)
- [OpenVINO YOLO26 Notebook](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/yolov26-optimization/yolov26-object-detection.ipynb)
- [Intel DLStreamer Object Tracking](https://docs.openedgeplatform.intel.com/2026.0/edge-ai-libraries/dlstreamer/elements/gvatrack.html)
- [OpenVINO Documentation](https://docs.openvino.ai/)
- [NNCF Post-Training Quantization](https://docs.openvino.ai/latest/nncf_ptq_introduction.html)
- [COCO Dataset](https://cocodataset.org/)