Intel/loitering-detection

+This directory contains two categories of content under different licenses.
+Scripts and Documentation
+-------------------------
+The scripts (export_and_quantize.sh) and documentation (README.md) in this
+directory are original works by Intel Corporation, licensed under the
+MIT License.
+    Copyright (C) Intel Corporation
+    Permission is hereby granted, free of charge, to any person obtaining a copy
+    of this software and associated documentation files (the "Software"), to deal
+    in the Software without restriction, including without limitation the rights
+    to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+    copies of the Software, and to permit persons to whom the Software is
+    furnished to do so, subject to the following conditions:
+    The above copyright notice and this permission notice shall be included in
+    all copies or substantial portions of the Software.
+    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+    AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+    OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+    THE SOFTWARE.
+YOLO11 Model
+------------
+The YOLO11 model weights and the Ultralytics framework are developed by
+Ultralytics and licensed under the GNU Affero General Public License v3.0
+(AGPL-3.0).
+    Source:  https://github.com/ultralytics/ultralytics
+    License: https://github.com/ultralytics/ultralytics/blob/main/LICENSE
+    Docs:    https://docs.ultralytics.com/models/yolo11/
+Users must comply with the AGPL-3.0 license terms when using, modifying,
+or distributing the YOLO11 model weights or Ultralytics software.
+For commercial licensing options, see https://www.ultralytics.com/license.

README.md CHANGED

@@ -1,5 +1,294 @@
----
-license: other
-license_name: other
-license_link: LICENSE
----

+# Loitering Detection -- Zone-Based Dwell Time on Intel Hardware
+> **Reference notebook:** [yolov11-object-detection.ipynb](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/yolov11-optimization/yolov11-object-detection.ipynb)
+>
+> **Validated with:** OpenVINO 2026.0.0, NNCF 3.0.0, Ultralytics 8.3.0, Python 3.11+
+| Property | Value |
+|---|---|
+| **Category** | Object Detection + Tracking + Zone Analytics |
+| **Source Framework** | PyTorch (Ultralytics) |
+| **Supported Precisions** | FP16, FP16-INT8 |
+| **Inference Engine** | OpenVINO |
+| **Hardware** | CPU, GPU, NPU |
+| **Detected Class** | `person` (COCO class 0) |
+---
+## Overview
+Loitering Detection is a Metro Analytics use case that flags people who remain inside a configurable region of interest for longer than a dwell-time threshold.
+It is built on [YOLO11](https://docs.ultralytics.com/models/yolo11/) for person detection, paired with a multi-object tracker that assigns persistent IDs across frames.
+A polygon zone defines the area to monitor; for each tracked person whose bounding-box anchor falls inside the zone, the application accumulates dwell time and raises a loitering event when the threshold is exceeded.
+Typical Metro deployments include:
+- **Restricted-Area Monitoring** -- raise alerts when a person lingers near tracks, equipment rooms, or after-hours zones.
+- **Platform Edge Safety** -- detect prolonged presence inside a yellow-line buffer.
+- **ATM and Ticketing Security** -- identify suspicious dwell at unattended kiosks.
+- **Crowd-Free Zone Enforcement** -- monitor emergency exits and corridors that must remain clear.
+Available variants: `yolo11n`, `yolo11s`, `yolo11m`, `yolo11l`, `yolo11x`.
+Smaller variants (`yolo11n`, `yolo11s`) are recommended for high-FPS edge deployment.
+---
+## Prerequisites
+- [Install OpenVINO 2026.0.0](https://docs.openvino.ai/2026/get-started/install-openvino.html)
+- [Install Intel DLStreamer](https://dlstreamer.github.io/get_started/install/install-guide-ubuntu.html)
+---
+## Getting Started
+### Download and Quantize Model
+Run the provided script to download the model, export it to OpenVINO IR (FP16), and quantize it to INT8:
+```bash
+chmod +x export_and_quantize.sh
+./export_and_quantize.sh yolo11n
+```
+Replace `yolo11n` with any variant (`yolo11s`, `yolo11m`, `yolo11l`, `yolo11x`).
+The script performs the following steps:
+1. Installs dependencies (`openvino`, `nncf`, `ultralytics`).
+2. Downloads the PyTorch weights and exports to OpenVINO IR with `half=True`.
+3. Quantizes the model to INT8 using NNCF post-training quantization.
+4. Runs `benchmark_app` to validate throughput.
+Output files:
+- `yolo11n_openvino_model/` -- FP16 OpenVINO IR model directory.
+- `yolo11n_loitering_int8.xml` / `yolo11n_loitering_int8.bin` -- INT8 quantized model.
+> **Note:** For production accuracy, replace the random calibration tensors in
+> `export_and_quantize.sh` with a representative sample of frames from the
+> target deployment site.
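As a rough illustration of the note above, the random-tensor `transform_fn` could be swapped for one that preprocesses real frames. This is a minimal sketch, not the script's actual code: the names `letterbox_to_tensor`, `MODEL_SIZE`, and `PAD_VALUE` are hypothetical, and it assumes the exported IR expects letterboxed RGB input scaled to `[0, 1]` in NCHW layout (the usual Ultralytics convention, worth verifying against your export).

```python
import numpy as np

MODEL_SIZE = 640   # matches imgsz=640 used at export time
PAD_VALUE = 114    # grey padding; assumed to match Ultralytics letterboxing


def letterbox_to_tensor(frame_bgr: np.ndarray) -> np.ndarray:
    """Turn an HxWx3 uint8 BGR frame into a 1x3x640x640 float32 tensor."""
    h, w = frame_bgr.shape[:2]
    scale = min(MODEL_SIZE / h, MODEL_SIZE / w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))

    # Nearest-neighbour resize with plain NumPy indexing (keeps the sketch
    # dependency-free; cv2.resize would be the usual choice).
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = frame_bgr[rows[:, None], cols[None, :]]

    # Centre the resized image on a square grey canvas.
    canvas = np.full((MODEL_SIZE, MODEL_SIZE, 3), PAD_VALUE, dtype=np.uint8)
    top, left = (MODEL_SIZE - new_h) // 2, (MODEL_SIZE - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized

    rgb = canvas[:, :, ::-1]                                  # BGR -> RGB
    chw = rgb.transpose(2, 0, 1).astype(np.float32) / 255.0   # HWC -> CHW, 0..1
    return np.ascontiguousarray(chw[None])                    # add batch dim


def transform_fn(frame_bgr: np.ndarray) -> np.ndarray:
    # Same role as transform_fn in export_and_quantize.sh, but fed real frames.
    return letterbox_to_tensor(frame_bgr)
```

With a list of decoded frames from the deployment site, the dataset would then be built the same way the script does it, for example `nncf.Dataset(frames, transform_fn)`.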
+### Defining the Region of Interest
+The zone is a list of pixel-space `(x, y)` polygon vertices in clockwise order,
+expressed in the original input frame coordinates (not the 640x640 model input).
+A typical platform-edge zone might be:
+```python
+ZONE_POLYGON = [(420, 380), (1500, 380), (1500, 540), (420, 540)]
+LOITERING_SECONDS = 10.0
+```
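A zone definition can be sanity-checked before it is used. The sketch below is an illustration only: `signed_area` and `validate_zone` are hypothetical helpers, and the 1920x1080 frame size is an assumption, since the README does not state the resolution this example zone was drawn for. It uses the shoelace formula, which in image coordinates (y grows downward) is positive when the vertices run clockwise on screen.

```python
def signed_area(polygon):
    """Shoelace formula. In image coordinates (y grows downward) a positive
    result means the vertices run clockwise as seen on screen."""
    total = 0
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2


def validate_zone(polygon, frame_width, frame_height):
    """Return a list of problems; an empty list means the zone looks usable."""
    problems = []
    if len(polygon) < 3:
        problems.append("a polygon needs at least 3 vertices")
        return problems
    for x, y in polygon:
        if not (0 <= x < frame_width and 0 <= y < frame_height):
            problems.append(f"vertex ({x}, {y}) is outside the frame")
    area = signed_area(polygon)
    if area == 0:
        problems.append("vertices are collinear (zero area)")
    elif area < 0:
        problems.append("vertices are counter-clockwise")
    return problems


ZONE_POLYGON = [(420, 380), (1500, 380), (1500, 540), (420, 540)]
print(validate_zone(ZONE_POLYGON, 1920, 1080))  # frame size is an assumption
```

Running the same check against a smaller frame size, such as a 768x432 clip, reports the vertices that fall outside the image, which is a quick way to catch a zone drawn for the wrong resolution.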
+Zone membership is tested at the bottom-center of each bounding box
+(the foot anchor), which most closely approximates the person's ground position.
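The foot-anchor test can be sketched without any dependencies. The helper names below are hypothetical, and the even-odd ray-casting routine is a stand-in for `cv2.pointPolygonTest`, which the DLStreamer sample uses; points lying exactly on the zone boundary may be classified differently by the two approaches.

```python
def foot_anchor(x, y, w, h):
    """Bottom-centre of a bounding box given its top-left corner and size."""
    return (x + w / 2, y + h)


def point_in_polygon(point, polygon):
    """Even-odd ray casting: count polygon edges crossed by a ray going right."""
    px, py = point
    inside = False
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        # Only edges that straddle the horizontal line y = py can be crossed.
        if (y1 > py) != (y2 > py):
            # x coordinate where this edge crosses the line y = py
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


zone = [(420, 380), (1500, 380), (1500, 540), (420, 540)]

# Person standing inside the zone: feet at (940, 500).
standing_in = foot_anchor(900, 300, 80, 200)
# Person whose upper body overlaps the zone but whose feet are below it.
feet_below = foot_anchor(900, 400, 80, 200)

print(point_in_polygon(standing_in, zone))
print(point_in_polygon(feet_below, zone))
```

The second case shows why the anchor matters: a box-overlap test would count that person as inside the zone, while the foot anchor places them outside it.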
+### DLStreamer Sample
+The sample below runs the YOLO11 detector via `gvadetect`, attaches persistent
+track IDs with `gvatrack`, and uses the DLStreamer Python bindings
+(`gstgva.VideoFrame`) to filter `person` regions, test whether each tracked
+person's foot anchor lies inside the zone polygon, accumulate dwell time per
+`object_id`, and print a loitering event when the threshold is exceeded.
+> **Notes on running this sample:**
+>
+> - Use the FP16 IR (`yolo11n_openvino_model/yolo11n.xml`).
+>   On DLStreamer 2026.0.0, `gvadetect` cannot auto-derive a YOLO post-processor
+>   from the INT8 model produced by the bundled script (the quantize/dequantize
+>   layers shift the output node names away from the names the auto-postproc
+>   expects).
+>   To use the INT8 model, supply a matching `model-proc` JSON.
+> - `gvadetect` requires `labels-file=` to map class indices to names. Create
+>   `coco.txt` next to the script (see the snippet below) before running the sample.
+> - Filtering with `object-class=person` directly on `gvadetect` is rejected
+>   when `inference-region` is `full-frame` (the default), so the sample
+>   filters by `region.label()` in the buffer probe instead.
+> - The DLStreamer Python module is not on `sys.path` by default. Export
+>   `PYTHONPATH` before running:
+>
+>   ```bash
+>   source /opt/intel/openvino_2026/setupvars.sh
+>   source /opt/intel/dlstreamer/scripts/setup_dls_env.sh
+>   export PYTHONPATH=/opt/intel/dlstreamer/python:\
+>   /opt/intel/dlstreamer/gstreamer/lib/python3/dist-packages:${PYTHONPATH:-}
+>   ```
+Create the COCO labels file once (one class per line, in COCO order):
+```bash
+python3 - <<'PY'
+names = [
+    "person","bicycle","car","motorcycle","airplane","bus","train","truck",
+    "boat","traffic light","fire hydrant","stop sign","parking meter","bench",
+    "bird","cat","dog","horse","sheep","cow","elephant","bear","zebra",
+    "giraffe","backpack","umbrella","handbag","tie","suitcase","frisbee",
+    "skis","snowboard","sports ball","kite","baseball bat","baseball glove",
+    "skateboard","surfboard","tennis racket","bottle","wine glass","cup",
+    "fork","knife","spoon","bowl","banana","apple","sandwich","orange",
+    "broccoli","carrot","hot dog","pizza","donut","cake","chair","couch",
+    "potted plant","bed","dining table","toilet","tv","laptop","mouse",
+    "remote","keyboard","cell phone","microwave","oven","toaster","sink",
+    "refrigerator","book","clock","vase","scissors","teddy bear","hair drier",
+    "toothbrush",
+]
+with open("coco.txt", "w", encoding="utf-8") as f:
+    f.write("\n".join(names) + "\n")
+PY
+```
+```python
+from collections import defaultdict
+import cv2
+import gi
+import numpy as np
+gi.require_version("Gst", "1.0")
+gi.require_version("GstVideo", "1.0")
+from gi.repository import Gst
+from gstgva import VideoFrame
+Gst.init(None)
+MODEL_XML = "yolo11n_openvino_model/yolo11n.xml"
+LABELS_FILE = "coco.txt"
+INPUT_VIDEO = "test_video.mp4"
+ZONE_POLYGON = np.array(
+    [(420, 380), (1500, 380), (1500, 540), (420, 540)], dtype=np.int32,
+)
+LOITERING_SECONDS = 10.0
+pipeline_str = (
+    f"filesrc location={INPUT_VIDEO} ! decodebin ! videoconvert ! "
+    f"video/x-raw,format=BGR ! "
+    f"gvadetect model={MODEL_XML} labels-file={LABELS_FILE} device=CPU "
+    f"threshold=0.4 ! queue ! "
+    f"gvatrack tracking-type=short-term-imageless ! queue ! "
+    f"gvawatermark ! videoconvert ! autovideosink name=sink sync=false"
+)
+pipeline = Gst.parse_launch(pipeline_str)
+dwell_state: dict[int, float] = defaultdict(float)
+last_seen: dict[int, float] = {}
+flagged: set[int] = set()
+def point_in_zone(x: int, y: int) -> bool:
+    return cv2.pointPolygonTest(ZONE_POLYGON, (float(x), float(y)), False) >= 0
+def on_buffer(pad, info):
+    buf = info.get_buffer()
+    caps = pad.get_current_caps()
+    frame = VideoFrame(buf, caps=caps)
+    # Use the buffer's presentation timestamp so dwell time tracks the source
+    # video clock and is independent of the sink's `sync` setting.
+    if buf.pts == Gst.CLOCK_TIME_NONE:
+        # No timestamp: skip rather than corrupt the accumulated dwell time.
+        return Gst.PadProbeReturn.OK
+    now = buf.pts / Gst.SECOND
+    seen_ids: set[int] = set()
+    for region in frame.regions():
+        if region.label() != "person":
+            continue
+        object_id = region.object_id()
+        if object_id <= 0:
+            continue
+        rect = region.rect()
+        foot_x = int(rect.x + rect.w / 2)
+        foot_y = int(rect.y + rect.h)
+        seen_ids.add(object_id)
+        if not point_in_zone(foot_x, foot_y):
+            dwell_state.pop(object_id, None)
+            last_seen.pop(object_id, None)
+            flagged.discard(object_id)
+            continue
+        prev = last_seen.get(object_id, now)
+        dwell_state[object_id] += now - prev
+        last_seen[object_id] = now
+        if (
+            dwell_state[object_id] >= LOITERING_SECONDS
+            and object_id not in flagged
+        ):
+            flagged.add(object_id)
+            print(
+                f"LOITERING id={object_id} "
+                f"dwell={dwell_state[object_id]:.1f}s "
+                f"anchor=({foot_x},{foot_y})",
+                flush=True,
+            )
+    for stale in list(dwell_state):
+        if stale not in seen_ids:
+            dwell_state.pop(stale, None)
+            last_seen.pop(stale, None)
+            flagged.discard(stale)
+    return Gst.PadProbeReturn.OK
+sink = pipeline.get_by_name("sink")
+sink_pad = sink.get_static_pad("sink")
+sink_pad.add_probe(Gst.PadProbeType.BUFFER, on_buffer)
+pipeline.set_state(Gst.State.PLAYING)
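+bus = pipeline.get_bus()
+msg = bus.timed_pop_filtered(
+    Gst.CLOCK_TIME_NONE,
+    Gst.MessageType.EOS | Gst.MessageType.ERROR,
+)
+# Surface pipeline failures instead of exiting silently.
+if msg is not None and msg.type == Gst.MessageType.ERROR:
+    err, debug = msg.parse_error()
+    print(f"Pipeline error: {err.message} ({debug})", flush=True)
+pipeline.set_state(Gst.State.NULL)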
+```
+To run on integrated GPU, change `device=CPU` to `device=GPU` and use
+`vapostproc` after `decodebin` for zero-copy color conversion.
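The sample's buffer probe drops an ID as soon as it is missing from a single frame, so one missed detection restarts that person's dwell timer. One possible variation, sketched here under assumptions, isolates the bookkeeping in a small class and tolerates short gaps. `DwellTracker` and its `grace_s` parameter are hypothetical and not part of DLStreamer or the sample; note that with this design the time spent in a tolerated gap counts toward dwell.

```python
from dataclasses import dataclass, field


@dataclass
class DwellTracker:
    threshold_s: float = 10.0   # mirrors LOITERING_SECONDS
    grace_s: float = 0.5        # hypothetical: tolerated gap before a reset
    _dwell: dict = field(default_factory=dict)
    _last_in_zone: dict = field(default_factory=dict)
    _flagged: set = field(default_factory=set)

    def update(self, object_id, in_zone, now):
        """Record one observation; return True when the ID first crosses
        the dwell threshold."""
        if not in_zone:
            self._forget(object_id)
            return False
        prev = self._last_in_zone.get(object_id, now)
        self._dwell[object_id] = self._dwell.get(object_id, 0.0) + (now - prev)
        self._last_in_zone[object_id] = now
        if (
            self._dwell[object_id] >= self.threshold_s
            and object_id not in self._flagged
        ):
            self._flagged.add(object_id)
            return True
        return False

    def expire(self, now):
        """Drop IDs not seen inside the zone for longer than grace_s."""
        for object_id, seen in list(self._last_in_zone.items()):
            if now - seen > self.grace_s:
                self._forget(object_id)

    def _forget(self, object_id):
        self._dwell.pop(object_id, None)
        self._last_in_zone.pop(object_id, None)
        self._flagged.discard(object_id)
```

In a probe like the one above, `update()` would be called once per tracked person with the result of the zone test and the buffer timestamp, and `expire()` once per buffer in place of the stale-ID loop.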
+### Try It on a Sample Video
+Download a publicly hosted Intel sample clip that shows people walking through a scene:
+```bash
+wget -O test_video.mp4 \
+  https://github.com/intel-iot-devkit/sample-videos/raw/master/people-detection.mp4
+```
+The clip is 768x432 at 12 fps and shows people walking briskly through the field of view rather than truly loitering, so use a small zone in the busy part of the frame and a short dwell threshold for a meaningful demo:
+```python
+ZONE_POLYGON = np.array(
+    [(220, 180), (560, 180), (560, 360), (220, 360)], dtype=np.int32,
+)
+LOITERING_SECONDS = 1.5
+```
+Run the DLStreamer sample above.
+A window opened by `autovideosink` shows each frame with `gvawatermark` bounding boxes and persistent track IDs assigned by `gvatrack`.
+With the threshold above, the buffer probe prints two events on this clip, for example:
+```text
+LOITERING id=2 dwell=1.6s anchor=(529,258)
+LOITERING id=9 dwell=1.6s anchor=(527,250)
+```
+Raising `LOITERING_SECONDS` back to its operational default (around 10 s) suppresses the events on this short walking clip. To reproduce a real loitering scenario, use your own footage with a stationary subject.
+To capture the annotated output instead of viewing it live, replace `autovideosink name=sink sync=false` with an encoder branch such as `x264enc name=sink ! mp4mux ! filesink location=loitering_output.mp4`. Keep `name=sink` on the first element of the branch, because the sample attaches its buffer probe to the sink pad of the element with that name.
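The event lines printed by the probe have a fixed shape, so downstream tooling could turn them into structured records. The following is a small sketch with hypothetical names (`LoiteringEvent`, `parse_event`); it assumes the exact `print` format used in the sample above.

```python
import re
from typing import NamedTuple, Optional, Tuple


class LoiteringEvent(NamedTuple):
    object_id: int
    dwell_seconds: float
    anchor: Tuple[int, int]


# Matches lines such as: LOITERING id=2 dwell=1.6s anchor=(529,258)
_EVENT_RE = re.compile(
    r"LOITERING id=(\d+) dwell=(\d+(?:\.\d+)?)s anchor=\((-?\d+),(-?\d+)\)"
)


def parse_event(line: str) -> Optional[LoiteringEvent]:
    """Return a LoiteringEvent for a matching log line, otherwise None."""
    match = _EVENT_RE.search(line)
    if match is None:
        return None
    object_id, dwell, x, y = match.groups()
    return LoiteringEvent(int(object_id), float(dwell), (int(x), int(y)))


print(parse_event("LOITERING id=2 dwell=1.6s anchor=(529,258)"))
```

Lines that do not match, such as GStreamer log output interleaved on stdout, yield `None` and can simply be skipped.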
+---
+## License
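+Copyright (C) Intel Corporation. All rights reserved.
+The scripts and documentation in this directory are licensed under the MIT License. The YOLO11 model weights and the Ultralytics framework are licensed under AGPL-3.0. See [LICENSE](LICENSE) for details.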
+## References
+- [YOLO11 Documentation](https://docs.ultralytics.com/models/yolo11/)
+- [OpenVINO YOLO11 Notebook](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/yolov11-optimization/yolov11-object-detection.ipynb)
+- [Intel DLStreamer Object Tracking](https://dlstreamer.github.io/elements/gvatrack.html)
+- [OpenVINO Documentation](https://docs.openvino.ai/)
+- [NNCF Post-Training Quantization](https://docs.openvino.ai/latest/nncf_ptq_introduction.html)
+- [COCO Dataset](https://cocodataset.org/)

export_and_quantize.sh ADDED

@@ -0,0 +1,53 @@

+#!/usr/bin/env bash
+# SPDX-License-Identifier: MIT
+# Copyright (C) Intel Corporation
+#
+# Export a YOLO11 person detector for loitering detection and quantize to INT8.
+# Usage: ./export_and_quantize.sh [MODEL_VARIANT]
+# Example: ./export_and_quantize.sh yolo11n
+set -euo pipefail
+MODEL_NAME="${1:-yolo11n}"
+echo "--- Installing dependencies ---"
+pip install -qU "openvino>=2026.0.0" "nncf>=3.0.0" ultralytics
+echo "--- Exporting ${MODEL_NAME} to OpenVINO IR (FP16) ---"
+python3 -c "
+from ultralytics import YOLO
+model = YOLO('${MODEL_NAME}.pt')
+model.export(format='openvino', half=True, dynamic=False, imgsz=640)
+print('Export complete: ${MODEL_NAME}_openvino_model/')
+"
+echo "--- Quantizing to INT8 with NNCF ---"
+python3 -c "
+import nncf
+import openvino as ov
+import numpy as np
+core = ov.Core()
+model = core.read_model('${MODEL_NAME}_openvino_model/${MODEL_NAME}.xml')
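+# Placeholder calibration data: transform_fn ignores its input and returns a
+# random tensor. Replace with preprocessed frames for production accuracy.
+def transform_fn(data_item):
+    return np.random.rand(1, 3, 640, 640).astype(np.float32)
+calibration_dataset = nncf.Dataset(list(range(300)), transform_fn)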
+quantized = nncf.quantize(
+    model,
+    calibration_dataset,
+    preset=nncf.QuantizationPreset.MIXED,
+    subset_size=300,
+)
+ov.save_model(quantized, '${MODEL_NAME}_loitering_int8.xml')
+print('Quantization complete: ${MODEL_NAME}_loitering_int8.xml')
+"
+echo "--- Benchmarking ---"
+benchmark_app -m "${MODEL_NAME}_loitering_int8.xml" -d CPU -niter 50 -api async
+echo "--- Done ---"