---
license: other
license_name: intel-custom
license_link: LICENSE
library_name: openvino
pipeline_tag: object-detection
tags:
- openvino
- intel
- yolo
- yolo26
- loitering-detection
- zone-analytics
- tracking
- edge-ai
- metro
- dlstreamer
datasets:
- detection-datasets/coco
language:
- en
---
# Loitering Detection
| Property | Value |
|---|---|
| **Category** | Object Detection + Tracking + Zone Analytics |
| **Source Framework** | PyTorch (Ultralytics) |
| **Supported Precisions** | FP32, FP16, INT8 (mixed-precision) |
| **Inference Engine** | OpenVINO |
| **Hardware** | CPU, GPU, NPU |
| **Detected Class** | `person` (COCO class 0) |
---
## Overview
Loitering Detection is a Metro Analytics use case that flags people who remain inside a configurable region of interest for longer than a dwell-time threshold.
It is built on [YOLO26](https://docs.ultralytics.com/models/yolo26/) for person detection, paired with a multi-object tracker that assigns persistent IDs across frames.
A polygon zone defines the area to monitor; for each tracked person whose bounding-box anchor falls inside the zone, the application accumulates dwell time and raises a loitering event when the threshold is exceeded.
Typical Metro deployments include:
- **Restricted-Area Monitoring** -- raise alerts when a person lingers near tracks, equipment rooms, or after-hours zones.
- **Platform Edge Safety** -- detect prolonged presence inside a yellow-line buffer.
- **ATM and Ticketing Security** -- identify suspicious dwell at unattended kiosks.
- **Crowd-Free Zone Enforcement** -- monitor emergency exits and corridors that must remain clear.
Available variants: `yolo26n`, `yolo26s`, `yolo26m`, `yolo26l`, `yolo26x`.
Smaller variants (`yolo26n`, `yolo26s`) are recommended for high-FPS edge deployment.
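The dwell-time logic described above can be sketched as a small, tracker-agnostic helper (a minimal sketch; the `DwellTimer` name and API are illustrative, not part of the shipped application):

```python
class DwellTimer:
    """Accumulates in-zone dwell time per track ID and flags loitering once."""

    def __init__(self, threshold_s: float):
        self.threshold_s = threshold_s
        self.dwell: dict[int, float] = {}
        self.last_seen: dict[int, float] = {}
        self.flagged: set[int] = set()

    def update(self, track_id: int, now: float) -> bool:
        """Record that `track_id` is inside the zone at time `now`.

        Returns True exactly once, when the dwell threshold is first crossed.
        """
        prev = self.last_seen.get(track_id, now)
        self.dwell[track_id] = self.dwell.get(track_id, 0.0) + (now - prev)
        self.last_seen[track_id] = now
        if self.dwell[track_id] >= self.threshold_s and track_id not in self.flagged:
            self.flagged.add(track_id)
            return True
        return False
```

Feeding it a per-frame timestamp for each tracked person inside the zone reproduces the dwell accumulation that the DLStreamer sample later in this document performs inline.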
---
## Prerequisites
- Python 3.11+
- [Install Intel DLStreamer](https://docs.openedgeplatform.intel.com/2026.0/edge-ai-libraries/dlstreamer/get_started/install/install_guide_ubuntu.html)
Create and activate a Python virtual environment before running the scripts:
```bash
python3 -m venv .venv --system-site-packages
source .venv/bin/activate
```
> **Note:** The `--system-site-packages` flag is required so the virtual
> environment can access the system-installed OpenVINO and DLStreamer Python
> packages.
---
## Getting Started
### Download and Quantize Model
Run the provided script to download, export to OpenVINO IR, and optionally quantize:
```bash
chmod +x export_and_quantize.sh
./export_and_quantize.sh
```
This exports the default **yolo26n** model in **FP16** precision.
#### Optional: Select a Different Variant or Precision
```bash
./export_and_quantize.sh yolo26n FP32 # full-precision
./export_and_quantize.sh yolo26n INT8 # quantized
./export_and_quantize.sh yolo26s # larger variant, default FP16
```
Replace `yolo26n` with any variant (`yolo26s`, `yolo26m`, `yolo26l`, `yolo26x`).
The second argument selects the precision (`FP32`, `FP16`, `INT8`); the default is **FP16**.
The script performs the following steps:
1. Installs dependencies (`openvino`, `ultralytics`; adds `nncf` for INT8).
2. Downloads the sample surveillance video (`VIRAT_S_000101.mp4`) from the Intel Metro AI Suite project into the current directory.
3. Downloads the PyTorch weights and exports to OpenVINO IR.
4. *(INT8 only)* Quantizes the model using NNCF post-training quantization.
Output files:
- `yolo26n_openvino_model/` -- FP32 or FP16 OpenVINO IR model directory.
- `yolo26n_loitering_int8.xml` / `yolo26n_loitering_int8.bin` -- INT8 quantized model *(only when `INT8` is selected)*.
#### Precision / Device Compatibility
| Precision | CPU | GPU | NPU |
|---|---|---|---|
| FP32 | Yes | Yes | No |
| FP16 | Yes | Yes | Yes |
| INT8 | Yes | Yes | Yes |
> **Note:** The INT8 calibration uses frames from the bundled sample video.
> For production accuracy, replace it with a representative set of frames from
> the target deployment site.
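One way to assemble such a set is to sample frames evenly across footage from the deployment site (a hypothetical helper; NNCF only needs the resulting frames, however they are selected):

```python
def calibration_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Return evenly spaced frame indices for an INT8 calibration set."""
    if num_samples >= total_frames:
        return list(range(total_frames))
    step = total_frames / num_samples
    return [int(i * step) for i in range(num_samples)]
```

Decoding those indices from site footage (e.g. with OpenCV) yields a calibration set that matches production lighting and camera angles far better than the short demo clip.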
### Defining the Region of Interest
The zone is a rectangular ROI expressed as `x_min,y_min,x_max,y_max` in the
original input frame coordinates (not the 640x640 model input).
DLStreamer's `gvaattachroi` element attaches the ROI to every buffer, and
`gvadetect inference-region=1` (`roi-list`) restricts inference to that ROI
only -- no Python polygon math required.
A typical surveillance-zone configuration on a 1280x720 source might be:
```text
roi=400,200,1100,650 # ROI for gvaattachroi (x_min,y_min,x_max,y_max)
LOITERING_SECONDS = 5.0 # dwell threshold, in seconds (demo value)
```
> **Note:** The sample uses a 5-second threshold so that loitering events are
> triggered quickly on the short demo video. For production deployments,
> increase this to 10--30 seconds depending on the site's operational
> requirements.
Per-person dwell time is measured at the bottom-center of the bounding box
(the foot anchor), which most closely approximates the person's ground position.
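The anchor and containment geometry can be sketched as follows (hypothetical helper functions; the sample pipeline below delegates the containment check to `gvadetect inference-region=1` instead of doing it in Python):

```python
def foot_anchor(x: int, y: int, w: int, h: int) -> tuple[int, int]:
    """Bottom-center of a bounding box: approximates the ground position."""
    return x + w // 2, y + h


def in_roi(point: tuple[int, int], roi: tuple[int, int, int, int]) -> bool:
    """Check whether a point lies inside an x_min,y_min,x_max,y_max rectangle."""
    px, py = point
    x_min, y_min, x_max, y_max = roi
    return x_min <= px <= x_max and y_min <= py <= y_max
```

For example, a person box at `(400, 100)` with size `100x300` has its foot anchor at `(450, 400)`, which falls inside the `400,200,1100,650` zone shown above.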
### DLStreamer Sample
- The DLStreamer Python module is not on `sys.path` by default. Export `PYTHONPATH` before running:
```bash
source /opt/intel/openvino_2026/setupvars.sh
source /opt/intel/dlstreamer/scripts/setup_dls_env.sh
export PYTHONPATH=/opt/intel/dlstreamer/python:\
/opt/intel/dlstreamer/gstreamer/lib/python3/dist-packages:${PYTHONPATH:-}
```
**Video-based loitering detection** (requires video for dwell-time tracking):
```python
from collections import defaultdict
import gi
gi.require_version("Gst", "1.0")
gi.require_version("GstVideo", "1.0")
from gi.repository import Gst
from gstgva import VideoFrame
Gst.init(None)
MODEL_XML = "yolo26n_openvino_model/yolo26n.xml"
INPUT_VIDEO = "VIRAT_S_000101.mp4"
ROI = "0,200,300,400" # x_min,y_min,x_max,y_max
LOITERING_SECONDS = 5.0
pipeline_str = (
f"filesrc location={INPUT_VIDEO} ! decodebin3 ! "
f"videoconvert ! "
f"gvaattachroi roi={ROI} ! "
f"gvadetect inference-region=1 model={MODEL_XML} device=GPU "
f"threshold=0.5 ! queue ! "
f"gvatrack tracking-type=short-term-imageless ! queue ! "
f"gvametaconvert add-empty-results=true ! queue ! "
f"gvafpscounter ! "
f"gvawatermark ! videoconvert ! video/x-raw,format=I420 ! "
f"openh264enc ! h264parse ! "
f"mp4mux ! filesink name=sink location=output_dlstreamer.mp4"
)
pipeline = Gst.parse_launch(pipeline_str)
STALE_TIMEOUT = 2.0 # seconds of absence before clearing dwell state
dwell_state: dict[int, float] = defaultdict(float)
last_seen: dict[int, float] = {}
flagged: set[int] = set()
def on_buffer(pad, info):
buf = info.get_buffer()
caps = pad.get_current_caps()
frame = VideoFrame(buf, caps=caps)
now = buf.pts / Gst.SECOND if buf.pts != Gst.CLOCK_TIME_NONE else 0.0
seen_ids: set[int] = set()
for region in frame.regions():
# gvaattachroi attaches a frame-level ROI region; skip it.
if region.label() != "person":
continue
object_id = region.object_id()
if object_id <= 0:
continue
rect = region.rect()
foot_x = int(rect.x + rect.w / 2)
foot_y = int(rect.y + rect.h)
seen_ids.add(object_id)
# gvadetect inference-region=1 already constrains detections to the
# gvaattachroi zone, so every tracked person here is "in zone".
prev = last_seen.get(object_id, now)
dwell_state[object_id] += now - prev
last_seen[object_id] = now
if (
dwell_state[object_id] >= LOITERING_SECONDS
and object_id not in flagged
):
flagged.add(object_id)
print(
f"LOITERING id={object_id} "
f"dwell={dwell_state[object_id]:.1f}s "
f"anchor=({foot_x},{foot_y})",
flush=True,
)
# Clean up stale tracks after STALE_TIMEOUT seconds of absence.
# Keep flagged entries to prevent duplicate alerts when a person
# briefly disappears (occlusion / tracker jitter) and reappears.
for stale in list(dwell_state):
if stale not in seen_ids:
elapsed_since = now - last_seen.get(stale, now)
if elapsed_since > STALE_TIMEOUT:
dwell_state.pop(stale, None)
last_seen.pop(stale, None)
return Gst.PadProbeReturn.OK
sink = pipeline.get_by_name("sink")
sink_pad = sink.get_static_pad("sink")
sink_pad.add_probe(Gst.PadProbeType.BUFFER, on_buffer)
pipeline.set_state(Gst.State.PLAYING)
bus = pipeline.get_bus()
bus.timed_pop_filtered(
Gst.CLOCK_TIME_NONE,
Gst.MessageType.EOS | Gst.MessageType.ERROR,
)
pipeline.set_state(Gst.State.NULL)
```
Expected output with the sample video and the zone/threshold above
(exact track IDs and anchor coordinates may vary between runs due to
tracker non-determinism):
```text
LOITERING id=26 dwell=5.0s anchor=(147,341)
LOITERING id=27 dwell=5.0s anchor=(122,337)
LOITERING id=29 dwell=5.0s anchor=(90,322)
...
```
Approximately 10--12 loitering events are expected over the full video.
The annotated video is saved to `output_dlstreamer.mp4`; `gvawatermark` draws green bounding boxes and track IDs around every detected person.
> **Known warning:** The `openh264enc` element prints
> `[OpenH264] this = 0x..., Error:CWelsH264SVCEncoder::EncodeFrame(), cmInitParaError.`
> on the first frame. This is a benign message from the OpenH264 library's
> internal logging, not a real error; the output video is encoded correctly.
#### Expected Output
![DLStreamer expected output](expected_output_dlstreamer.gif)
**Device targets:**
- `device=GPU` -- default in the sample code.
- `device=CPU` -- change `device=GPU` to `device=CPU`.
- `device=NPU` -- change `device=GPU` to `device=NPU`; use `batch-size=1` and `nireq=4` for best NPU utilization.
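When the target device is chosen at runtime, the `gvadetect` element string can be built programmatically (a minimal sketch reusing the element properties from the sample; the NPU-specific properties follow the recommendation above):

```python
def build_detect_element(model_xml: str, device: str = "GPU",
                         threshold: float = 0.5) -> str:
    """Build the gvadetect element string for a given target device."""
    props = f"model={model_xml} device={device} threshold={threshold}"
    if device == "NPU":
        # batch-size=1 and nireq=4 keep the NPU request queue saturated.
        props += " batch-size=1 nireq=4"
    return f"gvadetect inference-region=1 {props}"
```

The returned string drops into the pipeline in place of the hard-coded `gvadetect ...` segment shown earlier.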
---
## License
Copyright (C) Intel Corporation. All rights reserved.
Licensed under the terms in [LICENSE](LICENSE).
## References
- [YOLO26 Documentation](https://docs.ultralytics.com/models/yolo26/)
- [OpenVINO YOLO26 Notebook](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/yolov26-optimization/yolov26-object-detection.ipynb)
- [Intel DLStreamer Object Tracking](https://docs.openedgeplatform.intel.com/2026.0/edge-ai-libraries/dlstreamer/elements/gvatrack.html)
- [OpenVINO Documentation](https://docs.openvino.ai/)
- [NNCF Post-Training Quantization](https://docs.openvino.ai/latest/nncf_ptq_introduction.html)
- [COCO Dataset](https://cocodataset.org/)