---
license: other
license_name: intel-custom
license_link: LICENSE
library_name: openvino
pipeline_tag: object-detection
tags:
- openvino
- intel
- yolo
- yolo26
- crowd-detection
- person-counting
- edge-ai
- metro
- dlstreamer
datasets:
- detection-datasets/coco
language:
- en
---

# Crowd Detection


| Property | Value |
|---|---|
| **Category** | Object Detection (Crowd / Person Counting) |
| **Base Model** | [YOLO26](https://docs.ultralytics.com/models/yolo26/) (Ultralytics) |
| **Source Framework** | PyTorch (Ultralytics) |
| **Supported Precisions** | FP32, FP16, INT8 (mixed-precision) |
| **Inference Engine** | OpenVINO |
| **Hardware** | CPU, GPU, NPU |
| **Detected Class** | `person` (COCO class 0) |


---


## Overview


Crowd Detection is a Metro Analytics use case that detects and counts people in video streams to estimate occupancy and identify crowd build-up.
It is built on [YOLO26](https://docs.ultralytics.com/models/yolo26/), a state-of-the-art real-time object detector trained on the COCO dataset, exported to OpenVINO IR (with optional INT8 quantization) and filtered at runtime to the `person` class.
Typical Metro deployments include:


- **Platform Occupancy** -- count waiting passengers on station platforms.
- **Entry / Exit Flow** -- monitor pedestrian throughput at gates and turnstiles.
- **Crowd Build-up Alerts** -- trigger notifications when person counts cross a threshold.
- **Public Safety Analytics** -- support situational awareness in transit hubs and venues.
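
The build-up alerting logic above can be sketched as a hysteresis check over per-frame person counts. The `CrowdAlert` helper and its threshold values below are illustrative assumptions, not part of the released scripts:

```python
from collections import deque


class CrowdAlert:
    """Raise an alert when the smoothed person count crosses a threshold.

    Hypothetical helper: thresholds and window size are illustrative only.
    """

    def __init__(self, threshold=20, clear_below=15, window=30):
        self.threshold = threshold      # counts at/above this trigger an alert
        self.clear_below = clear_below  # alert clears once the count drops below this
        self.counts = deque(maxlen=window)  # rolling window of per-frame counts
        self.active = False

    def update(self, person_count: int) -> bool:
        """Feed one frame's count; return True while the alert is active."""
        self.counts.append(person_count)
        avg = sum(self.counts) / len(self.counts)
        if not self.active and avg >= self.threshold:
            self.active = True
        elif self.active and avg < self.clear_below:
            self.active = False
        return self.active
```

Smoothing over a rolling window avoids flapping on single-frame detector noise, and the two-threshold hysteresis keeps the alert stable while a crowd disperses.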


Available variants: `yolo26n`, `yolo26s`, `yolo26m`, `yolo26l`, `yolo26x`.
Smaller variants (`yolo26n`, `yolo26s`) are recommended for high-FPS edge deployment; larger variants improve recall in dense crowds.


---


## Prerequisites


- Python 3.11+
- [Install OpenVINO](https://docs.openvino.ai/2026/get-started/install-openvino.html) (latest version)
- [Install Intel DLStreamer](https://docs.openedgeplatform.intel.com/2026.0/edge-ai-libraries/dlstreamer/get_started/install/install_guide_ubuntu.html)


Create and activate a Python virtual environment before running the scripts:


```bash
python3 -m venv .venv --system-site-packages
source .venv/bin/activate
```


> **Note:** The `--system-site-packages` flag is required so the virtual
> environment can access the system-installed OpenVINO and DLStreamer Python
> packages.


---


## Getting Started


### Download and Quantize Model


Run the provided script to download, export to OpenVINO IR, and optionally quantize:


```bash
chmod +x export_and_quantize.sh
./export_and_quantize.sh
```


This exports the default **yolo26n** model in **FP16** precision.


#### Optional: Select a Different Variant or Precision


```bash
./export_and_quantize.sh yolo26n FP32 # full-precision
./export_and_quantize.sh yolo26n INT8 # quantized
./export_and_quantize.sh yolo26s # larger variant, default FP16
```


Replace `yolo26n` with any variant (`yolo26s`, `yolo26m`, `yolo26l`, `yolo26x`).
The second argument selects the precision (`FP32`, `FP16`, `INT8`); the default is **FP16**.


The script performs the following steps:


1. Installs dependencies (`openvino`, `ultralytics`; adds `nncf` for INT8).
2. Downloads a sample test image (`test.jpg`) and a sample test video (`test_video.mp4`).
3. Downloads the PyTorch weights and exports them to OpenVINO IR.
4. *(INT8 only)* Quantizes the model using NNCF post-training quantization.


Output files:


- `yolo26n_openvino_model/` -- FP32 or FP16 OpenVINO IR model directory.
- `yolo26n_crowd_int8.xml` / `yolo26n_crowd_int8.bin` -- INT8 quantized model *(only when `INT8` is selected)*.


#### Precision / Device Compatibility


| Precision | CPU | GPU | NPU |
|---|---|---|---|
| FP32 | Yes | Yes | No |
| FP16 | Yes | Yes | Yes |
| INT8 | Yes | Yes | Yes |


> **Note:** The INT8 calibration uses the bundled sample image.
> For production accuracy, replace it with a representative set of frames from
> the target deployment site.
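
When a deployment must run on whatever accelerator is present, the compatibility table above can drive a simple fallback. The `pick_device` helper and its preference order below are illustrative assumptions; in practice the list of present devices comes from `openvino.Core().available_devices`:

```python
def pick_device(available, precision):
    """Pick the first preferred device that is present and supports the precision.

    `available` mimics openvino.Core().available_devices, e.g. ["CPU", "GPU"].
    Per the table above, NPU supports FP16/INT8 but not FP32.
    """
    unsupported = {"FP32": {"NPU"}}.get(precision, set())
    # Assumed preference order (most to least power-efficient) -- adjust per site.
    for device in ("NPU", "GPU", "CPU"):
        if device in available and device not in unsupported:
            return device
    raise RuntimeError(f"No available device supports {precision}")
```

A compile call could then read `core.compile_model(model, pick_device(core.available_devices, "FP16"))`.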


### OpenVINO Sample


The sample below runs YOLO26 inference, filters the end-to-end output to the
`person` class, and reports the crowd count for a single image.

```python
import cv2
import numpy as np
import openvino as ov

PERSON_CLASS_ID = 0
CONF_THRESHOLD = 0.4
INPUT_SIZE = 640

core = ov.Core()
model = core.read_model("yolo26n_openvino_model/yolo26n.xml")
compiled = core.compile_model(model, "CPU")  # or "GPU", "NPU"

image = cv2.imread("test.jpg")
h0, w0 = image.shape[:2]

# Preprocess: letterbox-free resize for simplicity.
blob = cv2.resize(image, (INPUT_SIZE, INPUT_SIZE))
blob = cv2.cvtColor(blob, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
blob = blob.transpose(2, 0, 1)[np.newaxis, ...]  # NCHW

# YOLO26 end-to-end output: [1, 300, 6] = [x1, y1, x2, y2, confidence, class_id]
# No NMS is needed -- YOLO26 is natively end-to-end.
output = compiled([blob])[compiled.output(0)][0]
mask = (output[:, 4] >= CONF_THRESHOLD) & (output[:, 5].astype(int) == PERSON_CLASS_ID)
dets = output[mask]

sx, sy = w0 / INPUT_SIZE, h0 / INPUT_SIZE
crowd_count = len(dets)
print(f"Detected persons: {crowd_count}")

# Scale boxes from model input space back to the original image and draw them.
for det in dets:
    x1 = int(det[0] * sx)
    y1 = int(det[1] * sy)
    x2 = int(det[2] * sx)
    y2 = int(det[3] * sy)
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)

cv2.putText(
    image, f"Crowd count: {crowd_count}", (10, 30),
    cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2,
)
cv2.imwrite("output_openvino.jpg", image)
```
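
The sample uses a plain resize, which distorts the aspect ratio. Ultralytics models are trained with letterbox padding, so a letterbox preprocess can improve accuracy on wide frames. The functions below are a minimal sketch of the standard letterbox geometry (uniform scale plus symmetric padding); the names `letterbox_params` and `unletterbox_box` are our own:

```python
def letterbox_params(src_w, src_h, dst=640):
    """Compute the scale and padding that fit (src_w, src_h) into a dst x dst
    square while preserving aspect ratio (standard YOLO letterbox geometry)."""
    scale = min(dst / src_w, dst / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x = (dst - new_w) // 2  # left padding
    pad_y = (dst - new_h) // 2  # top padding
    return scale, pad_x, pad_y


def unletterbox_box(box, scale, pad_x, pad_y):
    """Map an (x1, y1, x2, y2) box from letterboxed model space back to the
    source image by removing the padding and undoing the scale."""
    x1, y1, x2, y2 = box
    return ((x1 - pad_x) / scale, (y1 - pad_y) / scale,
            (x2 - pad_x) / scale, (y2 - pad_y) / scale)
```

With letterboxing, the box rescale loop in the sample would use these parameters in place of the plain `sx`, `sy` ratios.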


### Try It on a Sample Image


The `export_and_quantize.sh` script downloads `test.jpg` automatically.
Run the OpenVINO sample above: it reads `test.jpg`, prints the crowd count to the console, and writes the annotated frame to `output_openvino.jpg`.


Expected console output:


```text
Detected persons: 4
```


`output_openvino.jpg` is the same image with a green bounding box drawn around each detected person and the text `Crowd count: 4` overlaid in the top-left corner.


> **Tip:** For production testing, replace the bundled `test.jpg` with an image
> from your target deployment site showing a representative crowd density.


#### Expected Output





### DLStreamer Sample


The pipeline below runs the FP16 YOLO26 detector on the sample video via
`gvadetect`, counts `person` detections in a buffer probe using the DLStreamer
Python bindings (`gstgva.VideoFrame`), overlays bounding boxes with
`gvawatermark`, saves the annotated result to `output_dlstreamer.mp4`, and
prints the crowd count per frame.


> **Notes on running this sample:**
>
> - Use the FP16 IR (`yolo26n_openvino_model/yolo26n.xml`).
>   On DLStreamer 2026.0.0, `gvadetect` cannot auto-derive a YOLO post-processor
>   from the INT8 model produced by the bundled script.
>   To use the INT8 model, supply a matching `model-proc` JSON.
> - Class names are read automatically from the model's embedded
>   `metadata.yaml` by DLStreamer 2026.0+ -- no external `labels-file` is
>   required.
> - Filtering with `object-class=person` directly on `gvadetect` is rejected
>   when `inference-region` is `full-frame` (the default), so the sample
>   filters by `region.label()` in the buffer probe instead.
> - Export `PYTHONPATH` so the DLStreamer Python module is importable:
>
>   ```bash
>   source /opt/intel/openvino_2026/setupvars.sh
>   source /opt/intel/dlstreamer/scripts/setup_dls_env.sh
>   export PYTHONPATH=/opt/intel/dlstreamer/python:\
>   /opt/intel/dlstreamer/gstreamer/lib/python3/dist-packages:${PYTHONPATH:-}
>   ```

```python
import gi

gi.require_version("Gst", "1.0")
gi.require_version("GstVideo", "1.0")
from gi.repository import Gst
from gstgva import VideoFrame

Gst.init(None)

INPUT_VIDEO = "test_video.mp4"

# For CPU: change device=GPU to device=CPU.
# For NPU: change device=GPU to device=NPU (batch-size=1, nireq=4 recommended).
pipeline_str = (
    f"filesrc location={INPUT_VIDEO} ! decodebin3 ! "
    "videoconvert ! "
    "gvadetect model=yolo26n_openvino_model/yolo26n.xml "
    "device=GPU "
    "threshold=0.4 ! queue ! "
    "gvawatermark ! videoconvert ! video/x-raw,format=I420 ! "
    "openh264enc ! h264parse ! "
    "mp4mux ! filesink name=sink location=output_dlstreamer.mp4"
)
pipeline = Gst.parse_launch(pipeline_str)

sink = pipeline.get_by_name("sink")
sink_pad = sink.get_static_pad("sink")


def on_buffer(pad, info):
    buf = info.get_buffer()
    caps = pad.get_current_caps()
    frame = VideoFrame(buf, caps=caps)
    crowd_count = sum(1 for r in frame.regions() if r.label() == "person")
    if crowd_count:
        print(f"Crowd count: {crowd_count}", flush=True)
    return Gst.PadProbeReturn.OK


sink_pad.add_probe(Gst.PadProbeType.BUFFER, on_buffer)

pipeline.set_state(Gst.State.PLAYING)
bus = pipeline.get_bus()
bus.timed_pop_filtered(
    Gst.CLOCK_TIME_NONE,
    Gst.MessageType.EOS | Gst.MessageType.ERROR,
)
pipeline.set_state(Gst.State.NULL)
```

#### Expected Output



**Device targets:**

- `device=GPU` -- default in the sample code.
- `device=CPU` -- change `device=GPU` to `device=CPU`.
- `device=NPU` -- change `device=GPU` to `device=NPU`; use `batch-size=1` and `nireq=4` for best NPU utilization.
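
The probe prints one `Crowd count: N` line per frame with detections; for reporting it can be useful to post-process that log into summary statistics. The `summarize_counts` helper below is a hypothetical post-processing sketch, not part of the sample:

```python
import re


def summarize_counts(log_lines):
    """Reduce 'Crowd count: N' lines printed by the buffer probe to
    (frames_with_people, peak_count, mean_count). Hypothetical helper."""
    counts = [int(m.group(1)) for line in log_lines
              if (m := re.match(r"Crowd count: (\d+)", line))]
    if not counts:
        return 0, 0, 0.0
    return len(counts), max(counts), sum(counts) / len(counts)
```

Non-matching lines (GStreamer warnings, other output) are ignored by the regex, so the pipeline's stdout can be fed in unfiltered.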

---

## License

Copyright (C) Intel Corporation. All rights reserved.
Licensed under the terms in the [LICENSE](LICENSE) file.

## References

- [YOLO26 Documentation](https://docs.ultralytics.com/models/yolo26/)
- [OpenVINO YOLO26 Notebook](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/yolov26-optimization/yolov26-object-detection.ipynb)
- [COCO Dataset](https://cocodataset.org/)
- [OpenVINO Documentation](https://docs.openvino.ai/)
- [NNCF Post-Training Quantization](https://docs.openvino.ai/latest/nncf_ptq_introduction.html)
- [Intel DLStreamer](https://docs.openedgeplatform.intel.com/2026.0/edge-ai-libraries/dlstreamer/index.html)