# Crowd Detection
| Property | Value |
|---|---|
| Category | Object Detection (Crowd / Person Counting) |
| Base Model | YOLO26 (Ultralytics) |
| Source Framework | PyTorch (Ultralytics) |
| Supported Precisions | FP32, FP16, INT8 (mixed-precision) |
| Inference Engine | OpenVINO |
| Hardware | CPU, GPU, NPU |
| Detected Class | person (COCO class 0) |
## Overview
Crowd Detection is a Metro Analytics use case that detects and counts people in video streams to estimate occupancy and identify crowd build-up.
It is built on YOLO26, a state-of-the-art real-time object detector trained on the COCO dataset, exported to OpenVINO IR (optionally quantized to INT8) and filtered at runtime to the person class.
Typical Metro deployments include:
- Platform Occupancy -- count waiting passengers on station platforms.
- Entry / Exit Flow -- monitor pedestrian throughput at gates and turnstiles.
- Crowd Build-up Alerts -- trigger notifications when person counts cross a threshold.
- Public Safety Analytics -- support situational awareness in transit hubs and venues.
Available variants: `yolo26n`, `yolo26s`, `yolo26m`, `yolo26l`, `yolo26x`.
Smaller variants (`yolo26n`, `yolo26s`) are recommended for high-FPS edge deployment; larger variants improve recall in dense crowds.
## Prerequisites
- Python 3.11+
- Install OpenVINO (latest version)
- Install Intel DLStreamer
Create and activate a Python virtual environment before running the scripts:
```bash
python3 -m venv .venv --system-site-packages
source .venv/bin/activate
```
Note: The `--system-site-packages` flag is required so the virtual environment can access the system-installed OpenVINO and DLStreamer Python packages.
## Getting Started

### Download and Quantize Model
Run the provided script to download, export to OpenVINO IR, and optionally quantize:
```bash
chmod +x export_and_quantize.sh
./export_and_quantize.sh
```
This exports the default yolo26n model in FP16 precision.
#### Optional: Select a Different Variant or Precision
```bash
./export_and_quantize.sh yolo26n FP32   # full-precision
./export_and_quantize.sh yolo26n INT8   # quantized
./export_and_quantize.sh yolo26s        # larger variant, default FP16
```
Replace `yolo26n` with any variant (`yolo26s`, `yolo26m`, `yolo26l`, `yolo26x`).
The second argument selects the precision (`FP32`, `FP16`, `INT8`); the default is FP16.
The script performs the following steps:
- Installs dependencies (`openvino`, `ultralytics`; adds `nncf` for INT8).
- Downloads a sample test image (`test.jpg`) and a sample test video (`test_video.mp4`).
- Downloads the PyTorch weights and exports to OpenVINO IR.
- (INT8 only) Quantizes the model using NNCF post-training quantization.
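The bundled script remains the source of truth, but the export step is roughly equivalent to this Python sketch, assuming the Ultralytics export API (FP16 and the 640-pixel input size match the script's defaults):

```python
from ultralytics import YOLO

# Downloads yolo26n.pt on first use, then loads the PyTorch weights.
model = YOLO("yolo26n.pt")

# Export to OpenVINO IR; half=True produces FP16 weights.
# For INT8 the script instead quantizes with NNCF post-training
# quantization (see the calibration sketch further below).
model.export(format="openvino", half=True, imgsz=640)
```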
Output files:
- `yolo26n_openvino_model/` -- FP32 or FP16 OpenVINO IR model directory.
- `yolo26n_crowd_int8.xml` / `yolo26n_crowd_int8.bin` -- INT8 quantized model (only when `INT8` is selected).
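To run the OpenVINO sample below against the INT8 model, load the quantized IR directly instead of the model directory:

```python
import openvino as ov

core = ov.Core()
# IR produced by: ./export_and_quantize.sh yolo26n INT8
model = core.read_model("yolo26n_crowd_int8.xml")
```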
## Precision / Device Compatibility
| Precision | CPU | GPU | NPU |
|---|---|---|---|
| FP32 | Yes | Yes | No |
| FP16 | Yes | Yes | Yes |
| INT8 | Yes | Yes | Yes |
Note: The INT8 calibration uses the bundled sample image. For production accuracy, replace it with a representative set of frames from the target deployment site.
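A minimal sketch of that replacement with NNCF, assuming calibration frames are collected as image files under a hypothetical `calibration_frames/` directory (preprocessing mirrors the OpenVINO sample below; the bundled script's calibration code may differ):

```python
import glob

import cv2
import numpy as np
import nncf
import openvino as ov

INPUT_SIZE = 640

def preprocess(path):
    # Same preprocessing as the OpenVINO sample: resize, BGR->RGB, NCHW.
    image = cv2.resize(cv2.imread(path), (INPUT_SIZE, INPUT_SIZE))
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    return image.transpose(2, 0, 1)[np.newaxis, ...]

# Representative frames captured at the deployment site.
frames = glob.glob("calibration_frames/*.jpg")
calibration_dataset = nncf.Dataset(frames, preprocess)

core = ov.Core()
model = core.read_model("yolo26n_openvino_model/yolo26n.xml")
quantized = nncf.quantize(model, calibration_dataset)
ov.save_model(quantized, "yolo26n_crowd_int8.xml")
```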
## OpenVINO Sample

The sample below runs YOLO26 inference, filters detections to the person class, and reports the crowd count for a single image. YOLO26 is end-to-end, so no separate non-maximum suppression step is required.
```python
import cv2
import numpy as np
import openvino as ov

PERSON_CLASS_ID = 0
CONF_THRESHOLD = 0.4
INPUT_SIZE = 640

core = ov.Core()
model = core.read_model("yolo26n_openvino_model/yolo26n.xml")
compiled = core.compile_model(model, "CPU")  # or "GPU", "NPU"

image = cv2.imread("test.jpg")
h0, w0 = image.shape[:2]

# Preprocess: letterbox-free resize for simplicity.
blob = cv2.resize(image, (INPUT_SIZE, INPUT_SIZE))
blob = cv2.cvtColor(blob, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
blob = blob.transpose(2, 0, 1)[np.newaxis, ...]  # NCHW

# YOLO26 end-to-end output: [1, 300, 6] = [x1, y1, x2, y2, confidence, class_id]
# No NMS is needed -- YOLO26 is natively end-to-end.
output = compiled([blob])[compiled.output(0)][0]

mask = (output[:, 4] >= CONF_THRESHOLD) & (output[:, 5].astype(int) == PERSON_CLASS_ID)
dets = output[mask]

sx, sy = w0 / INPUT_SIZE, h0 / INPUT_SIZE

crowd_count = len(dets)
print(f"Detected persons: {crowd_count}")

for det in dets:
    x1 = int(det[0] * sx)
    y1 = int(det[1] * sy)
    x2 = int(det[2] * sx)
    y2 = int(det[3] * sy)
    cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)

cv2.putText(
    image, f"Crowd count: {crowd_count}", (10, 30),
    cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2,
)
cv2.imwrite("output_openvino.jpg", image)
```
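The plain resize above distorts the aspect ratio on non-square frames, which can cost a little accuracy; Ultralytics preprocessing uses letterbox resizing instead. A minimal letterbox sketch (the 114 pad value follows the usual Ultralytics convention):

```python
import cv2
import numpy as np

def letterbox(image, size=640, pad_value=114):
    """Resize while preserving aspect ratio; pad the remainder."""
    h, w = image.shape[:2]
    scale = min(size / h, size / w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    canvas = np.full((size, size, 3), pad_value, dtype=image.dtype)
    top, left = (size - nh) // 2, (size - nw) // 2
    canvas[top:top + nh, left:left + nw] = cv2.resize(image, (nw, nh))
    # Map detections back with: x = (x_model - left) / scale,
    #                           y = (y_model - top) / scale
    return canvas, scale, left, top
```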
### Try It on a Sample Image

The `export_and_quantize.sh` script downloads `test.jpg` automatically. Re-run the OpenVINO sample above: it reads `test.jpg`, prints the crowd count to the console, and writes the annotated frame to `output_openvino.jpg`.
Expected console output:
```
Detected persons: 4
```
`output_openvino.jpg` is the same image with a green bounding box drawn around each detected person and the text `Crowd count: 4` overlaid in the top-left corner.
Tip: For production testing, replace the bundled `test.jpg` with an image from your target deployment site showing a representative crowd density.
## DLStreamer Sample

The pipeline below runs the FP16 YOLO26 detector on the sample video via `gvadetect`, filters detections to the person class in a buffer probe using the DLStreamer Python bindings (`gstgva.VideoFrame`), overlays bounding boxes, saves the annotated result to `output_dlstreamer.mp4`, and prints the crowd count per frame.
Notes on running this sample:

- Use the FP16 IR (`yolo26n_openvino_model/yolo26n.xml`). On DLStreamer 2026.0.0, `gvadetect` cannot auto-derive a YOLO post-processor from the INT8 model produced by the bundled script. To use the INT8 model, supply a matching `model-proc` JSON.
- Class names are read automatically from the model's embedded `metadata.yaml` by DLStreamer 2026.0+ -- no external `labels-file` is required.
- Filtering with `object-class=person` directly on `gvadetect` is rejected when `inference-region` is `full-frame` (the default), so the sample filters by `region.label()` in the buffer probe instead.
- Export `PYTHONPATH` so the DLStreamer Python module is importable:

  ```bash
  source /opt/intel/openvino_2026/setupvars.sh
  source /opt/intel/dlstreamer/scripts/setup_dls_env.sh
  export PYTHONPATH=/opt/intel/dlstreamer/python:\
  /opt/intel/dlstreamer/gstreamer/lib/python3/dist-packages:${PYTHONPATH:-}
  ```
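If you do create a `model-proc` for the INT8 IR, pass it to `gvadetect` through its `model-proc` property. A minimal command-line sketch (the `model-proc` filename is hypothetical, and its converter settings must match YOLO26's end-to-end output layout):

```bash
gst-launch-1.0 filesrc location=test_video.mp4 ! decodebin3 ! videoconvert ! \
  gvadetect model=yolo26n_crowd_int8.xml model-proc=yolo26n_crowd.json \
            device=CPU threshold=0.4 ! \
  fakesink
```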
```python
import gi

gi.require_version("Gst", "1.0")
gi.require_version("GstVideo", "1.0")
from gi.repository import Gst
from gstgva import VideoFrame

Gst.init(None)

INPUT_VIDEO = "test_video.mp4"

# For CPU: change device=GPU to device=CPU.
# For NPU: change device=GPU to device=NPU (batch-size=1, nireq=4 recommended).
pipeline_str = (
    f"filesrc location={INPUT_VIDEO} ! decodebin3 ! "
    "videoconvert ! "
    "gvadetect model=yolo26n_openvino_model/yolo26n.xml "
    "device=GPU "
    "threshold=0.4 ! queue ! "
    "gvawatermark ! videoconvert ! video/x-raw,format=I420 ! "
    "openh264enc ! h264parse ! "
    "mp4mux ! filesink name=sink location=output_dlstreamer.mp4"
)

pipeline = Gst.parse_launch(pipeline_str)
sink = pipeline.get_by_name("sink")
sink_pad = sink.get_static_pad("sink")

def on_buffer(pad, info):
    buf = info.get_buffer()
    caps = pad.get_current_caps()
    frame = VideoFrame(buf, caps=caps)
    crowd_count = sum(1 for r in frame.regions() if r.label() == "person")
    if crowd_count:
        print(f"Crowd count: {crowd_count}", flush=True)
    return Gst.PadProbeReturn.OK

sink_pad.add_probe(Gst.PadProbeType.BUFFER, on_buffer)

pipeline.set_state(Gst.State.PLAYING)
bus = pipeline.get_bus()
bus.timed_pop_filtered(
    Gst.CLOCK_TIME_NONE,
    Gst.MessageType.EOS | Gst.MessageType.ERROR,
)
pipeline.set_state(Gst.State.NULL)
```
### Expected Output

The sample prints a `Crowd count: N` line for each frame that contains detected persons and writes the annotated video, with bounding boxes drawn by `gvawatermark`, to `output_dlstreamer.mp4`.
Device targets:
- `device=GPU` -- default in the sample code.
- `device=CPU` -- change `device=GPU` to `device=CPU`.
- `device=NPU` -- change `device=GPU` to `device=NPU`; use `batch-size=1` and `nireq=4` for best NPU utilization.
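The per-frame count from the buffer probe can also drive the crowd build-up alerts mentioned in the overview. A minimal sketch of a threshold check with hysteresis (the threshold values are illustrative assumptions, not part of the sample):

```python
class CrowdAlert:
    """Fires once when the count reaches `high`; re-arms below `low`."""

    def __init__(self, high=20, low=15):
        self.high, self.low = high, low
        self.active = False

    def update(self, crowd_count):
        if not self.active and crowd_count >= self.high:
            self.active = True
            print(f"ALERT: crowd build-up, {crowd_count} persons", flush=True)
        elif self.active and crowd_count <= self.low:
            self.active = False

# Usage: create one instance and call alert.update(crowd_count)
# inside on_buffer() after counting the person regions.
```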
## License
Copyright (C) Intel Corporation. All rights reserved. Licensed under the MIT License. See LICENSE for details.

