# YOLOv11s · VisDrone object detector

Small YOLOv11 object detector fine-tuned on the VisDrone2019 dataset,
targeting aerial/surveillance-angle traffic footage — the exact viewpoint
used by our telecom_videos set (fixed overhead/elevated cameras).
Part of the ktk-studio traffic-violation analytics stack.
## Summary

| Property | Value |
|---|---|
| Architecture | YOLOv11s (Ultralytics) |
| Input | 640×640 RGB |
| Output | `output0`, shape `[1, 14, N]`: 4 bbox coords + 10 class scores per candidate |
| Parameters | 9.5 M |
| Weights | `best.pt` (19 MB) / `best.onnx` (37 MB, opset 19) |
| Epochs trained | 67 (early-stopped from 80) |
| Best epoch | 52 |
| Val mAP50 | 0.377 |
| Val mAP50-95 | 0.219 |
| Val Precision | 0.502 |
| Val Recall | 0.388 |
## Classes

pedestrian, people, bicycle, car, van, truck, tricycle,
awning-tricycle, bus, motor — matching the VisDrone2019 taxonomy.
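Class indices in the raw output follow the order above (the same order the Ultralytics `VisDrone.yaml` uses). As a Python mapping for downstream code:

```python
# VisDrone2019 class index -> name, in VisDrone.yaml order
CLASS_NAMES = {
    0: "pedestrian", 1: "people", 2: "bicycle", 3: "car", 4: "van",
    5: "truck", 6: "tricycle", 7: "awning-tricycle", 8: "bus", 9: "motor",
}
```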
## Training

Dataset auto-downloaded via `data=VisDrone.yaml` (Ultralytics).

```bash
yolo detect train data=VisDrone.yaml model=yolo11s.pt \
  epochs=80 imgsz=640 batch=32 patience=15 device=0
```

Hardware: NVIDIA B200 180GB. Batch 32, ~35 GB VRAM. Training time ~50 min.
## Usage

### Ultralytics

```python
from ultralytics import YOLO

model = YOLO("best.pt")
r = model("traffic_camera_frame.jpg", conf=0.25)
r[0].show()
```
### ONNX Runtime (GPU)

```python
import cv2
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("best.onnx", providers=["CUDAExecutionProvider"])
img = cv2.imread("frame.jpg")

# letterbox to 640×640: resize with preserved aspect ratio, pad with gray (114)
scale = min(640 / img.shape[1], 640 / img.shape[0])
nw, nh = int(img.shape[1] * scale), int(img.shape[0] * scale)
resized = cv2.resize(img, (nw, nh))
canvas = np.full((640, 640, 3), 114, np.uint8)
pad_x, pad_y = (640 - nw) // 2, (640 - nh) // 2
canvas[pad_y:pad_y + nh, pad_x:pad_x + nw] = resized

# BGR -> RGB, [0, 1] float32, CHW, add batch dim
x = cv2.cvtColor(canvas, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
x = np.ascontiguousarray(x.transpose(2, 0, 1)[None])

y = sess.run(None, {"images": x})[0][0]  # (14, N)
# decode: y[:4] = cx,cy,w,h; y[4:14] = class scores
```
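The raw `(14, N)` head still needs score thresholding, a letterbox un-mapping, and NMS before the boxes are usable. A minimal pure-NumPy decode sketch (the `nms`/`decode` helpers are ours, not part of the export; `scale`, `pad_x`, `pad_y` are the letterbox values computed during preprocessing):

```python
import numpy as np

def nms(boxes, scores, iou_thres=0.45):
    """Greedy non-maximum suppression on xyxy boxes (no cv2 dependency)."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = boxes[order[1:]]
        # intersection of box i with every remaining box
        xx1 = np.maximum(boxes[i, 0], rest[:, 0])
        yy1 = np.maximum(boxes[i, 1], rest[:, 1])
        xx2 = np.minimum(boxes[i, 2], rest[:, 2])
        yy2 = np.minimum(boxes[i, 3], rest[:, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (rest[:, 2] - rest[:, 0]) * (rest[:, 3] - rest[:, 1])
        iou = inter / (area_i + area_r - inter + 1e-9)
        order = order[1:][iou <= iou_thres]   # drop heavy overlaps
    return np.array(keep, dtype=int)

def decode(y, scale, pad_x, pad_y, conf_thres=0.25, iou_thres=0.45):
    """y: (14, N) raw head -> (boxes_xyxy, scores, class_ids) in original-image pixels."""
    preds = y.T                               # (N, 14)
    scores = preds[:, 4:].max(axis=1)
    cls_ids = preds[:, 4:].argmax(axis=1)
    m = scores > conf_thres
    preds, scores, cls_ids = preds[m], scores[m], cls_ids[m]
    cx, cy, w, h = preds[:, 0], preds[:, 1], preds[:, 2], preds[:, 3]
    # undo the letterbox: shift by padding, divide by resize scale
    boxes = np.stack([(cx - w / 2 - pad_x) / scale,
                      (cy - h / 2 - pad_y) / scale,
                      (cx + w / 2 - pad_x) / scale,
                      (cy + h / 2 - pad_y) / scale], axis=1)
    keep = nms(boxes, scores, iou_thres)
    return boxes[keep], scores[keep], cls_ids[keep]
```

A vectorized per-class NMS (e.g. offsetting boxes by class id) is a common refinement; the greedy class-agnostic form above keeps the sketch short.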
## Intended use

- Real-time object detection on traffic-camera / drone-style footage.
- Drop-in replacement for TrafficCamNet when the scene has van/bus/truck/people
  that need to be distinguished (TrafficCamNet collapses them into `car`).
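Conversely, if a downstream consumer still expects the coarser TrafficCamNet-style labels, the finer VisDrone classes can be collapsed with a small lookup. The mapping below is an illustrative assumption, not a shipped artifact; adjust it to whatever taxonomy your pipeline actually consumes:

```python
# VisDrone index -> coarse label (illustrative mapping, not part of the model)
VISDRONE = ["pedestrian", "people", "bicycle", "car", "van", "truck",
            "tricycle", "awning-tricycle", "bus", "motor"]
COARSE = {"pedestrian": "person", "people": "person",
          "bicycle": "bicycle", "motor": "bicycle",
          "car": "car", "van": "car", "truck": "car",
          "bus": "car", "tricycle": "car", "awning-tricycle": "car"}

def coarse_label(cls_id: int) -> str:
    """Map a VisDrone class index to its coarse label."""
    return COARSE[VISDRONE[cls_id]]
```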
## Notes

- Short training (67 epochs) leaves room for improvement; longer schedules with stronger augmentation can push mAP50 above 0.5.
- The VisDrone val set contains many tiny or occluded objects; a low mAP50-95 is typical for this domain and roughly matches published VisDrone baselines.
- In the production stack this model is served through Triton (ONNX Runtime backend) and paired with a light Python IoU tracker for per-object IDs.
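The per-object-ID step can be as simple as greedy IoU matching between consecutive frames. A minimal sketch of that idea, not the production tracker (class and method names are ours; track aging/deletion is omitted for brevity):

```python
import numpy as np

def iou_matrix(a, b):
    """Pairwise IoU between two sets of xyxy boxes, shapes (N, 4) and (M, 4)."""
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-9)

class IoUTracker:
    def __init__(self, iou_thres=0.3):
        self.iou_thres = iou_thres
        self.next_id = 0
        self.tracks = {}          # track id -> last xyxy box

    def update(self, boxes):
        """boxes: (N, 4) xyxy detections for one frame -> list of track ids."""
        ids = [-1] * len(boxes)
        if self.tracks and len(boxes):
            tids = list(self.tracks)
            prev = np.array([self.tracks[t] for t in tids])
            iou = iou_matrix(np.asarray(boxes, float), prev)
            # greedy: repeatedly take the best remaining (detection, track) pair
            while iou.size and iou.max() > self.iou_thres:
                d, t = np.unravel_index(iou.argmax(), iou.shape)
                ids[d] = tids[t]
                iou[d, :] = -1    # detection d is consumed
                iou[:, t] = -1    # track t is consumed
        for d in range(len(boxes)):
            if ids[d] == -1:      # unmatched detection -> new track
                ids[d] = self.next_id
                self.next_id += 1
        self.tracks = {ids[d]: np.asarray(boxes[d], float)
                       for d in range(len(boxes))}
        return ids
```

A production version would also age out tracks that miss a few frames and optionally gate matches by class id; greedy IoU matching alone is enough for slow-moving traffic seen from a fixed camera.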
## License

AGPL-3.0 (inherits the Ultralytics YOLOv11 weights license).