YOLOv11s · VisDrone object detector

Small YOLOv11 object detector fine-tuned on the VisDrone2019 dataset, targeting aerial/surveillance-angle traffic footage — the exact viewpoint used by our telecom_videos set (fixed overhead/elevated cameras).

Part of the ktk-studio traffic-violation analytics stack.

Summary

Architecture: YOLOv11s (Ultralytics)
Input: 640×640 RGB
Output: output0, shape [1, 14, N] (4 bbox coordinates + 10 class scores per candidate)
Parameters: 9.5 M
Weights: best.pt (19 MB) / best.onnx (37 MB, opset 19)
Epochs trained: 67 (early-stopped from 80)
Best epoch: 52
Val mAP50: 0.377
Val mAP50-95: 0.219
Val precision: 0.502
Val recall: 0.388

Classes

pedestrian, people, bicycle, car, van, truck, tricycle, awning-tricycle, bus, motor — matching the VisDrone2019 taxonomy.
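For programmatic use, the 10 class-score rows of the ONNX output follow the order listed above (assumed here to match VisDrone.yaml; verify against the names dict embedded in best.pt if in doubt):

```python
# VisDrone2019 class order: index i labels class-score row 4 + i of output0
VISDRONE_CLASSES = [
    "pedestrian", "people", "bicycle", "car", "van",
    "truck", "tricycle", "awning-tricycle", "bus", "motor",
]
```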

Training

Dataset auto-downloaded via data=VisDrone.yaml (Ultralytics).

yolo detect train data=VisDrone.yaml model=yolo11s.pt \
     epochs=80 imgsz=640 batch=32 patience=15 device=0

Hardware: NVIDIA B200 (180 GB). Batch size 32 used ~35 GB VRAM; total training time ~50 min.

Usage

Ultralytics

from ultralytics import YOLO
model = YOLO("best.pt")
r = model("traffic_camera_frame.jpg", conf=0.25)
r[0].show()

ONNX Runtime (GPU)

import cv2, numpy as np, onnxruntime as ort

sess = ort.InferenceSession("best.onnx", providers=["CUDAExecutionProvider"])
img = cv2.imread("frame.jpg")
# letterbox to 640×640, RGB, [0,1], CHW
scale = min(640/img.shape[1], 640/img.shape[0])
nw, nh = int(img.shape[1]*scale), int(img.shape[0]*scale)
r = cv2.resize(img, (nw, nh))
canvas = np.full((640, 640, 3), 114, np.uint8)
pad_x, pad_y = (640-nw)//2, (640-nh)//2
canvas[pad_y:pad_y+nh, pad_x:pad_x+nw] = r
x = cv2.cvtColor(canvas, cv2.COLOR_BGR2RGB).astype(np.float32)/255.0
x = np.ascontiguousarray(x.transpose(2,0,1)[None])
y = sess.run(None, {"images": x})[0][0]  # (14, N)
# decode: y[:4] = cx,cy,w,h; y[4:14] = class scores
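A minimal decode sketch for the raw (14, N) tensor, assuming rows 0-3 are cx, cy, w, h in letterboxed pixels and rows 4-13 are per-class confidences. The exported head applies no NMS, so overlapping candidates still need suppression afterwards:

```python
import numpy as np

def decode(y, conf_thres=0.25):
    """y: (14, N) raw output -> (boxes_xyxy, class_ids, confidences)."""
    scores = y[4:14]                        # (10, N) per-class confidences
    cls = scores.argmax(0)                  # best class per candidate
    conf = scores.max(0)
    keep = conf >= conf_thres               # confidence filter
    cx, cy, w, h = y[:4, keep]
    xyxy = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    return xyxy, cls[keep], conf[keep]
```

After decoding, run NMS (e.g. cv2.dnn.NMSBoxes) and undo the letterbox (subtract pad_x/pad_y, divide by scale) to map boxes back to the original frame.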

Intended use

  • Real-time object detection on traffic-camera / drone-style footage.
  • Drop-in replacement for TrafficCamNet when the scene has van/bus/truck/people that need to be distinguished (TrafficCamNet collapses them into car).

Notes

  • Short training (67 epochs) leaves room for improvement; longer schedules with stronger augmentation can push mAP50 above 0.5.
  • VisDrone val set contains many tiny / occluded objects; low mAP50-95 is typical for this domain and roughly matches published baselines on VisDrone.
  • For the production stack we deploy this model through Triton (ONNX Runtime backend) and pair with a light Python IoU tracker to get per-object IDs.
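The per-object ID step can be sketched as a greedy IoU matcher. This is a toy stand-in for the production tracker, with no track aging or re-identification, so stale tracks are kept forever:

```python
import numpy as np

def iou(a, b):
    """IoU of one xyxy box a (4,) against a batch b (M, 4)."""
    x1 = np.maximum(a[0], b[:, 0]); y1 = np.maximum(a[1], b[:, 1])
    x2 = np.minimum(a[2], b[:, 2]); y2 = np.minimum(a[3], b[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

class IoUTracker:
    def __init__(self, iou_thres=0.3):
        self.iou_thres = iou_thres
        self.tracks = {}        # track id -> last seen box
        self.next_id = 0

    def update(self, boxes):
        """boxes: (M, 4) xyxy for one frame; returns track ids aligned with boxes."""
        ids = [-1] * len(boxes)
        unmatched = dict(self.tracks)
        for i, b in enumerate(boxes):       # greedily match each detection
            if not unmatched:
                break
            tids = list(unmatched)
            overlaps = iou(b, np.array([unmatched[t] for t in tids]))
            j = int(overlaps.argmax())
            if overlaps[j] >= self.iou_thres:
                ids[i] = tids[j]
                del unmatched[tids[j]]
        for i, b in enumerate(boxes):       # unmatched detections start new tracks
            if ids[i] == -1:
                ids[i] = self.next_id
                self.next_id += 1
            self.tracks[ids[i]] = b
        return ids
```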

License

AGPL-3.0 (inherits Ultralytics YOLOv11 weight license).
