---
license: apache-2.0
tags:
- vehicle-recognition
- object-detection
- vehicle-reidentification
- license-plate-ocr
- onnx
- dinov2
- multi-task
pipeline_tag: object-detection
---

# VehicleDINO

**Unified multi-task vehicle recognition model:** detection, type classification, make/model identification, re-identification, and license plate OCR in a single forward pass.

## Architecture

- **Backbone:** DINOv2 ViT-B/14 (frozen, with LoRA adapters)
- **Neck:** SimpleFPN (768 -> 256) + HybridEncoder (AIFI + CCFM)
- **Decoder:** RT-DETR-style with 300 detection queries + 1 global attribute query
- **Heads:** 6 task-specific heads (det, type, make, model, Re-ID, OCR)

## Model Variants

| File | Format | Size | Notes |
|------|--------|------|-------|
| `vehicledino_dinov2.onnx` | FP32 | 450 MB | Full precision |
| `vehicledino_dinov2_int8.onnx` | INT8 | 139 MB | Quantized, 3.2x smaller than FP32 |

## Input / Output

**Input:** `images`, a float32 tensor of shape `(1, 3, 560, 560)`, ImageNet-normalized RGB

**Outputs:**

| Tensor | Shape | Description |
|--------|-------|-------------|
| `det_boxes` | (1, 300, 4) | Detection boxes (cx, cy, w, h, normalized) |
| `det_classes` | (1, 300, 5) | Detection class logits (car, suv, truck, bus, van) |
| `vehicle_types` | (1, 1, 8) | Vehicle type logits |
| `makes` | (1, 1, 42) | Make classification logits |
| `models` | (1, 1, 323) | Model classification logits |
| `reid_embeds` | (1, 1, 256) | L2-normalized Re-ID embedding |
| `ocr_logits` | (1, 1, 8, 37) | License plate OCR logits (8 positions, 37 chars) |

## Performance (Test Set)

| Task | Metric | Score |
|------|--------|-------|
| Type Classification | Top-1 Accuracy | 95.6% |
| Make Classification | Top-1 Accuracy | 98.4% |
| Model Classification | Top-1 Accuracy | 87.7% |
| Re-ID (VeRi-776) | mAP | 61.1% |
| Re-ID (VeRi-776) | Rank-1 | 86.1% |

## Training Data

- **Detection + Type + Re-ID:** VeRi-776 (776 vehicles, 49,360 images)
- **Make/Model:** CompCars (42 makes, 323 models)
- **OCR:** CCPD-Green (Chinese license plates)

## Usage with ONNX Runtime (Python)

```python
import onnxruntime as ort
import numpy as np
from PIL import Image

session = ort.InferenceSession("vehicledino_dinov2_int8.onnx")

# Preprocess: force 3-channel RGB, resize to 560x560, scale to [0, 1]
img = Image.open("car.jpg").convert("RGB").resize((560, 560))
arr = np.asarray(img, dtype=np.float32) / 255.0

# ImageNet normalization (per-channel mean and std)
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
arr = (arr - mean) / std

# HWC -> CHW, then add the batch dimension: (1, 3, 560, 560)
tensor = arr.transpose(2, 0, 1)[np.newaxis]

# Passing None returns all outputs in the order listed in the table above
outputs = session.run(None, {"images": tensor.astype(np.float32)})
```

## Usage in Browser (ONNX Runtime Web)

The INT8 model runs in the browser via ONNX Runtime Web with the WebGPU or WASM backend.

**Live demo:** [https://yolov11-plate-recognition.swmengappdev.workers.dev](https://yolov11-plate-recognition.swmengappdev.workers.dev)

## Citation

```bibtex
@article{vehicledino2026,
  title={VehicleDINO: Unified Multi-Task Vehicle Recognition via DINOv2 Features},
  author={Soh, Wei Meng},
  year={2026}
}
```

## License

Apache 2.0
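
## Decoding the Outputs (Sketch)

The output table in Input / Output describes raw tensors, so some post-processing is needed before the results are usable. The following is a minimal sketch, not the author's reference implementation. It assumes that `det_classes` logits are activated with a per-class sigmoid (common for RT-DETR-style decoders, but not confirmed here), that the class order matches the order listed in the table, and that a score threshold of 0.5 is reasonable. The function names `decode_detections` and `reid_similarity` are hypothetical.

```python
import numpy as np

# Assumption: class order follows the order given in the outputs table.
DET_CLASSES = ["car", "suv", "truck", "bus", "van"]


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def decode_detections(det_boxes, det_classes, image_size, score_thresh=0.5):
    """Turn (1, 300, 4) boxes and (1, 300, 5) logits into pixel-space detections.

    image_size is (width, height) of the original image. Because the input was
    resized to 560x560 without preserving aspect ratio, the normalized
    coordinates scale directly to the original width and height.
    """
    width, height = image_size
    boxes = det_boxes[0]                # (300, 4), normalized cx, cy, w, h
    scores = sigmoid(det_classes[0])    # (300, 5); sigmoid is an assumption
    labels = scores.argmax(axis=1)
    best = scores.max(axis=1)
    keep = best >= score_thresh         # hypothetical threshold

    cx, cy, w, h = boxes[keep].T
    # Convert center format to corner format (x1, y1, x2, y2) in pixels
    xyxy = np.stack(
        [(cx - w / 2) * width, (cy - h / 2) * height,
         (cx + w / 2) * width, (cy + h / 2) * height],
        axis=1,
    )
    return [
        {"box": box.tolist(), "label": DET_CLASSES[int(l)], "score": float(s)}
        for box, l, s in zip(xyxy, labels[keep], best[keep])
    ]


def reid_similarity(embed_a, embed_b):
    """Cosine similarity of two Re-ID embeddings.

    The embeddings are already L2-normalized, so the dot product suffices.
    """
    a = np.asarray(embed_a).reshape(-1)
    b = np.asarray(embed_b).reshape(-1)
    return float(a @ b)
```

With the `outputs` list from the ONNX Runtime example, `decode_detections(outputs[0], outputs[1], original_image.size)` would return one dictionary per kept query, assuming the outputs arrive in the order shown in the table. Comparing `reid_embeds` from two images with `reid_similarity` gives a value near 1.0 for likely matches; any decision threshold would need to be tuned on your own data.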