---
license: apache-2.0
tags:
- vehicle-recognition
- object-detection
- vehicle-reidentification
- license-plate-ocr
- onnx
- dinov2
- multi-task
pipeline_tag: object-detection
---

# VehicleDINO

**Unified multi-task vehicle recognition model** that performs detection, type classification, make/model identification, re-identification, and license plate OCR in a single forward pass.

## Architecture

- **Backbone:** DINOv2 ViT-B/14 (frozen, with LoRA adapters)
- **Neck:** SimpleFPN (768 -> 256) + HybridEncoder (AIFI + CCFM)
- **Decoder:** RT-DETR-style with 300 detection queries + 1 global attribute query
- **Heads:** 6 task-specific heads (det, type, make, model, Re-ID, OCR)

## Model Variants

| File | Format | Size | Notes |
|------|--------|------|-------|
| `vehicledino_dinov2.onnx` | FP32 | 450 MB | Full precision |
| `vehicledino_dinov2_int8.onnx` | INT8 | 139 MB | Quantized, 3.2x smaller |

## Input / Output

**Input:** `images`, a float32 tensor of shape `(1, 3, 560, 560)` holding an ImageNet-normalized RGB image

**Outputs:**

| Tensor | Shape | Description |
|--------|-------|-------------|
| `det_boxes` | (1, 300, 4) | Detection boxes (cx, cy, w, h normalized) |
| `det_classes` | (1, 300, 5) | Detection class logits (car, suv, truck, bus, van) |
| `vehicle_types` | (1, 1, 8) | Vehicle type logits |
| `makes` | (1, 1, 42) | Make classification logits |
| `models` | (1, 1, 323) | Model classification logits |
| `reid_embeds` | (1, 1, 256) | L2-normalized Re-ID embedding |
| `ocr_logits` | (1, 1, 8, 37) | License plate OCR logits (8 positions, 37 chars) |
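
The detection tensors above are raw network outputs, so some post-processing is needed before boxes can be drawn. The following is a minimal sketch, not the author's reference implementation. It assumes the class logits are scored with a per-class sigmoid (the usual RT-DETR convention; this card does not state it), that boxes are normalized to [0, 1], and that class indices follow the order listed in the table. `decode_detections`, `CLASS_NAMES`, and the 0.5 threshold are hypothetical names and values chosen for illustration.

```python
import numpy as np

# Assumption: index order matches the order given in the outputs table.
CLASS_NAMES = ["car", "suv", "truck", "bus", "van"]


def decode_detections(det_boxes, det_classes, image_size, score_thresh=0.5):
    """Convert raw detection outputs to pixel-space (x1, y1, x2, y2) boxes.

    det_boxes:   (1, 300, 4) normalized (cx, cy, w, h)
    det_classes: (1, 300, 5) class logits
    image_size:  (width, height) of the image the boxes should map onto
    """
    boxes = det_boxes[0]
    logits = det_classes[0]

    # Assumption: per-class sigmoid scores, as in RT-DETR.
    scores = 1.0 / (1.0 + np.exp(-logits))
    class_ids = scores.argmax(axis=1)
    best_scores = scores.max(axis=1)
    keep = best_scores >= score_thresh

    width, height = image_size
    cx, cy, w, h = boxes[keep].T
    xyxy = np.stack(
        [
            (cx - w / 2) * width,
            (cy - h / 2) * height,
            (cx + w / 2) * width,
            (cy + h / 2) * height,
        ],
        axis=1,
    )
    return xyxy, class_ids[keep], best_scores[keep]
```

Called as `decode_detections(det_boxes, det_classes, (orig_w, orig_h))`, this maps boxes back onto the original image. That works because the boxes are normalized, provided the image was resized to 560x560 without letterboxing.
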

## Performance (Test Set)

| Task | Metric | Score |
|------|--------|-------|
| Type Classification | Top-1 Accuracy | 95.6% |
| Make Classification | Top-1 Accuracy | 98.4% |
| Model Classification | Top-1 Accuracy | 87.7% |
| Re-ID (VeRi-776) | mAP | 61.1% |
| Re-ID (VeRi-776) | Rank-1 | 86.1% |
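
For readers unfamiliar with the Re-ID metrics, the sketch below shows how Rank-1 and average precision can be computed for a single query. Because `reid_embeds` is L2-normalized, cosine similarity reduces to a dot product. This is an illustration under simplifying assumptions, not the evaluation script behind the numbers above: in particular it omits the same-camera filtering used in the standard VeRi-776 protocol, and `rank_gallery` and `rank1_and_ap` are hypothetical helper names.

```python
import numpy as np


def rank_gallery(query, gallery):
    """Sort gallery rows by descending cosine similarity to the query.

    query:   (D,) L2-normalized embedding
    gallery: (N, D) L2-normalized embeddings
    """
    sims = gallery @ query  # dot product equals cosine for unit vectors
    order = np.argsort(-sims)
    return order, sims[order]


def rank1_and_ap(order, gallery_ids, query_id):
    """Return (rank-1 hit, average precision) for one ranked query."""
    matches = gallery_ids[order] == query_id
    rank1 = bool(matches[0])
    if not matches.any():
        return rank1, 0.0
    hits = np.cumsum(matches)
    # Precision measured at each rank where a true match appears.
    precision_at_hits = hits[matches] / (np.flatnonzero(matches) + 1)
    return rank1, float(precision_at_hits.mean())
```

Averaging the per-query rank-1 hits and average precisions over all queries gives the Rank-1 and mAP figures.
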

## Training Data

- **Detection + Type + Re-ID:** VeRi-776 (776 vehicles, 49,360 images)
- **Make/Model:** CompCars (42 makes, 323 models)
- **OCR:** CCPD-Green (Chinese license plates)

## Usage with ONNX Runtime (Python)

```python
import numpy as np
import onnxruntime as ort
from PIL import Image

session = ort.InferenceSession("vehicledino_dinov2_int8.onnx")

# Preprocess: force 3-channel RGB, resize to 560x560, scale to [0, 1]
img = Image.open("car.jpg").convert("RGB").resize((560, 560))
arr = np.asarray(img, dtype=np.float32) / 255.0  # (560, 560, 3), HWC

# ImageNet normalization, broadcast over the channel axis
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
arr = (arr - mean) / std

# HWC -> CHW, then add the batch dimension: (1, 3, 560, 560)
tensor = np.ascontiguousarray(arr.transpose(2, 0, 1)[np.newaxis])

# Passing None returns every output, in the model's declared order
outputs = session.run(None, {"images": tensor})
```
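
The classification heads (`vehicle_types`, `makes`, `models`) return logits rather than labels. As a rough sketch, the helper below turns one head's logits into top-k `(index, probability)` pairs. It assumes each head is single-label, so a softmax is appropriate; the card does not confirm this. `top_k` is a hypothetical helper, and since this card ships no index-to-label mapping, only integer indices are returned.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def top_k(logits, k=3):
    """Return the k most likely (class_index, probability) pairs.

    logits: array of shape (1, 1, C) as produced by a classification head.
    """
    probs = softmax(logits.reshape(-1))
    best = np.argsort(-probs)[:k]
    return [(int(i), float(probs[i])) for i in best]
```

With a live session, `names = [o.name for o in session.get_outputs()]` followed by `named = dict(zip(names, outputs))` gives name-keyed access, so that `top_k(named["makes"])` lists the three most likely make indices.
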

## Usage in Browser (ONNX Runtime Web)

The INT8 model runs in the browser via ONNX Runtime Web, using either the WebGPU or the WASM backend.

**Live demo:** [https://yolov11-plate-recognition.swmengappdev.workers.dev](https://yolov11-plate-recognition.swmengappdev.workers.dev)

## Citation

```bibtex
@article{vehicledino2026,
  title={VehicleDINO: Unified Multi-Task Vehicle Recognition via DINOv2 Features},
  author={Soh, Wei Meng},
  year={2026}
}
```

## License

Apache 2.0