---
license: apache-2.0
tags:
  - vehicle-recognition
  - object-detection
  - vehicle-reidentification
  - license-plate-ocr
  - onnx
  - dinov2
  - multi-task
pipeline_tag: object-detection
---

# VehicleDINO

**Unified multi-task vehicle recognition model:** detection, type classification, make/model identification, re-identification, and license plate OCR in a single forward pass.

## Architecture

- **Backbone:** DINOv2 ViT-B/14 (frozen, with LoRA adapters)
- **Neck:** SimpleFPN (768 -> 256) + HybridEncoder (AIFI + CCFM)
- **Decoder:** RT-DETR-style with 300 detection queries + 1 global attribute query
- **Heads:** 6 task-specific heads (det, type, make, model, Re-ID, OCR)

## Model Variants

| File | Format | Size | Notes |
|------|--------|------|-------|
| `vehicledino_dinov2.onnx` | FP32 | 450 MB | Full precision |
| `vehicledino_dinov2_int8.onnx` | INT8 | 139 MB | Quantized, 3.2x smaller |

## Input / Output

**Input:** `images`, a float32 tensor of shape `(1, 3, 560, 560)` (batch, channels, height, width) holding an RGB image scaled to [0, 1] and then ImageNet-normalized

**Outputs:**

| Tensor | Shape | Description |
|--------|-------|-------------|
| `det_boxes` | (1, 300, 4) | Detection boxes (cx, cy, w, h normalized) |
| `det_classes` | (1, 300, 5) | Detection class logits (car, suv, truck, bus, van) |
| `vehicle_types` | (1, 1, 8) | Vehicle type logits |
| `makes` | (1, 1, 42) | Make classification logits |
| `models` | (1, 1, 323) | Model classification logits |
| `reid_embeds` | (1, 1, 256) | L2-normalized Re-ID embedding |
| `ocr_logits` | (1, 1, 8, 37) | License plate OCR logits (8 positions, 37 chars) |

## Performance (Test Set)

| Task | Metric | Score |
|------|--------|-------|
| Type Classification | Top-1 Accuracy | 95.6% |
| Make Classification | Top-1 Accuracy | 98.4% |
| Model Classification | Top-1 Accuracy | 87.7% |
| Re-ID (VeRi-776) | mAP | 61.1% |
| Re-ID (VeRi-776) | Rank-1 | 86.1% |
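
The Re-ID metrics are ranking metrics: each query embedding is compared against a gallery and the gallery is sorted by similarity. Since `reid_embeds` is L2-normalized, cosine similarity reduces to a dot product. The sketch below illustrates that ranking step only; `rank_gallery` is a hypothetical helper and not the evaluation code used to produce the scores above.

```python
import numpy as np


def rank_gallery(query, gallery):
    """Rank gallery embeddings by cosine similarity to a query embedding.

    query:   (256,)   L2-normalized embedding, e.g. reid_embeds[0, 0]
    gallery: (N, 256) L2-normalized embeddings stacked row-wise
    Returns gallery indices (most similar first) and the matching similarities.
    """
    sims = gallery @ query     # dot product == cosine similarity for unit vectors
    order = np.argsort(-sims)  # descending similarity
    return order, sims[order]
```

Rank-1 asks whether the first returned index belongs to the same vehicle as the query; mAP averages precision over all correct matches in the ranking.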

## Training Data

- **Detection + Type + Re-ID:** VeRi-776 (776 vehicles, 49,360 images)
- **Make/Model:** CompCars (42 makes, 323 models)
- **OCR:** CCPD-Green (Chinese license plates)

## Usage with ONNX Runtime (Python)

```python
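import numpy as np
import onnxruntime as ort
from PIL import Image

session = ort.InferenceSession("vehicledino_dinov2_int8.onnx")

# Preprocess: force 3-channel RGB (handles grayscale/RGBA files),
# resize to 560x560, and scale pixel values to [0, 1]
img = Image.open("car.jpg").convert("RGB").resize((560, 560))
arr = np.asarray(img, dtype=np.float32) / 255.0

# ImageNet normalization, kept in float32 to match the model's input dtype
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
arr = (arr - mean) / std

# HWC -> CHW, then add the batch dimension: (1, 3, 560, 560)
tensor = np.ascontiguousarray(arr.transpose(2, 0, 1)[np.newaxis])

# Passing None as the first argument returns every model output
outputs = session.run(None, {"images": tensor})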
```
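
To read the global attribute heads from `outputs`, one option is to key the arrays by output name and take the arg-max of each head. This is a rough sketch under stated assumptions: the OCR head is treated as one independent classification per plate position, and the character set is a placeholder, since this card gives the number of classes (37) but not the symbols. `name_outputs`, `decode_attributes`, `CHARSET`, and `BLANK_INDEX` are all hypothetical names.

```python
import numpy as np

# HYPOTHETICAL charset: digits + uppercase letters + one blank class is only
# a guess at the 37 OCR classes. Replace it with the charset used in training.
CHARSET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
BLANK_INDEX = 36  # assumed index of the "no character" class


def name_outputs(output_names, outputs):
    """Pair each output array with its tensor name.

    With ONNX Runtime, the names can be read from the session:
    output_names = [o.name for o in session.get_outputs()]
    """
    return dict(zip(output_names, outputs))


def decode_attributes(named, charset=CHARSET, blank_index=BLANK_INDEX):
    """Take the top-1 class of each global head and greedily decode the plate."""

    def top1(key):
        return int(named[key][0, 0].argmax())

    # (8, 37) -> one class index per plate position (assumed per-position decoding)
    ocr_ids = named["ocr_logits"][0, 0].argmax(axis=-1)
    plate = "".join(charset[i] for i in ocr_ids if i != blank_index)

    return {
        "type_id": top1("vehicle_types"),
        "make_id": top1("makes"),
        "model_id": top1("models"),
        "plate": plate,
    }
```

The returned ids are raw class indices; turning them into names requires the label lists used during training, which are not included in this card.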

## Usage in Browser (ONNX Runtime Web)

The INT8 model runs in the browser via ONNX Runtime Web, using either the WebGPU or the WASM backend.

**Live demo:** [https://yolov11-plate-recognition.swmengappdev.workers.dev](https://yolov11-plate-recognition.swmengappdev.workers.dev)

## Citation

```bibtex
@article{vehicledino2026,
  title={VehicleDINO: Unified Multi-Task Vehicle Recognition via DINOv2 Features},
  author={Soh, Wei Meng},
  year={2026}
}
```

## License

Apache 2.0