	---
	license: apache-2.0
	tags:
	- vehicle-recognition
	- object-detection
	- vehicle-reidentification
	- license-plate-ocr
	- onnx
	- dinov2
	- multi-task
	pipeline_tag: object-detection
	---

	# VehicleDINO

	VehicleDINO is a unified multi-task vehicle recognition model that performs detection, type classification, make/model identification, re-identification, and license plate OCR in a single forward pass.

	## Architecture

	- Backbone: DINOv2 ViT-B/14 (frozen, with LoRA adapters)
	- Neck: SimpleFPN (768 -> 256) + HybridEncoder (AIFI + CCFM)
	- Decoder: RT-DETR-style with 300 detection queries + 1 global attribute query
	- Heads: 6 task-specific heads (det, type, make, model, Re-ID, OCR)
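
The backbone entry above pairs frozen weights with LoRA adapters. As a rough illustration of that idea (not the author's training code), the sketch below applies a low-rank update to one frozen linear layer in NumPy. The rank `r=8`, the scaling `alpha=16`, and the class name `LoRALinear` are assumptions chosen for the example; this card does not state the adapter configuration.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (illustrative only)."""

    def __init__(self, weight, r=8, alpha=16.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight  # frozen: never updated during fine-tuning
        # Trainable factors. B starts at zero so training begins from the frozen model.
        self.A = rng.normal(0.0, 0.01, (r, d_in)).astype(np.float32)
        self.B = np.zeros((d_out, r), dtype=np.float32)
        self.scale = alpha / r  # hypothetical scaling, as in common LoRA setups

    def __call__(self, x):
        # y = x W^T + scale * (x A^T) B^T
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T


# 768 is the ViT-B embedding width, matching the neck input (768 -> 256).
rng = np.random.default_rng(1)
W = rng.normal(size=(768, 768)).astype(np.float32)
layer = LoRALinear(W)
x = rng.normal(size=(4, 768)).astype(np.float32)

y_frozen = x @ W.T
y_lora = layer(x)  # identical to y_frozen while B is still zero
trainable = layer.A.size + layer.B.size
frozen = W.size
```

In this toy layer only `trainable` parameters (the two low-rank factors) would receive gradients, while the `frozen` weight matrix stays fixed, which is what keeps adapter fine-tuning cheap.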

	## Model Variants

	| File | Format | Size | Notes |
	|------|--------|------|-------|
	| `vehicledino_dinov2.onnx` | FP32 | 450 MB | Full precision |
	| `vehicledino_dinov2_int8.onnx` | INT8 | 139 MB | Quantized, 3.2x smaller |

	## Input / Output

	Input: `images`, a float32 tensor of shape `(1, 3, 560, 560)` holding an ImageNet-normalized RGB image.

	Outputs:

	| Tensor | Shape | Description |
	|--------|-------|-------------|
	| `det_boxes` | (1, 300, 4) | Detection boxes (cx, cy, w, h normalized) |
	| `det_classes` | (1, 300, 5) | Detection class logits (car, suv, truck, bus, van) |
	| `vehicle_types` | (1, 1, 8) | Vehicle type logits |
	| `makes` | (1, 1, 42) | Make classification logits |
	| `models` | (1, 1, 323) | Model classification logits |
	| `reid_embeds` | (1, 1, 256) | L2-normalized Re-ID embedding |
	| `ocr_logits` | (1, 1, 8, 37) | License plate OCR logits (8 positions, 37 chars) |
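
The detection tensors need a small amount of post-processing before they are usable. The sketch below is one plausible way to do it, under stated assumptions: class scores are taken as a per-class sigmoid over the logits (common for RT-DETR-style decoders, but not confirmed by this card), the class order follows the listing in the table, and the helper name `decode_detections` and the `0.5` threshold are hypothetical.

```python
import numpy as np

# Assumption: index order matches the listing in the outputs table.
DET_CLASSES = ["car", "suv", "truck", "bus", "van"]


def decode_detections(det_boxes, det_classes, img_w, img_h, score_thresh=0.5):
    """Convert raw detection outputs into pixel-space boxes with labels and scores."""
    boxes = det_boxes[0]                    # (300, 4): normalized cx, cy, w, h
    logits = det_classes[0]                 # (300, 5)
    scores = 1.0 / (1.0 + np.exp(-logits))  # assumption: per-class sigmoid
    labels = scores.argmax(axis=1)
    best = scores.max(axis=1)
    keep = best >= score_thresh

    # Center format -> corner format, then scale to the original image size.
    cx, cy, w, h = boxes[keep].T
    xyxy = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    xyxy = xyxy * np.array([img_w, img_h, img_w, img_h], dtype=xyxy.dtype)
    return [
        {"box": b.tolist(), "label": DET_CLASSES[int(l)], "score": float(s)}
        for b, l, s in zip(xyxy, labels[keep], best[keep])
    ]


# Synthetic outputs: one confident "truck" query, 299 low-scoring queries.
det_boxes = np.zeros((1, 300, 4), dtype=np.float32)
det_classes = np.full((1, 300, 5), -10.0, dtype=np.float32)
det_boxes[0, 0] = [0.5, 0.5, 0.2, 0.4]
det_classes[0, 0, 2] = 4.0

detections = decode_detections(det_boxes, det_classes, img_w=1000, img_h=500)
```

Because the boxes are normalized and the preprocessing resizes the whole image to 560x560 without padding, multiplying by the original width and height maps them straight back to source-image pixels.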

	## Performance (Test Set)

	| Task | Metric | Score |
	|------|--------|-------|
	| Type Classification | Top-1 Accuracy | 95.6% |
	| Make Classification | Top-1 Accuracy | 98.4% |
	| Model Classification | Top-1 Accuracy | 87.7% |
	| Re-ID (VeRi-776) | mAP | 61.1% |
	| Re-ID (VeRi-776) | Rank-1 | 86.1% |
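
For readers unfamiliar with the two Re-ID metrics, the sketch below shows how Rank-1 and mAP can be computed from L2-normalized embeddings such as `reid_embeds`. It is a simplified illustration, not the evaluation script behind the numbers above: the function name `rank1_and_map` is hypothetical, the data is a toy example, and benchmark protocols typically also filter out gallery images taken by the query's own camera, which is omitted here.

```python
import numpy as np


def rank1_and_map(query_emb, query_ids, gallery_emb, gallery_ids):
    """Rank-1 accuracy and mean average precision from L2-normalized embeddings."""
    sims = query_emb @ gallery_emb.T   # dot product equals cosine similarity for unit vectors
    order = np.argsort(-sims, axis=1)  # most similar gallery item first
    hits = gallery_ids[order] == query_ids[:, None]

    rank1 = float(hits[:, 0].mean())
    aps = []
    for row in hits:
        if not row.any():
            continue  # this query has no true match in the gallery
        precision_at_k = np.cumsum(row) / (np.arange(row.size) + 1)
        aps.append(precision_at_k[row].mean())  # average precision at each true match
    return rank1, float(np.mean(aps))


# Toy 2-D unit embeddings: three gallery images, three queries.
gallery = np.array([[1.0, 0.0], [0.0, 1.0], [0.8, 0.6]], dtype=np.float32)
gallery_ids = np.array([0, 1, 1])
queries = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]], dtype=np.float32)
query_ids = np.array([0, 1, 0])

rank1, mean_ap = rank1_and_map(queries, query_ids, gallery, gallery_ids)
```

Rank-1 only asks whether the single closest gallery image has the right identity, while mAP rewards ranking every true match near the top, which is why the two scores can differ substantially.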

	## Training Data

	- Detection + Type + Re-ID: VeRi-776 (776 vehicles, 49,360 images)
	- Make/Model: CompCars (42 makes, 323 models)
	- OCR: CCPD-Green (Chinese license plates)

	## Usage with ONNX Runtime (Python)

	```python
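	import numpy as np
	import onnxruntime as ort
	from PIL import Image

	session = ort.InferenceSession("vehicledino_dinov2_int8.onnx")

	# Preprocess: force 3-channel RGB, resize to 560x560, scale to [0, 1]
	img = Image.open("car.jpg").convert("RGB").resize((560, 560))
	arr = np.asarray(img, dtype=np.float32) / 255.0  # (560, 560, 3), HWC

	# ImageNet normalization; float32 constants keep the array in float32
	mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
	std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
	arr = (arr - mean) / std

	# HWC -> CHW, add the batch dimension, and make the memory contiguous
	tensor = np.ascontiguousarray(arr.transpose(2, 0, 1)[np.newaxis])  # (1, 3, 560, 560)

	# Passing None returns every output, in the order the model declares them
	outputs = session.run(None, {"images": tensor})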
	```
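
The `outputs` list returned above still has to be decoded. The sketch below shows one way to read the global attribute heads and the OCR head, using synthetic arrays so it runs without the model file. Several pieces are assumptions: the 37-symbol `CHARSET` (a blank at index 0 followed by digits and uppercase letters) is hypothetical because this card does not publish the character mapping, the OCR head is assumed to classify each of the 8 positions independently, and the helper names are invented for illustration.

```python
import numpy as np

# Hypothetical alphabet: blank at index 0, then 0-9 and A-Z (37 symbols in total).
# The real index-to-character mapping is not given in this card.
CHARSET = "-" + "0123456789" + "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def top1(logits):
    """Return (class index, softmax confidence) for a (1, 1, C) logit tensor."""
    probs = softmax(logits.reshape(-1))
    idx = int(probs.argmax())
    return idx, float(probs[idx])


def decode_plate(ocr_logits, blank=0):
    """Greedy per-position decode of (1, 1, 8, 37) logits, dropping blank symbols."""
    indices = ocr_logits.reshape(8, 37).argmax(axis=1)
    return "".join(CHARSET[i] for i in indices if i != blank)


# Synthetic stand-ins for two of the model outputs.
rng = np.random.default_rng(0)
fake = {
    "makes": rng.normal(size=(1, 1, 42)).astype(np.float32),
    "ocr_logits": np.full((1, 1, 8, 37), -5.0, dtype=np.float32),
}
fake["makes"][0, 0, 7] = 10.0  # make index 7 wins
for pos, ch in enumerate("AB1234"):
    fake["ocr_logits"][0, 0, pos, CHARSET.index(ch)] = 5.0
fake["ocr_logits"][0, 0, 6:, 0] = 5.0  # last two positions are blank

make_idx, make_conf = top1(fake["makes"])
plate = decode_plate(fake["ocr_logits"])
```

With a real session, the same helpers can be fed from a name-keyed dictionary built as `dict(zip([o.name for o in session.get_outputs()], outputs))`. Mapping `make_idx` to a make name requires the label list used in training, which is not included in this card.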

	## Usage in Browser (ONNX Runtime Web)

	The INT8 model runs in the browser through ONNX Runtime Web, using either the WebGPU or the WASM backend.

	Live demo: [https://yolov11-plate-recognition.swmengappdev.workers.dev](https://yolov11-plate-recognition.swmengappdev.workers.dev)

	## Citation

	```bibtex
	@article{vehicledino2026,
	title={VehicleDINO: Unified Multi-Task Vehicle Recognition via DINOv2 Features},
	author={Soh, Wei Meng},
	year={2026}
	}
	```

	## License

	Apache 2.0