---
license: apache-2.0
language:
- en
library_name: paddlepaddle
tags: [anpr, ocr, license-plate, indian-plates, PP-OCRv5]
---
# Data-Intelligent ANPR: Scalable License Plate Recognition Under Real-World Data Constraints
## Abstract
This release provides Awiros-ANPR-OCR, a 37M-parameter specialist model for
Automatic Number Plate Recognition (ANPR) on Indian license plates. The model
is built on the PP-OCRv5 encoder-decoder backbone (SVTR_HGNet with PPHGNetV2_B4)
and fine-tuned on a curated 558,767-sample corpus spanning both standard
single-row and non-standard dual-row Indian plate formats.
Starting from only 6,839 publicly available labeled samples, the training
corpus was grown through a data engineering pipeline combining synthetic data
generation, consensus pseudo-labeling, distribution-aware curation, VLM-assisted
data cleanup, and state-balanced batch sampling. The resulting model achieves
**98.42% accuracy** with **sub-6ms on-device inference** on an NVIDIA RTX 3090
--- a 1,260x latency advantage over frontier multimodal models like Gemini.
For the full data curation and training methodology, refer to our technical
report: [Technical Report](TechnicalReport.pdf).
## Evaluation and Results
All systems were evaluated on a shared held-out validation set constructed
using a distribution-aware split covering all Indian state codes, including
both standard and non-standard plate formats.
| System | Params | Overall Acc. | 1-Row Acc. | 2-Row Acc. | Latency Avg (ms) | Throughput (img/s) |
| --- | --- | --- | --- | --- | --- | --- |
| **Awiros-ANPR-OCR (Ours)** | **37.3M** | **98.42%** | **98.83%** | **96.91%** | **5.09** | **196.5** |
| Gemini-3-flash-preview | ~5-10B | 93.89% | 94.70% | 91.20% | 6,430 | 0.2 |
| Gemini-2.5-flash-preview | ~5B | 87.23% | 89.66% | 78.38% | --- | --- |
| Tencent HunyuanOCR | 996M | 67.62% | 76.65% | 34.78% | 309.15 | 3.2 |
| PP-OCRv5 Pretrained | 53.6M | 57.96% | 73.55% | 0.24% | 5.25 | 190.6 |
Latency measured on a single NVIDIA RTX 3090 GPU (batch size 1). Gemini
latency is end-to-end API round-trip. PP-OCRv5 Pretrained shares the same
architecture but uses original pretrained weights without domain-specific
fine-tuning --- the 57.96% to 98.42% gap is entirely a data story.
## Qualitative Comparison
Representative samples where Awiros-ANPR-OCR correctly transcribes the plate
while all baselines produce errors. Common failure modes for baselines include
confusing visually similar characters (Q→0, V→Y, M→R, B→8) and truncating
dual-row plates.
| Plate Image | Ground Truth | Awiros (Ours) | Gemini 3 | Gemini 2.5 | Tencent |
| --- | --- | --- | --- | --- | --- |
| ![Plate 1](images/two_row-set-1_1_1420_.jpg) | `UP14BQ9208` | `UP14BQ9208` | `UP14B09208` | `UP14B09208` | `UP14B` |
| ![Plate 2](images/10_20_68f0b5ea88aefc002f543e60_awi_1.jpg) | `HR35M2576` | `HR35M2576` | `HR35R2576` | `HR35R2576` | `HR35K2576` |
| ![Plate 3](images/10_20_68fc5fb360b026003091b0eb_awi_1.jpg) | `HR34M4007` | `HR34M4007` | `HR34H4007` | `HR34M40D7` | `HR36M4007` |
| ![Plate 4](images/9_16_68f325be3a50df002e43a509_awi_1.jpg) | `HR51BV8223` | `HR51BV8223` | `HR51BY8223` | `HR51BY8223` | `HR51BY8223` |
| ![Plate 5](images/10_21_68ecddf9925604002f70876a_awi_1.jpg) | `HR38AB2421` | `HR38AB2421` | `HR38A8242` | `HR38A82421` | `HR38A` |
| ![Plate 6](images/10_14_68e6fcf21e55ac002f310971_awi_1.jpg) | `HR12AX8522` | `HR12AX8522` | `HR12AX0522` | `HR12AX0522` | `HR12AX0522` |
| ![Plate 7](images/9_25_6901aa035e29cc002f516180_awi_1.jpg) | `HR46E0227` | `HR46E0227` | `HR26E0227` | `HR26E0227` | `HR6E0227` |
Recurring character confusions across baselines: `Q→0`, `M→R/K/H`, `V→Y`,
`B→8`, `8→0`, `4→2`. Tencent also truncates several dual-row and
low-contrast plates.
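The confusion patterns above can be tallied mechanically from (ground truth, prediction) pairs. A minimal sketch; the `char_confusions` helper is illustrative and not part of this release:

```python
from collections import Counter

def char_confusions(pairs):
    """Count (truth_char, predicted_char) mismatches, comparing the two
    strings position by position over their common prefix length."""
    counts = Counter()
    for truth, pred in pairs:
        for t, p in zip(truth, pred):
            if t != p:
                counts[(t, p)] += 1
    return counts

# Rows taken from the qualitative table above (Gemini 3 column).
pairs = [
    ("UP14BQ9208", "UP14B09208"),
    ("HR35M2576", "HR35R2576"),
    ("HR51BV8223", "HR51BY8223"),
]
print(char_confusions(pairs))  # {('Q','0'): 1, ('M','R'): 1, ('V','Y'): 1}
```

Note that positionwise comparison undercounts errors when a baseline drops characters (as Tencent does on truncated plates); an edit-distance alignment would be needed for those cases.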
## Key Design Decisions
- **End-to-end architecture**: Eliminates brittle multi-stage pre-processing
pipelines (perspective normalization, row segmentation, per-region
recognition) that prior systems relied upon
- **Consensus pseudo-labeling**: Two independently trained models must agree on
a transcription before it is accepted as a label, substantially reducing
pseudo-label noise
- **Distribution-aware curation**: Non-linear bucket-wise train/val splits
ensure rare state codes are not lost to validation
- **State-balanced batch sampling**: Uniform state-code sampling within each
batch prevents training dynamics from being dominated by high-frequency states
- **Negative sample training**: Unreadable plates labeled with an abstention
token suppress hallucination on degraded inputs
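As a rough illustration of the consensus rule, a pseudo-label is accepted only when two independently trained models agree and neither abstains. The predictor interfaces and the `ABSTAIN` token below are placeholders for exposition, not the actual training code:

```python
# Sketch of consensus pseudo-labeling: keep a transcription only when two
# independently trained models agree and neither emits the abstention token.
ABSTAIN = "<unk>"  # placeholder; the real abstention token may differ

def consensus_labels(images, predict_a, predict_b):
    labeled = []
    for img in images:
        a, b = predict_a(img), predict_b(img)
        if a == b and a != ABSTAIN:
            labeled.append((img, a))  # accepted pseudo-label
    return labeled

# Toy example with dict lookups standing in for the two trained models.
preds_a = {"img1": "HR35M2576", "img2": "UP14BQ9208", "img3": ABSTAIN}
preds_b = {"img1": "HR35M2576", "img2": "UP14B09208", "img3": ABSTAIN}
kept = consensus_labels(["img1", "img2", "img3"], preds_a.get, preds_b.get)
# img2 disagrees and img3 abstains, so only img1 survives.
```

The same gate naturally discards degraded inputs on which both models abstain, which complements the negative-sample training described above.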
## Model Inference
Use the official PaddleOCR repository to run inference with the model from
this release, either on a single image or on a directory of crops.
1. Clone PaddleOCR and move into the repository root.
```bash
git clone https://github.com/PaddlePaddle/PaddleOCR.git
cd PaddleOCR
```
2. Install dependencies.
```bash
pip install paddlepaddle # or paddlepaddle-gpu
pip install safetensors pillow opencv-python pyyaml
```
3. Copy `test.py` and `en_dict.txt` from this release folder into the
PaddleOCR repository root.
4. Place `model.safetensors` in the PaddleOCR repository root (or specify the
path via `--weights`).
5. Run inference on a single image.
```bash
python test.py \
--image_path path/to/plate_crop.jpg \
--weights model.safetensors \
--device gpu
```
6. Run inference on a directory of images.
```bash
python test.py \
--image_path path/to/plate_crops/ \
--weights model.safetensors \
--device gpu \
--output_json results.json
```
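Downstream consumers often sanity-check transcriptions against the standard Indian plate layout (two-letter state code, district number, series letters, four-digit number). A hedged sketch; this regex covers only the common civilian format, not BH-series or other special plates, and is not part of `test.py`:

```python
import re

# Common civilian Indian plate layout: SS D{1,2} L{1,3} NNNN,
# e.g. "HR35M2576" -> state HR, district 35, series M, number 2576.
PLATE_RE = re.compile(r"^[A-Z]{2}\d{1,2}[A-Z]{1,3}\d{4}$")

def looks_valid(text: str) -> bool:
    """Return True if the transcription matches the common plate pattern."""
    return bool(PLATE_RE.fullmatch(text))

print(looks_valid("UP14BQ9208"))  # True
print(looks_valid("HR35M"))       # False (truncated, like some baseline outputs)
```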
## Architecture Details
| Component | Value |
| --- | --- |
| Framework | PaddlePaddle / PP-OCRv5 |
| Backbone | PPHGNetV2_B4 |
| Head | MultiHead (CTCHead + NRTRHead) |
| Input shape | 3 x 48 x 320 |
| Character set | 0-9, A-Z, a-z, space (63 classes) |
| Max text length | 25 |
| Parameters | 37.3M |
| Export format | SafeTensors (from PaddlePaddle params) |
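The 3 x 48 x 320 input shape suggests the usual PP-OCR recognition preprocessing: resize the plate crop to height 48 preserving aspect ratio, pad to width 320, and normalize to roughly [-1, 1]. A sketch under that assumption, using the pillow and numpy dependencies installed above; the exact resize and normalization constants in the release's `test.py` may differ:

```python
import numpy as np
from PIL import Image

def preprocess_plate(img: Image.Image, h: int = 48, w: int = 320) -> np.ndarray:
    """Resize to target height keeping aspect ratio, right-pad to target
    width, normalize to ~[-1, 1], and return a CHW float32 array
    (assumed layout for the 3 x 48 x 320 input)."""
    img = img.convert("RGB")
    scale = h / img.height
    new_w = min(w, max(1, int(img.width * scale)))
    img = img.resize((new_w, h))
    arr = np.asarray(img, dtype=np.float32) / 255.0
    arr = (arr - 0.5) / 0.5                       # -> [-1, 1]
    padded = np.zeros((h, w, 3), dtype=np.float32)
    padded[:, :new_w] = arr                       # right-pad narrow crops
    return padded.transpose(2, 0, 1)              # HWC -> CHW

x = preprocess_plate(Image.new("RGB", (200, 60)))
print(x.shape)  # (3, 48, 320)
```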
## Summary
We present a practical, data-centric ANPR framework that achieves
production-grade accuracy on Indian license plates without reliance on large
manually annotated datasets or frontier model scale. The same PP-OCRv5
architecture scores 57.96% out-of-the-box and 98.42% after our data
engineering pipeline --- demonstrating that the data, not the model, is the
primary driver of performance in domain-specific OCR.
Users who want to test their own models on our validation set can do so in our
[Hugging Face Space](https://huggingface.co/spaces/uv124/license-plate-ocr-benchmark).
Support for submitting `.bin` files for evaluation on our internal systems is
planned; the link for that submission flow will be added here once available.