---

license: apache-2.0
language:
  - en
library_name: paddlepaddle
tags: [anpr, ocr, license-plate, indian-plates, PP-OCRv5]
---


# Data-Intelligent ANPR: Scalable License Plate Recognition Under Real-World Data Constraints

## Abstract

This release provides Awiros-ANPR-OCR, a 37M-parameter specialist model for
Automatic Number Plate Recognition (ANPR) on Indian license plates. The model
is built on the PP-OCRv5 encoder-decoder backbone (SVTR_HGNet with PPHGNetV2_B4)
and fine-tuned on a curated 558,767-sample corpus spanning both standard
single-row and non-standard dual-row Indian plate formats.

Starting from only 6,839 publicly available labeled samples, the training
corpus was grown through a data engineering pipeline combining synthetic data
generation, consensus pseudo-labeling, distribution-aware curation,
VLM-assisted data cleanup, and state-balanced batch sampling. The resulting
model achieves **98.42% accuracy** with **sub-6 ms local inference** on an
NVIDIA RTX 3090, a roughly 1,260x latency advantage over frontier multimodal
models such as Gemini.

For the full data curation and training methodology, refer to our technical
report: [Technical Report](TechnicalReport.pdf).

## Evaluation and Results

All systems were evaluated on a shared held-out validation set constructed
using a distribution-aware split covering all Indian state codes, including
both standard and non-standard plate formats.

| System | Params | Overall Acc. | 1-Row Acc. | 2-Row Acc. | Latency Avg (ms) | Throughput (img/s) |
| --- | --- | --- | --- | --- | --- | --- |
| **Awiros-ANPR-OCR (Ours)** | **37.3M** | **98.42%** | **98.83%** | **96.91%** | **5.09** | **196.5** |
| Gemini-3-flash-preview | ~5-10B | 93.89% | 94.70% | 91.20% | 6,430 | 0.2 |
| Gemini-2.5-flash-preview | ~5B | 87.23% | 89.66% | 78.38% | --- | --- |
| Tencent HunyuanOCR | 996M | 67.62% | 76.65% | 34.78% | 309.15 | 3.2 |
| PP-OCRv5 Pretrained | 53.6M | 57.96% | 73.55% | 0.24% | 5.25 | 190.6 |

Latency was measured on a single NVIDIA RTX 3090 GPU at batch size 1; Gemini
latency is the end-to-end API round-trip time. PP-OCRv5 Pretrained shares the
same architecture but uses the original pretrained weights without
domain-specific fine-tuning, so the jump from 57.96% to 98.42% is attributable
entirely to the data pipeline.
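The latency and throughput columns can be reproduced with a simple
batch-size-1 timing harness along these lines (a sketch we introduce for
illustration: `run_inference` is a placeholder for whatever predict call your
runtime exposes, and warmup iterations are discarded so one-time GPU
initialization does not skew the average):

```python
import time

def benchmark(run_inference, images, warmup=10, repeats=100):
    """Return (average single-image latency in ms, throughput in img/s).

    `run_inference` is a placeholder for the actual predict call, e.g. a
    model forward pass on one preprocessed plate crop (batch size 1).
    """
    for _ in range(warmup):
        run_inference(images[0])                 # discard warmup iterations
    start = time.perf_counter()
    for i in range(repeats):
        run_inference(images[i % len(images)])   # batch size 1 throughout
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / repeats * 1000.0
    return latency_ms, 1000.0 / latency_ms       # throughput = 1 / latency
```

Averaging over many repeats after a warmup phase is what makes the sub-6 ms
figure stable; a single cold-start measurement would be dominated by setup
cost.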

## Qualitative Comparison

Representative samples where Awiros-ANPR-OCR correctly transcribes the plate
while all baselines produce errors. Common failure modes for baselines include
confusing visually similar characters (Q→0, V→Y, M→R, B→8) and truncating
dual-row plates.

| Plate Image | Ground Truth | Awiros (Ours) | Gemini 3 | Gemini 2.5 | Tencent |
| --- | --- | --- | --- | --- | --- |
| ![Plate 1](images/two_row-set-1_1_1420_.jpg) | `UP14BQ9208` | `UP14BQ9208` | `UP14B09208` | `UP14B09208` | `UP14B` |
| ![Plate 2](images/10_20_68f0b5ea88aefc002f543e60_awi_1.jpg) | `HR35M2576` | `HR35M2576` | `HR35R2576` | `HR35R2576` | `HR35K2576` |
| ![Plate 3](images/10_20_68fc5fb360b026003091b0eb_awi_1.jpg) | `HR34M4007` | `HR34M4007` | `HR34H4007` | `HR34M40D7` | `HR36M4007` |
| ![Plate 4](images/9_16_68f325be3a50df002e43a509_awi_1.jpg) | `HR51BV8223` | `HR51BV8223` | `HR51BY8223` | `HR51BY8223` | `HR51BY8223` |
| ![Plate 5](images/10_21_68ecddf9925604002f70876a_awi_1.jpg) | `HR38AB2421` | `HR38AB2421` | `HR38A8242` | `HR38A82421` | `HR38A` |
| ![Plate 6](images/10_14_68e6fcf21e55ac002f310971_awi_1.jpg) | `HR12AX8522` | `HR12AX8522` | `HR12AX0522` | `HR12AX0522` | `HR12AX0522` |
| ![Plate 7](images/9_25_6901aa035e29cc002f516180_awi_1.jpg) | `HR46E0227` | `HR46E0227` | `HR26E0227` | `HR26E0227` | `HR6E0227` |

Recurring character confusions across baselines: `Q→0`, `M→R/K/H`, `V→Y`,
`B→8`, `8→0`, `4→2`. Tencent also truncates several dual-row and
low-contrast plates.
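Because legal Indian plate strings are highly structured, several of these
failure modes can be flagged cheaply with a format check. A minimal sketch
(the pattern below is a simplification we introduce for illustration; it
covers common state-series plates but not every variant, e.g. Bharat-series
plates):

```python
import re

# Simplified shape of a common Indian plate: 2-letter state code,
# 1-2 digit district number, optional series letters, 4-digit number.
# Illustrative only -- not an exhaustive grammar of valid plates.
PLATE_RE = re.compile(r"^[A-Z]{2}\d{1,2}[A-Z]{0,3}\d{4}$")

def plausible(plate: str) -> bool:
    """True if the transcription matches the simplified plate shape."""
    return bool(PLATE_RE.fullmatch(plate))
```

Such a filter cannot catch a substitution that yields another valid string
(e.g. `M→R`), but it does reject truncations like `HR35B` and length errors
like `HR38A82421`.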

## Key Design Decisions

- **End-to-end architecture**: Eliminates brittle multi-stage pre-processing
  pipelines (perspective normalization, row segmentation, per-region
  recognition) that prior systems relied upon
- **Consensus pseudo-labeling**: Two independently trained models must agree on
  a transcription before it is accepted as a label, substantially reducing
  pseudo-label noise
- **Distribution-aware curation**: Non-linear bucket-wise train/val splits
  ensure rare state codes are not lost to validation
- **State-balanced batch sampling**: Uniform state-code sampling within each
  batch prevents training dynamics from being dominated by high-frequency states
- **Negative sample training**: Unreadable plates labeled with an abstention
  token suppress hallucination on degraded inputs
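The consensus pseudo-labeling step, for instance, can be sketched in a few
lines (assumptions: `model_a` and `model_b` stand in for the two
independently trained recognizers, and exact string agreement is used as the
acceptance criterion; see the technical report for the actual procedure):

```python
def consensus_pseudo_labels(unlabeled, model_a, model_b):
    """Keep an image only when two independently trained recognizers agree.

    `model_a` / `model_b` are placeholder predict functions that return a
    transcription string for an image.
    """
    accepted = []
    for image in unlabeled:
        pred_a = model_a(image)
        pred_b = model_b(image)
        if pred_a == pred_b:                 # exact-match consensus
            accepted.append((image, pred_a))
        # on disagreement the sample is simply dropped (or sent to review)
    return accepted
```

Requiring two models trained on different seeds or data slices to agree makes
a shared hallucination much less likely than either model erring alone, which
is what keeps pseudo-label noise low at corpus scale.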

## Model Inference

Use the official PaddleOCR repository to run single-image inference with this
release model.

1. Clone PaddleOCR and move into the repository root.
   ```bash
   git clone https://github.com/PaddlePaddle/PaddleOCR.git
   cd PaddleOCR
   ```
2. Install dependencies.
   ```bash
   pip install paddlepaddle  # or paddlepaddle-gpu
   pip install safetensors pillow opencv-python pyyaml
   ```
3. Copy `test.py` and `en_dict.txt` from this release folder into the
   PaddleOCR repository root.
4. Place `model.safetensors` in the PaddleOCR repository root (or specify the
   path via `--weights`).
5. Run inference on a single image.
   ```bash
   python test.py \
     --image_path path/to/plate_crop.jpg \
     --weights model.safetensors \
     --device gpu
   ```
6. Run inference on a directory of images.
   ```bash
   python test.py \
     --image_path path/to/plate_crops/ \
     --weights model.safetensors \
     --device gpu \
     --output_json results.json
   ```

## Architecture Details

| Component | Value |
| --- | --- |
| Framework | PaddlePaddle / PP-OCRv5 |
| Backbone | PPHGNetV2_B4 |
| Head | MultiHead (CTCHead + NRTRHead) |
| Input shape | 3 x 48 x 320 |
| Character set | 0-9, A-Z, a-z, space (63 classes) |
| Max text length | 25 |
| Parameters | 37.3M |
| Export format | SafeTensors (from PaddlePaddle params) |

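The 3 x 48 x 320 input implies the usual PP-OCR-style recognition
preprocessing: resize the crop to a fixed height of 48, normalize pixels to
[-1, 1], and right-pad the width to 320. A dependency-free sketch
(nearest-neighbor resizing stands in for the interpolation `test.py`
actually uses; consult that script for the exact transform):

```python
import numpy as np

def resize_nn(img, new_h, new_w):
    """Nearest-neighbor resize (stand-in for a proper cv2.resize)."""
    rows = (np.arange(new_h) * img.shape[0] / new_h).astype(int)
    cols = (np.arange(new_w) * img.shape[1] / new_w).astype(int)
    return img[rows][:, cols]

def preprocess_plate(img, shape=(3, 48, 320)):
    """HWC uint8 plate crop -> CHW float32 tensor of fixed shape."""
    c, h, w = shape
    ratio = img.shape[1] / img.shape[0]              # original w / h
    new_w = min(w, max(1, int(np.ceil(h * ratio))))  # keep aspect, cap width
    resized = resize_nn(img, h, new_w).astype(np.float32)
    resized = (resized / 255.0 - 0.5) / 0.5          # scale to [-1, 1]
    chw = resized.transpose(2, 0, 1)                 # HWC -> CHW
    padded = np.zeros(shape, dtype=np.float32)
    padded[:, :, :new_w] = chw                       # right-pad width to 320
    return padded
```

Width padding (rather than stretching) preserves character aspect ratio,
which matters for distinguishing narrow glyphs such as `1` and `I`.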

## Summary



We present a practical, data-centric ANPR framework that achieves
production-grade accuracy on Indian license plates without reliance on large
manually annotated datasets or frontier model scale. The same PP-OCRv5
architecture scores 57.96% out of the box and 98.42% after our data
engineering pipeline, demonstrating that the data, not the model, is the
primary driver of performance in domain-specific OCR.

Users who want to test their own models on our validation set can do so in our
[Hugging Face Space](https://huggingface.co/spaces/uv124/license-plate-ocr-benchmark).
Support for submitting `.bin` files for evaluation in our internal systems is
planned; this page will be updated with the submission link once it is live.