---
license: apache-2.0
language:
- en
library_name: paddlepaddle
tags: [anpr, ocr, license-plate, indian-plates, PP-OCRv5]
---
# Data-Intelligent ANPR: Scalable License Plate Recognition Under Real-World Data Constraints
## Abstract
This release provides Awiros-ANPR-OCR, a 37M-parameter specialist model for
Automatic Number Plate Recognition (ANPR) on Indian license plates. The model
is built on the PP-OCRv5 encoder-decoder backbone (SVTR_HGNet with PPHGNetV2_B4)
and fine-tuned on a curated 558,767-sample corpus spanning both standard
single-row and non-standard dual-row Indian plate formats.
Starting from only 6,839 publicly available labeled samples, the training
corpus was grown through a data engineering pipeline combining synthetic data
generation, consensus pseudo-labeling, distribution-aware curation, VLM-assisted
data cleanup, and state-balanced batch sampling. The resulting model achieves
**98.42% accuracy** with **sub-6 ms per-image inference** on an NVIDIA RTX 3090,
a roughly 1,260x latency advantage over frontier multimodal models such as Gemini.
For the full data curation and training methodology, refer to our technical
report: [Technical Report](TechnicalReport.pdf).
## Evaluation and Results
All systems were evaluated on a shared held-out validation set constructed
using a distribution-aware split covering all Indian state codes, including
both standard and non-standard plate formats.
| System | Params | Overall Acc. | 1-Row Acc. | 2-Row Acc. | Latency Avg (ms) | Throughput (img/s) |
| --- | --- | --- | --- | --- | --- | --- |
| **Awiros-ANPR-OCR (Ours)** | **37.3M** | **98.42%** | **98.83%** | **96.91%** | **5.09** | **196.5** |
| Gemini-3-flash-preview | ~5-10B | 93.89% | 94.70% | 91.20% | 6,430 | 0.2 |
| Gemini-2.5-flash-preview | ~5B | 87.23% | 89.66% | 78.38% | --- | --- |
| Tencent HunyuanOCR | 996M | 67.62% | 76.65% | 34.78% | 309.15 | 3.2 |
| PP-OCRv5 Pretrained | 53.6M | 57.96% | 73.55% | 0.24% | 5.25 | 190.6 |
Latency was measured on a single NVIDIA RTX 3090 GPU at batch size 1; Gemini
latency is the end-to-end API round trip. PP-OCRv5 Pretrained shares the same
architecture but uses the original pretrained weights without domain-specific
fine-tuning, so the gap from 57.96% to 98.42% is entirely a data story.
## Qualitative Comparison
Representative samples where Awiros-ANPR-OCR correctly transcribes the plate
while all baselines produce errors. Common failure modes for baselines include
confusing visually similar characters (Q→0, V→Y, M→R, B→8) and truncating
dual-row plates.
| Plate Image | Ground Truth | Awiros (Ours) | Gemini 3 | Gemini 2.5 | Tencent |
| --- | --- | --- | --- | --- | --- |
|  | `UP14BQ9208` | `UP14BQ9208` | `UP14B09208` | `UP14B09208` | `UP14B` |
|  | `HR35M2576` | `HR35M2576` | `HR35R2576` | `HR35R2576` | `HR35K2576` |
|  | `HR34M4007` | `HR34M4007` | `HR34H4007` | `HR34M40D7` | `HR36M4007` |
|  | `HR51BV8223` | `HR51BV8223` | `HR51BY8223` | `HR51BY8223` | `HR51BY8223` |
|  | `HR38AB2421` | `HR38AB2421` | `HR38A8242` | `HR38A82421` | `HR38A` |
|  | `HR12AX8522` | `HR12AX8522` | `HR12AX0522` | `HR12AX0522` | `HR12AX0522` |
|  | `HR46E0227` | `HR46E0227` | `HR26E0227` | `HR26E0227` | `HR6E0227` |
Recurring character confusions across baselines: `Q→0`, `M→R/K/H`, `V→Y`,
`B→8`, `8→0`, `4→2`. Tencent also truncates several dual-row and
low-contrast plates.
## Key Design Decisions
- **End-to-end architecture**: Eliminates brittle multi-stage pre-processing
pipelines (perspective normalization, row segmentation, per-region
recognition) that prior systems relied upon
- **Consensus pseudo-labeling**: Two independently trained models must agree on
a transcription before it is accepted as a label, substantially reducing
pseudo-label noise
- **Distribution-aware curation**: Non-linear bucket-wise train/val splits
ensure rare state codes are not lost to validation
- **State-balanced batch sampling**: Uniform state-code sampling within each
batch prevents training dynamics from being dominated by high-frequency states
- **Negative sample training**: Unreadable plates labeled with an abstention
token suppress hallucination on degraded inputs
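To make the consensus pseudo-labeling decision above concrete, here is a minimal sketch: two independently trained recognizers transcribe each unlabeled plate crop, and a crop enters the training corpus only when both transcriptions agree. The predictor callables and the normalization rule are illustrative placeholders, not the actual pipeline code from this release.
```python
# Minimal sketch of consensus pseudo-labeling (illustrative only; not the
# actual training code from this release). Two independently trained
# recognizers transcribe each unlabeled plate crop; a crop is accepted into
# the training corpus only when both transcriptions agree exactly.
from typing import Callable, Iterable


def normalize(text: str) -> str:
    """Canonicalize a transcription before comparison (uppercase, no spaces)."""
    return "".join(text.upper().split())


def consensus_pseudo_labels(
    image_paths: Iterable[str],
    predict_a: Callable[[str], str],  # recognizer A: image path -> transcription
    predict_b: Callable[[str], str],  # recognizer B: image path -> transcription
) -> list[tuple[str, str]]:
    """Return (image_path, label) pairs on which both recognizers agree."""
    accepted = []
    for path in image_paths:
        pred_a = normalize(predict_a(path))
        pred_b = normalize(predict_b(path))
        if pred_a and pred_a == pred_b:  # non-empty and identical
            accepted.append((path, pred_a))
    return accepted
```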
## Model Inference
Use the official PaddleOCR repository to run inference with the released model,
either on a single image or on a directory of images.
1. Clone PaddleOCR and move into the repository root.
```bash
git clone https://github.com/PaddlePaddle/PaddleOCR.git
cd PaddleOCR
```
2. Install dependencies.
```bash
pip install paddlepaddle # or paddlepaddle-gpu
pip install safetensors pillow opencv-python pyyaml
```
3. Copy `test.py` and `en_dict.txt` from this release folder into the
PaddleOCR repository root.
4. Place `model.safetensors` in the PaddleOCR repository root (or specify the
path via `--weights`).
5. Run inference on a single image.
```bash
python test.py \
    --image_path path/to/plate_crop.jpg \
    --weights model.safetensors \
    --device gpu
```
6. Run inference on a directory of images.
```bash
python test.py \
    --image_path path/to/plate_crops/ \
    --weights model.safetensors \
    --device gpu \
    --output_json results.json
```
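The schema of `results.json` is defined by `test.py` in this release; as a rough sketch only, assuming the file maps each image filename to its predicted plate string, the output could be consumed like this:
```python
# Rough sketch of consuming the directory-mode output. Assumption: results.json
# maps each image filename to a predicted plate string; consult test.py in this
# release for the authoritative schema.
import json

with open("results.json", "r", encoding="utf-8") as f:
    results = json.load(f)

for image_name, plate_text in results.items():
    print(f"{image_name}: {plate_text}")
```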
## Architecture Details
| Component | Value |
| --- | --- |
| Framework | PaddlePaddle / PP-OCRv5 |
| Backbone | PPHGNetV2_B4 |
| Head | MultiHead (CTCHead + NRTRHead) |
| Input shape | 3 x 48 x 320 |
| Character set | 0-9, A-Z, a-z, space (63 classes) |
| Max text length | 25 |
| Parameters | 37.3M |
| Export format | SafeTensors (from PaddlePaddle params) |
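For reference, here is a minimal sketch of the resize-and-pad step implied by the 3 x 48 x 320 input shape, assuming the usual PP-OCR-style recognition preprocessing (scale to height 48 preserving aspect ratio, pad width to 320, normalize to [-1, 1]). The released `test.py` and PaddleOCR config define the authoritative pipeline.
```python
# Minimal sketch of recognition preprocessing implied by the 3 x 48 x 320
# input shape (assumption: standard PP-OCR-style resize/pad/normalize; the
# released test.py and PaddleOCR config are authoritative).
import cv2
import numpy as np


def preprocess_plate(image_path: str, height: int = 48, max_width: int = 320) -> np.ndarray:
    img = cv2.imread(image_path)                          # BGR, H x W x 3
    h, w = img.shape[:2]
    new_w = max(1, min(max_width, round(w * height / h)))  # keep aspect ratio
    resized = cv2.resize(img, (new_w, height)).astype("float32")
    resized = (resized / 255.0 - 0.5) / 0.5               # normalize to [-1, 1]
    padded = np.zeros((height, max_width, 3), dtype="float32")
    padded[:, :new_w, :] = resized                        # right-pad width to 320
    chw = padded.transpose(2, 0, 1)                       # HWC -> CHW (3 x 48 x 320)
    return chw[np.newaxis, ...]                           # batch dim: 1 x 3 x 48 x 320
```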
## Summary
We present a practical, data-centric ANPR framework that achieves
production-grade accuracy on Indian license plates without reliance on large
manually annotated datasets or frontier-model scale. The same PP-OCRv5
architecture scores 57.96% out of the box and 98.42% after our data
engineering pipeline, demonstrating that the data, not the model, is the
primary driver of performance in domain-specific OCR.
Users who want to test their own models on our validation set can do so in our
[Hugging Face Space](https://huggingface.co/spaces/uv124/license-plate-ocr-benchmark).
Support for submitting `.bin` files for evaluation on our internal systems will
be added soon; this page will be updated with the submission link.