---
license: apache-2.0
language:
- en
library_name: paddlepaddle
tags: [anpr, ocr, license-plate, indian-plates, PP-OCRv5]
---
# Data-Intelligent ANPR: Scalable License Plate Recognition Under Real-World Data Constraints
## Abstract
This release provides Awiros-ANPR-OCR, a 37M-parameter specialist model for
Automatic Number Plate Recognition (ANPR) on Indian license plates. The model
is built on the PP-OCRv5 encoder-decoder backbone (SVTR_HGNet with PPHGNetV2_B4)
and fine-tuned on a curated 558,767-sample corpus spanning both standard
single-row and non-standard dual-row Indian plate formats.
Starting from only 6,839 publicly available labeled samples, the training
corpus was grown through a data engineering pipeline combining synthetic data
generation, consensus pseudo-labeling, distribution-aware curation, VLM-assisted
data cleanup, and state-balanced batch sampling. The resulting model achieves
**98.42% accuracy** with **sub-6ms on-device inference** on an NVIDIA RTX 3090
--- a 1,260x latency advantage over frontier multimodal models like Gemini.
For the full data curation and training methodology, refer to our technical
report: [Technical Report](TechnicalReport.pdf).
## Evaluation and Results
All systems were evaluated on a shared held-out validation set constructed
using a distribution-aware split covering all Indian state codes, including
both standard and non-standard plate formats.
| System | Params | Overall Acc. | 1-Row Acc. | 2-Row Acc. | Latency Avg (ms) | Throughput (img/s) |
| --- | --- | --- | --- | --- | --- | --- |
| **Awiros-ANPR-OCR (Ours)** | **37.3M** | **98.42%** | **98.83%** | **96.91%** | **5.09** | **196.5** |
| Gemini-3-flash-preview | ~5-10B | 93.89% | 94.70% | 91.20% | 6,430 | 0.2 |
| Gemini-2.5-flash-preview | ~5B | 87.23% | 89.66% | 78.38% | --- | --- |
| Tencent HunyuanOCR | 996M | 67.62% | 76.65% | 34.78% | 309.15 | 3.2 |
| PP-OCRv5 Pretrained | 53.6M | 57.96% | 73.55% | 0.24% | 5.25 | 190.6 |
Latency measured on a single NVIDIA RTX 3090 GPU (batch size 1). Gemini
latency is end-to-end API round-trip. PP-OCRv5 Pretrained shares the same
architecture but uses original pretrained weights without domain-specific
fine-tuning --- the 57.96% to 98.42% gap is entirely a data story.
## Qualitative Comparison
Representative samples where Awiros-ANPR-OCR correctly transcribes the plate
while all baselines produce errors. Common failure modes for baselines include
confusing visually similar characters (Q→0, V→Y, M→R, B→8) and truncating
dual-row plates.
| Plate Image | Ground Truth | Awiros (Ours) | Gemini 3 | Gemini 2.5 | Tencent |
| --- | --- | --- | --- | --- | --- |
| ![Plate 1](images/two_row-set-1_1_1420_.jpg) | `UP14BQ9208` | `UP14BQ9208` | `UP14B09208` | `UP14B09208` | `UP14B` |
| ![Plate 2](images/10_20_68f0b5ea88aefc002f543e60_awi_1.jpg) | `HR35M2576` | `HR35M2576` | `HR35R2576` | `HR35R2576` | `HR35K2576` |
| ![Plate 3](images/10_20_68fc5fb360b026003091b0eb_awi_1.jpg) | `HR34M4007` | `HR34M4007` | `HR34H4007` | `HR34M40D7` | `HR36M4007` |
| ![Plate 4](images/9_16_68f325be3a50df002e43a509_awi_1.jpg) | `HR51BV8223` | `HR51BV8223` | `HR51BY8223` | `HR51BY8223` | `HR51BY8223` |
| ![Plate 5](images/10_21_68ecddf9925604002f70876a_awi_1.jpg) | `HR38AB2421` | `HR38AB2421` | `HR38A8242` | `HR38A82421` | `HR38A` |
| ![Plate 6](images/10_14_68e6fcf21e55ac002f310971_awi_1.jpg) | `HR12AX8522` | `HR12AX8522` | `HR12AX0522` | `HR12AX0522` | `HR12AX0522` |
| ![Plate 7](images/9_25_6901aa035e29cc002f516180_awi_1.jpg) | `HR46E0227` | `HR46E0227` | `HR26E0227` | `HR26E0227` | `HR6E0227` |
Recurring character confusions across baselines: `Q→0`, `M→R/K/H`, `V→Y`,
`B→8`, `8→0`, `4→2`. Tencent also truncates several dual-row and
low-contrast plates.
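The confusion patterns above can be tallied mechanically from (ground truth, prediction) pairs. A minimal sketch; the `char_confusions` helper is illustrative and not part of this release:

```python
from collections import Counter

def char_confusions(pairs):
    """Count (truth_char, predicted_char) mismatches, comparing the two
    strings position by position over their common prefix length."""
    counts = Counter()
    for truth, pred in pairs:
        for t, p in zip(truth, pred):
            if t != p:
                counts[(t, p)] += 1
    return counts

# Rows taken from the qualitative table above (Gemini 3 column).
pairs = [
    ("UP14BQ9208", "UP14B09208"),
    ("HR35M2576", "HR35R2576"),
    ("HR51BV8223", "HR51BY8223"),
]
print(char_confusions(pairs))  # {('Q','0'): 1, ('M','R'): 1, ('V','Y'): 1}
```

Note that positionwise comparison undercounts errors when a baseline drops characters (as Tencent does on truncated plates); an edit-distance alignment would be needed for those cases.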
## Key Design Decisions
- **End-to-end architecture**: Eliminates brittle multi-stage pre-processing
pipelines (perspective normalization, row segmentation, per-region
recognition) that prior systems relied upon
- **Consensus pseudo-labeling**: Two independently trained models must agree on
a transcription before it is accepted as a label, substantially reducing
pseudo-label noise
- **Distribution-aware curation**: Non-linear bucket-wise train/val splits
ensure rare state codes are not lost to validation
- **State-balanced batch sampling**: Uniform state-code sampling within each
batch prevents training dynamics from being dominated by high-frequency states
- **Negative sample training**: Unreadable plates labeled with an abstention
token suppress hallucination on degraded inputs
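As a rough illustration of the consensus rule, a pseudo-label is accepted only when two independently trained models agree and neither abstains. The predictor interfaces and the `ABSTAIN` token below are placeholders for exposition, not the actual training code:

```python
# Sketch of consensus pseudo-labeling: keep a transcription only when two
# independently trained models agree and neither emits the abstention token.
ABSTAIN = "<unk>"  # placeholder; the real abstention token may differ

def consensus_labels(images, predict_a, predict_b):
    labeled = []
    for img in images:
        a, b = predict_a(img), predict_b(img)
        if a == b and a != ABSTAIN:
            labeled.append((img, a))  # accepted pseudo-label
    return labeled

# Toy example with dict lookups standing in for the two trained models.
preds_a = {"img1": "HR35M2576", "img2": "UP14BQ9208", "img3": ABSTAIN}
preds_b = {"img1": "HR35M2576", "img2": "UP14B09208", "img3": ABSTAIN}
kept = consensus_labels(["img1", "img2", "img3"], preds_a.get, preds_b.get)
# img2 disagrees and img3 abstains, so only img1 survives.
```

The same gate naturally discards degraded inputs on which both models abstain, which complements the negative-sample training described above.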
## Model Inference
Use the official PaddleOCR repository to run inference with the model from
this release, either on a single image or on a directory of crops.
1. Clone PaddleOCR and move into the repository root.
```bash
git clone https://github.com/PaddlePaddle/PaddleOCR.git
cd PaddleOCR
```
2. Install dependencies.
```bash
pip install paddlepaddle # or paddlepaddle-gpu
pip install safetensors pillow opencv-python pyyaml
```
3. Copy `test.py` and `en_dict.txt` from this release folder into the
PaddleOCR repository root.
4. Place `model.safetensors` in the PaddleOCR repository root (or specify the
path via `--weights`).
5. Run inference on a single image.
```bash
python test.py \
--image_path path/to/plate_crop.jpg \
--weights model.safetensors \
--device gpu
```
6. Run inference on a directory of images.
```bash
python test.py \
--image_path path/to/plate_crops/ \
--weights model.safetensors \
--device gpu \
--output_json results.json
```
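Downstream consumers often sanity-check transcriptions against the standard Indian plate layout (two-letter state code, district number, series letters, four-digit number). A hedged sketch; this regex covers only the common civilian format, not BH-series or other special plates, and is not part of `test.py`:

```python
import re

# Common civilian Indian plate layout: SS D{1,2} L{1,3} NNNN,
# e.g. "HR35M2576" -> state HR, district 35, series M, number 2576.
PLATE_RE = re.compile(r"^[A-Z]{2}\d{1,2}[A-Z]{1,3}\d{4}$")

def looks_valid(text: str) -> bool:
    """Return True if the transcription matches the common plate pattern."""
    return bool(PLATE_RE.fullmatch(text))

print(looks_valid("UP14BQ9208"))  # True
print(looks_valid("HR35M"))       # False (truncated, like some baseline outputs)
```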
## Architecture Details
| Component | Value |
| --- | --- |
| Framework | PaddlePaddle / PP-OCRv5 |
| Backbone | PPHGNetV2_B4 |
| Head | MultiHead (CTCHead + NRTRHead) |
| Input shape | 3 x 48 x 320 |
| Character set | 0-9, A-Z, a-z, space (63 classes) |
| Max text length | 25 |
| Parameters | 37.3M |
| Export format | SafeTensors (from PaddlePaddle params) |
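The 3 x 48 x 320 input shape suggests the usual PP-OCR recognition preprocessing: resize the plate crop to height 48 preserving aspect ratio, pad to width 320, and normalize to roughly [-1, 1]. A sketch under that assumption, using the pillow and numpy dependencies installed above; the exact resize and normalization constants in the release's `test.py` may differ:

```python
import numpy as np
from PIL import Image

def preprocess_plate(img: Image.Image, h: int = 48, w: int = 320) -> np.ndarray:
    """Resize to target height keeping aspect ratio, right-pad to target
    width, normalize to ~[-1, 1], and return a CHW float32 array
    (assumed layout for the 3 x 48 x 320 input)."""
    img = img.convert("RGB")
    scale = h / img.height
    new_w = min(w, max(1, int(img.width * scale)))
    img = img.resize((new_w, h))
    arr = np.asarray(img, dtype=np.float32) / 255.0
    arr = (arr - 0.5) / 0.5                       # -> [-1, 1]
    padded = np.zeros((h, w, 3), dtype=np.float32)
    padded[:, :new_w] = arr                       # right-pad narrow crops
    return padded.transpose(2, 0, 1)              # HWC -> CHW

x = preprocess_plate(Image.new("RGB", (200, 60)))
print(x.shape)  # (3, 48, 320)
```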
## Summary
We present a practical, data-centric ANPR framework that achieves
production-grade accuracy on Indian license plates without reliance on large
manually annotated datasets or frontier model scale. The same PP-OCRv5
architecture scores 57.96% out-of-the-box and 98.42% after our data
engineering pipeline --- demonstrating that the data, not the model, is the
primary driver of performance in domain-specific OCR.
Users who want to test their own models on our validation set can do so in our
[Hugging Face Space](https://huggingface.co/spaces/uv124/license-plate-ocr-benchmark).
Support for submitting `.bin` files for evaluation on our internal systems is
planned; the link for that submission flow will be added here once available.