CLPRNet with PARSeq Tiny OCR Backbone

This is a modified version of CLPRNet where the original CNN-based recognition branch has been replaced with PARSeq Tiny (Scene Text Recognition with Permuted Autoregressive Sequence Models, ECCV 2022).

Architecture

Input Image (1024×1024)
      │
      ▼
┌──────────────────────────────────┐
│  Shared FPN Backbone (unchanged) │
│  BasicBlock stack → multi-scale  │
│  features → FPN upsampling       │
└──────┬──────────────┬────────────┘
       │              │
       ▼              ▼
┌──────────────┐  ┌──────────────┐
│ at_head (1ch)│  │ Detection    │
│ LP attention │  │ SEBasicBlock │
└──────────────┘  │ → 5ch head   │
                  └─────┬────────┘
                        │
                        ▼
              Bounding Boxes (NMS)
                        │
                        ▼
              ┌─────────────────┐
              │ Plate Cropping  │
              │ (grid_sample)   │
              │ → (32, 128) RGB │
              └────────┬────────┘
                       │
                       ▼
            ┌─────────────────────┐
            │  PARSeq Tiny        │
            │  ViT Encoder        │
            │  (192d, 12 layers)  │
            │  + Transformer      │
            │  Decoder (1 layer)  │
            └──────────┬──────────┘
                       │
                       ▼
              Character Logits
              (B, 9, 74)
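The plate-cropping stage can be expressed as an affine sampling grid. The following is a minimal sketch under stated assumptions: boxes are pixel coordinates [l, t, r, b] (the `boxes_lurd` layout from the usage example), and the crop is built with `torch.nn.functional.affine_grid` plus `grid_sample` using `align_corners=False`. The helper name `box_to_theta` is hypothetical, and the repository may construct its grid differently. The helper returns the 2×3 matrix that maps the normalized (32, 128) output grid onto the box.

```python
def box_to_theta(box, img_w, img_h):
    """Return a 2x3 affine matrix (nested lists) mapping the normalized
    output grid [-1, 1]^2 onto the pixel box [l, t, r, b].

    Hypothetical helper; assumes align_corners=False, where normalized -1/+1
    are the outer edges of the first/last input pixel.
    """
    l, t, r, b = box
    sx = (r - l) / img_w          # horizontal scale: box width / image width
    sy = (b - t) / img_h          # vertical scale: box height / image height
    tx = (l + r) / img_w - 1.0    # box centre in normalized x
    ty = (t + b) / img_h - 1.0    # box centre in normalized y
    return [[sx, 0.0, tx],
            [0.0, sy, ty]]


# A full-image box gives the identity transform.
print(box_to_theta([0, 0, 1024, 1024], 1024, 1024))
# -> [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

# The left half of the image: x is halved and shifted left.
print(box_to_theta([0, 0, 512, 1024], 1024, 1024))
# -> [[0.5, 0.0, -0.5], [0.0, 1.0, 0.0]]
```

In PyTorch you would stack one such matrix per box into a tensor of shape (N, 2, 3), call `F.affine_grid(theta, size=(N, 3, 32, 128), align_corners=False)`, and then call `F.grid_sample` on the source image repeated once per box. Both operations are differentiable, so gradients can flow through the crop.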

What Changed vs Original CLPRNet

Component	Original	Modified
Recognition backbone	4× SEBasicBlock CNN	PARSeq Tiny (ViT)
Recognition head	Conv2d 256→73 (dense spatial)	Linear 192→74 (sequence)
Character attention	8-channel learned spatial masks	Internal Transformer attention
at_head output	9 channels (1 LP + 8 char)	1 channel (LP only)
Recognition output	(B, 16, 16, 584) dense grid	(B, 9, 74) sequence logits
Decoding	Argmax per grid cell	Autoregressive/parallel sequence
Plate cropping	Not needed (attention-based)	Differentiable grid_sample
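The sketch below shows how the (B, 9, 74) sequence logits differ from per-cell argmax decoding. It performs a greedy (parallel, non-iterative) decode of a single sample: take the argmax at each output position and stop at the first EOS. It assumes EOS is class 0 and that class i > 0 maps to `charset[i - 1]`, which follows the reference PARSeq tokenizer's ordering. The 3-character charset is a toy example, and the real ordering in `model_parseq.py` may differ. Autoregressive decoding, which feeds each prediction back into the decoder, is not shown.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_decode(logits, charset, eos_id=0):
    """Decode one sample's (T, C) logits into (text, confidence).

    Assumptions: class `eos_id` is EOS and class i > 0 maps to charset[i - 1].
    Decoding stops at the first EOS. Confidence is the product of the per-step
    max probabilities, including the EOS step.
    """
    chars, conf = [], 1.0
    for row in logits:                  # one row per output position
        probs = softmax(row)
        idx = max(range(len(probs)), key=probs.__getitem__)
        conf *= probs[idx]
        if idx == eos_id:
            break
        chars.append(charset[idx - 1])
    return "".join(chars), conf


# Toy example: 4 classes = EOS + "ABC", 4 output positions.
logits = [
    [0.0, 5.0, 0.0, 0.0],   # argmax 1 -> "A"
    [0.0, 0.0, 0.0, 5.0],   # argmax 3 -> "C"
    [5.0, 0.0, 0.0, 0.0],   # argmax 0 -> EOS, stop
    [0.0, 0.0, 5.0, 0.0],   # never reached
]
text, conf = greedy_decode(logits, "ABC")
print(text)  # -> AC
```

By contrast, the original head stores 8 × 73 = 584 channels per grid cell and takes an argmax per cell. The sequence head needs no spatial post-processing: EOS marks where the plate string ends.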

Parameters

Total:           8,134,550 params
PARSeq Tiny:     6,007,178 params  (recognition)
Detection:       2,127,372 params  (backbone + FPN + detection head)
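A breakdown like the one above can be reproduced by grouping parameter counts by top-level submodule. This sketch takes (name, count) pairs; in PyTorch these would come from `((n, p.numel()) for n, p in model.named_parameters())`. The prefix `parseq` for the recognition branch is a hypothetical submodule name, so check `model.named_children()` for the real one.

```python
from collections import defaultdict


def param_breakdown(named_counts, recognition_prefix="parseq"):
    """Split parameter counts into recognition vs. detection totals.

    `named_counts` is an iterable of (parameter_name, numel) pairs.
    Assumption: recognition parameters are exactly those whose top-level
    submodule is `recognition_prefix`. Everything else counts as detection
    (backbone + FPN + heads).
    """
    totals = defaultdict(int)
    for name, count in named_counts:
        top = name.split(".", 1)[0]     # top-level submodule name
        group = "recognition" if top == recognition_prefix else "detection"
        totals[group] += count
    totals["total"] = totals["recognition"] + totals["detection"]
    return dict(totals)


# Fed with the two totals reported above (names are placeholders).
r = param_breakdown([("parseq.encoder.weight", 6_007_178),
                     ("backbone.conv.weight", 2_127_372)])
print(r["total"])                                 # -> 8134550
print(f"{r['recognition'] / r['total']:.1%}")     # -> 73.8%
```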

Files

File	Purpose
`model_parseq.py`	Main model with PARSeq Tiny integrated
`train_parseq.py`	Training script (updated losses)
`inference_parseq.py`	Inference script (two-stage)

Usage

from model_parseq import create_clprnet_parseq

# Create model
model = create_clprnet_parseq(max_label_length=8)

# Training forward pass (with GT boxes + labels)
y_det, y_rec, at_lp, plate_counts = model(
    images,                    # (B, 3, 1024, 1024)
    boxes_lurd=gt_boxes,       # list of (N_i, 4) tensors [l,t,r,b]
    plate_labels=gt_labels     # list of plate strings
)

# Inference (detection only)
y_det, _, at_lp, _ = model(images)

# Full inference (detect + recognize), using post-NMS boxes decoded from y_det
plates, confs = model.recognize_plates(images, detected_boxes)
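`detected_boxes` has to come from post-processing `y_det`, which is the "Bounding Boxes (NMS)" stage in the diagram. This card does not document how the 5 detection channels are decoded into scored boxes. The sketch therefore starts from already-decoded `[l, t, r, b]` boxes with scores and shows only a standard greedy NMS. The 0.5 IoU threshold is an arbitrary example value. `torchvision.ops.nms(boxes, scores, iou_threshold)` performs the same operation on tensors.

```python
def iou(a, b):
    """Intersection-over-union of two [l, t, r, b] boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [[100, 100, 200, 140], [105, 102, 205, 142], [400, 300, 500, 340]]
scores = [0.9, 0.8, 0.95]
print(nms(boxes, scores))  # -> [2, 0]  (box 1 overlaps box 0 with IoU ~0.82)
```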

PARSeq Tiny Specs

Based on the DeiT-Ti configuration from the PARSeq paper:

Parameter	Value
embed_dim	192
Encoder heads	3
Encoder depth	12
Decoder heads	6
Decoder depth	1
MLP ratio	4
Patch size	(4, 8)
Input size	(32, 128)
Max label length	8 (9 decoder steps including EOS)
Charset	73 Chinese LP characters + EOS (74 output classes)
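The spec values fix the sequence lengths the Transformer works with. This quick check uses only numbers from the table above. It assumes one output step per character plus one for EOS, which is consistent with the (B, 9, 74) output. The function name is illustrative.

```python
def parseq_shapes(img_size=(32, 128), patch_size=(4, 8), embed_dim=192,
                  enc_heads=3, dec_heads=6, max_label_length=8, num_chars=73):
    """Derive token counts and output shape from the spec-table values."""
    h, w = img_size
    ph, pw = patch_size
    assert h % ph == 0 and w % pw == 0, "image size must be divisible by patch size"
    grid = (h // ph, w // pw)                  # patches along height, width
    return {
        "patch_grid": grid,
        "num_patches": grid[0] * grid[1],      # encoder sequence length
        "enc_head_dim": embed_dim // enc_heads,
        "dec_head_dim": embed_dim // dec_heads,
        "out_steps": max_label_length + 1,     # + EOS position
        "out_classes": num_chars + 1,          # + EOS class
    }


print(parseq_shapes())
# -> {'patch_grid': (8, 16), 'num_patches': 128, 'enc_head_dim': 64, 'dec_head_dim': 32, 'out_steps': 9, 'out_classes': 74}
```

Each 32×128 crop therefore becomes a 128-token sequence for the 12-layer encoder. The single decoder layer attends to those tokens to produce 9 character positions.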

Requirements

torch >= 2.0
torchvision
numpy
opencv-python
Pillow

Training

Uses the same CCPD + CRPD datasets as the original CLPRNet. See train_parseq.py.
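The card says only that the losses were updated. For a sequence head like PARSeq's, the usual recognition loss is token-level cross-entropy over the output positions: the label is followed by EOS, and the remaining positions are ignored. The sketch below assumes EOS = 0, characters at indices 1..73, and an ignore index of -100 (the default `ignore_index` of PyTorch's `F.cross_entropy`). The function names are hypothetical, and `train_parseq.py` may compute its loss differently.

```python
import math

EOS_ID = 0          # assumption: EOS is class 0
IGNORE_ID = -100    # positions after EOS contribute no loss


def encode_target(text, char_to_id, max_label_length=8):
    """Map a plate string to max_label_length + 1 target ids: chars, EOS, padding."""
    if len(text) > max_label_length:
        raise ValueError(f"label longer than {max_label_length}: {text!r}")
    ids = [char_to_id[c] for c in text] + [EOS_ID]
    return ids + [IGNORE_ID] * (max_label_length + 1 - len(ids))


def sequence_ce(logits, target):
    """Mean cross-entropy over the non-ignored positions of one (T, C) sample."""
    losses = []
    for row, t in zip(logits, target):
        if t == IGNORE_ID:
            continue
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))  # log-sum-exp
        losses.append(log_z - row[t])                              # -log p(t)
    return sum(losses) / len(losses)


char_to_id = {c: i + 1 for i, c in enumerate("ABC")}   # toy charset
target = encode_target("CAB", char_to_id)
print(target)  # -> [3, 1, 2, 0, -100, -100, -100, -100, -100]
# Uniform logits over 4 classes give log(4) per counted position.
print(round(sequence_ce([[0.0] * 4] * 9, target), 4))  # -> 1.3863
```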

Credits

Original CLPRNet: https://github.com/wulb97/CLPRNet
PARSeq: https://github.com/baudm/parseq (ECCV 2022)

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.

Try ML Intern: https://smolagents-ml-intern.hf.space
Source code: https://github.com/huggingface/ml-intern


Paper for theakhilshukla/CLPRNet-PARSeq

Scene Text Recognition with Permuted Autoregressive Sequence Models (arXiv 2207.06966, published Jul 14, 2022)