CLPRNet with PARSeq Tiny OCR Backbone

This is a modified version of CLPRNet in which the original CNN-based recognition branch is replaced with PARSeq Tiny (from "Scene Text Recognition with Permuted Autoregressive Sequence Models", ECCV 2022).

Architecture

Input Image (1024×1024)
      │
      ▼
┌──────────────────────────────────┐
│  Shared FPN Backbone (unchanged) │
│  BasicBlock stack → multi-scale  │
│  features → FPN upsampling       │
└──────┬──────────────┬────────────┘
       │              │
       ▼              ▼
┌──────────────┐  ┌──────────────┐
│ at_head (1ch)│  │ Detection    │
│ LP attention │  │ SEBasicBlock │
└──────────────┘  │ → 5ch head   │
                  └─────┬────────┘
                        │
                        ▼
              Bounding Boxes (NMS)
                        │
                        ▼
              ┌───────────────────┐
              │ Plate Cropping    │
              │ (grid_sample)     │
              │ → (32, 128) RGB   │
              └─────────┬─────────┘
                        │
                        ▼
              ┌───────────────────┐
              │ PARSeq Tiny       │
              │ ViT Encoder       │
              │ (192d, 12 layers) │
              │ + Transformer     │
              │ Decoder (1 layer) │
              └─────────┬─────────┘
                        │
                        ▼
              Character Logits
              (B, 9, 74)
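
The "Bounding Boxes (NMS)" stage in the diagram is not described further in this README. As a rough illustration only, a standard greedy IoU-based NMS over [l, t, r, b] boxes might look like the sketch below. The function names `iou` and `greedy_nms` and the 0.5 threshold are assumptions made for this example and are not taken from model_parseq.py.

```python
def iou(a, b):
    """Intersection over union of two [l, t, r, b] boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first.

    A box is kept only if it does not overlap any already-kept box
    by more than iou_threshold (hypothetical default).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two near-duplicate detections of one plate, plus a separate plate.
boxes = [[100, 100, 300, 160], [105, 102, 305, 162], [600, 400, 800, 460]]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
```

The surviving boxes are what the plate-cropping stage would then consume.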

What Changed vs Original CLPRNet

| Component | Original | Modified |
|---|---|---|
| Recognition backbone | 4× SEBasicBlock CNN | PARSeq Tiny (ViT) |
| Recognition head | Conv2d 256→73 (dense spatial) | Linear 192→74 (sequence) |
| Character attention | 8-channel learned spatial masks | Internal Transformer attention |
| at_head output | 9 channels (1 LP + 8 char) | 1 channel (LP only) |
| Recognition output | (B, 16, 16, 584) dense grid | (B, 9, 74) sequence logits |
| Decoding | Argmax per grid cell | Autoregressive/parallel sequence |
| Plate cropping | Not needed (attention-based) | Differentiable grid_sample |
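
The plate-cropping row relies on grid_sample, which reads the input at sampling locations given in normalized [-1, 1] coordinates. The framework-free sketch below shows one way such a grid could be computed for a single [l, t, r, b] box. It assumes the align_corners=False convention (where -1 and +1 are the outer image edges) and box coordinates in pixels of the 1024×1024 input. The name `box_to_grid` is hypothetical, and the actual model may build its grid differently.

```python
def box_to_grid(box, img_w=1024, img_h=1024, out_h=32, out_w=128):
    """Build an (out_h, out_w) grid of (x, y) sampling points in [-1, 1].

    Assumes the align_corners=False convention and samples at the
    centre of each output cell.
    """
    l, t, r, b = box
    grid = []
    for i in range(out_h):
        # Pixel-space y at the centre of output row i.
        y = t + (i + 0.5) * (b - t) / out_h
        row = []
        for j in range(out_w):
            # Pixel-space x at the centre of output column j.
            x = l + (j + 0.5) * (r - l) / out_w
            # Map pixel coordinates to the normalized [-1, 1] range.
            row.append((2.0 * x / img_w - 1.0, 2.0 * y / img_h - 1.0))
        grid.append(row)
    return grid


# A 512x64 box centred in the image.
grid = box_to_grid([256, 480, 768, 544])
```

In PyTorch, a grid like this would be stacked into a tensor of shape (N, 32, 128, 2) and passed to torch.nn.functional.grid_sample together with the image batch. Because the sampling is bilinear interpolation rather than integer indexing, gradients can flow through the crop.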

Parameters

Total:           8,134,550 params
PARSeq Tiny:     6,007,178 params  (recognition)
Detection:       2,127,372 params  (backbone + FPN + detection head)
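
As a quick consistency check on the figures above, the snippet below only does arithmetic on numbers copied from this README. It confirms that the two components add up to the stated total and computes the share taken by the recognition branch.

```python
parseq_tiny = 6_007_178   # recognition branch
detection = 2_127_372     # backbone + FPN + detection head
total = parseq_tiny + detection

recognition_share = parseq_tiny / total
print(f"total: {total:,} params")
print(f"recognition share: {recognition_share:.1%}")
```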

Files

| File | Purpose |
|---|---|
| model_parseq.py | Main model with PARSeq Tiny integrated |
| train_parseq.py | Training script (updated losses) |
| inference_parseq.py | Inference script (two-stage) |

Usage

from model_parseq import create_clprnet_parseq

# Create model
model = create_clprnet_parseq(max_label_length=8)

# Training forward pass (with GT boxes + labels)
y_det, y_rec, at_lp, plate_counts = model(
    images,                    # (B, 3, 1024, 1024)
    boxes_lurd=gt_boxes,       # list of (N_i, 4) tensors [l,t,r,b]
    plate_labels=gt_labels     # list of plate strings
)

# Inference (detection only)
y_det, _, at_lp, _ = model(images)

# Full inference (detect + recognize)
plates, confs = model.recognize_plates(images, detected_boxes)
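
To give a feel for the step between the (B, 9, 74) logits and the strings and confidences returned above, here is a minimal greedy-decoding sketch in plain Python. It is not the code in model_parseq.py. The name `greedy_decode`, the choice of class 0 as EOS, the toy three-character charset, and the definition of confidence as a product of per-step probabilities are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def greedy_decode(logits, charset, eos_index=0):
    """Decode one plate from per-position logits.

    logits: list of rows, each of length len(charset) + 1.
    Class eos_index is EOS (an assumption); the remaining classes
    map onto charset in order.
    Returns (text, confidence), where confidence is the product of the
    chosen-class probabilities up to and including EOS.
    """
    chars, conf = [], 1.0
    for row in logits:
        probs = softmax(row)
        k = max(range(len(probs)), key=probs.__getitem__)
        conf *= probs[k]
        if k == eos_index:
            break  # everything after EOS is ignored
        chars.append(charset[k - 1 if k > eos_index else k])
    return "".join(chars), conf


# Toy example: classes are [EOS, 'A', 'B', '7'].
toy_logits = [
    [0.0, 5.0, 0.0, 0.0],  # 'A'
    [0.0, 0.0, 0.0, 5.0],  # '7'
    [5.0, 0.0, 0.0, 0.0],  # EOS
    [0.0, 0.0, 5.0, 0.0],  # ignored, comes after EOS
]
text, conf = greedy_decode(toy_logits, "AB7")
```

With the real model, each row would have 74 entries and there would be up to 9 rows per plate.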

PARSeq Tiny Specs

Based on the DeiT-Ti configuration from the PARSeq paper:

| Parameter | Value |
|---|---|
| embed_dim | 192 |
| Encoder heads | 3 |
| Encoder depth | 12 |
| Decoder heads | 6 |
| Decoder depth | 1 |
| MLP ratio | 4 |
| Patch size | (4, 8) |
| Input size | (32, 128) |
| Max label length | 8 |
| Charset | 73 Chinese LP chars + EOS |
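
The specs above determine a few derived sizes, including the (9, 74) per-plate logit shape quoted elsewhere in this README. The short sketch below works them out. It assumes non-overlapping patches and no extra class token in the encoder, which this README does not state explicitly.

```python
img_h, img_w = 32, 128        # input size
patch_h, patch_w = 4, 8       # patch size
max_label_length = 8
charset_size = 73             # Chinese LP characters

# One encoder token per non-overlapping patch (assumes no class token).
num_patches = (img_h // patch_h) * (img_w // patch_w)

# One output position per character plus one for EOS,
# and one class per character plus one for EOS.
seq_len = max_label_length + 1
num_classes = charset_size + 1

# Per-head width in the encoder and decoder attention layers.
embed_dim, enc_heads, dec_heads = 192, 3, 6
enc_head_dim = embed_dim // enc_heads
dec_head_dim = embed_dim // dec_heads

print(num_patches, (seq_len, num_classes), enc_head_dim, dec_head_dim)
```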

Requirements

torch >= 2.0
torchvision
numpy
opencv-python
Pillow

Training

Training uses the same CCPD and CRPD datasets as the original CLPRNet. See train_parseq.py for the training script.

Credits

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
