---
library_name: pytorch
license: mit
language:
- th
- en
tags:
- ocr
- text-recognition
- thai-id-card
- crnn
- ctc
- on-device
- mobile
- numeric-ocr
- citizen-id
pipeline_tag: image-to-text
---
# Thai ID Nano OCR - Numeric OCR Reader (SimpleCRNN (MVP))
> **MVP model.** Production upgrade: swap to the `ppocrv5` variant (same interface,
> better accuracy). See `config.json` → `architecture_variant` for programmatic detection.

CTC-based text recognition model for Thai National ID card **numeric** fields,
designed for on-device inference at 30 fps on mobile.
| Metric | Value |
|--------|-------|
| Architecture | SimpleCRNN (MVP) |
| Variant | `crnn` |
| ExactMatch | 98.6% |
| CharAccuracy | 99.4% |
| Parameters | 3,026,703 |
| Vocab size | 15 |
| Best epoch | 10 |
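
The `architecture_variant` key in `config.json` allows code to branch on the model variant at load time. The following is a minimal sketch, assuming the key sits at the top level of the JSON and takes one of the two values named in this card (`crnn`, `ppocrv5`). The helper names `pick_variant` and `load_variant` are hypothetical and not part of this repository.

```python
import json

# Values named in this card; any other value is treated as unknown.
KNOWN_VARIANTS = {"crnn", "ppocrv5"}


def pick_variant(config: dict) -> str:
    """Return the architecture variant declared in a parsed config.json."""
    # Assumption: the key is top-level. Adjust the lookup if it is nested.
    variant = config.get("architecture_variant")
    if variant not in KNOWN_VARIANTS:
        raise ValueError(f"Unrecognised architecture_variant: {variant!r}")
    return variant


def load_variant(config_path: str) -> str:
    """Read config.json from disk and return its variant."""
    with open(config_path, encoding="utf-8") as f:
        return pick_variant(json.load(f))
```

Failing loudly on an unknown value is a design choice for the sketch; a caller could equally fall back to `crnn`.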
## Quick Start
```python
from huggingface_hub import hf_hub_download

REPO_ID = "chayuto/thai-id-ocr-crnn-numeric-reader"
# Each call downloads the file (or reuses the local cache) and returns its local path.
model_path = hf_hub_download(REPO_ID, "model.pt")
vocab_path = hf_hub_download(REPO_ID, "vocab.txt")
config_path = hf_hub_download(REPO_ID, "config.json")
```
## Architecture
**SimpleCRNN**: CNN (4-layer) + BiLSTM (2-layer) + CTC decoder.
```
Input: [B, 3, 48, 320] (RGB, normalized to [-1, 1])
  → CNN: 32→64→128→256 channels, BatchNorm+ReLU, MaxPool(2,2)×3
  → AdaptiveAvgPool2d((1, None)) → T=40 time steps
  → BiLSTM: hidden=256, layers=2, dropout=0.1
  → Linear(512 → 15)
  → CTC decode (blank=0, collapse repeats)
Output: Unicode string
```
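
The time-step count and the parameter total in the metrics table can be reproduced by hand from the layer sizes above. The sketch below does that arithmetic in plain Python, assuming 3×3 convolutions with bias (as in the model definition in this card) and the standard PyTorch LSTM layout of four gates with two bias vectors each. The helper names are illustrative only.

```python
def conv_bn_params(c_in, c_out, k=3):
    # Conv2d weights and bias, plus BatchNorm2d scale and shift.
    return c_in * c_out * k * k + c_out + 2 * c_out


def lstm_params(input_size, hidden, num_layers=2, bidirectional=True):
    dirs = 2 if bidirectional else 1
    total = 0
    for layer in range(num_layers):
        # Layers after the first consume the concatenated directions.
        in_size = input_size if layer == 0 else hidden * dirs
        # Four gates, each with input weights, hidden weights and two biases.
        total += dirs * 4 * (in_size * hidden + hidden * hidden + 2 * hidden)
    return total


def linear_params(n_in, n_out):
    return n_in * n_out + n_out


channels = [3, 32, 64, 128, 256]
cnn_total = sum(conv_bn_params(a, b) for a, b in zip(channels, channels[1:]))
total_params = cnn_total + lstm_params(256, 256) + linear_params(512, 15)

# Three MaxPool(2,2) stages halve the 320-pixel width three times.
time_steps = 320 // (2 ** 3)

print(total_params, time_steps)
```

The total matches the 3,026,703 parameters reported above, and the width reduction gives the 40 time steps shown in the diagram.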
## Field Details
- **Zones:** `num_id_zone` (13-digit CID), `num_dob_zone` (DD/MM/YYYY)
- **Charset:** `0123456789/- .` (14 chars + CTC blank)
- **Post-validation:** CID Modulo 11 checksum on digit 13
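
The Modulo 11 check on the citizen ID can be sketched as follows. This uses the commonly published Thai ID check-digit rule (weights 13 down to 2 on the first twelve digits). Whether the pipeline's Validator applies exactly this logic is an assumption, and `cid_checksum_ok` is a hypothetical name.

```python
def cid_checksum_ok(text: str) -> bool:
    """Validate a 13-digit Thai citizen ID read by the numeric reader."""
    # Drop separators the charset allows (space, dash, dot, slash).
    digits = [int(c) for c in text if c in "0123456789"]
    if len(digits) != 13:
        return False
    # Weight the first 12 digits by 13, 12, ..., 2.
    weighted = sum(d * w for d, w in zip(digits[:12], range(13, 1, -1)))
    expected = (11 - weighted % 11) % 10
    return expected == digits[12]


print(cid_checksum_ok("1 2345 67890 12 1"))
```

A failed check is a useful signal to retry recognition on the next camera frame instead of accepting the string.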
## Input Preprocessing
```python
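import cv2
import numpy as np


def preprocess(img_path, height=48, max_width=320):
    img = cv2.imread(img_path)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {img_path}")
    # cv2.imread returns BGR; convert because the Architecture section specifies RGB input.
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    h, w = img.shape[:2]
    ratio = height / h
    # Keep the aspect ratio, clamp to max_width, and avoid a zero-width resize.
    new_w = max(1, min(int(w * ratio), max_width))
    img = cv2.resize(img, (new_w, height))
    # Pad to max_width with white
    if new_w < max_width:
        pad = np.full((height, max_width - new_w, 3), 255, dtype=np.uint8)
        img = np.concatenate([img, pad], axis=1)
    # Normalize to [-1, 1]
    img = img.astype(np.float32) / 255.0
    img = (img - 0.5) / 0.5
    return np.transpose(img, (2, 0, 1))  # CHW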
```
## CTC Decoding
```python
def ctc_decode(indices, vocab_chars, blank_idx=0):
    chars, prev = [], -1
    for idx in indices:
        if idx != blank_idx and idx != prev:
            if 1 <= idx <= len(vocab_chars):
                chars.append(vocab_chars[idx - 1])
        prev = idx
    return "".join(chars)
```
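
To connect the model output to the decoder, the class with the highest score is taken at each time step before repeats and blanks are collapsed. The sketch below shows that greedy step on plain Python lists for a single image. The vocabulary order (`0123456789/- .`) is an assumption taken from the charset line in this card, and the real order should be read from `vocab.txt`. `greedy_decode` and `one_hot` are illustrative names.

```python
def greedy_decode(scores, vocab_chars, blank_idx=0):
    """scores: T rows, each holding C class scores for one time step."""
    # Argmax over classes at every time step.
    indices = [max(range(len(row)), key=lambda c: row[c]) for row in scores]
    chars, prev = [], -1
    for idx in indices:
        # Emit only when the class is non-blank and differs from the previous step.
        if idx != blank_idx and idx != prev and 1 <= idx <= len(vocab_chars):
            chars.append(vocab_chars[idx - 1])
        prev = idx
    return "".join(chars)


def one_hot(i, n=15):
    return [1.0 if c == i else 0.0 for c in range(n)]


VOCAB = list("0123456789/- .")  # assumed order; index 0 is reserved for the blank
rows = [one_hot(i) for i in [2, 2, 0, 2, 6, 0, 0, 11, 11, 1]]
decoded = greedy_decode(rows, VOCAB)
print(decoded)  # 115/0
```

The blank between the two runs of class 2 is what lets the doubled digit `11` survive; without it the repeats would collapse into a single `1`.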
## Loading the Model
```python
import torch
import torch.nn as nn


class SimpleCRNN(nn.Module):
    def __init__(self, num_classes, img_h=48):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.BatchNorm2d(256), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),  # collapse height, keep width as the time axis
        )
        self.rnn = nn.LSTM(256, 256, num_layers=2, bidirectional=True, batch_first=True, dropout=0.1)
        self.fc = nn.Linear(512, num_classes)

    def forward(self, x):
        # (B, 256, 1, T) -> (B, T, 256)
        features = self.cnn(x).squeeze(2).permute(0, 2, 1)
        rnn_out, _ = self.rnn(features)
        return self.fc(rnn_out).permute(1, 0, 2)  # (T, B, C) for CTC


model = SimpleCRNN(num_classes=15)  # 14 charset characters + CTC blank
# model_path is the local path returned by hf_hub_download in Quick Start.
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()
```
## Pipeline Context
This model is one of 3 Reader experts in the **Thai ID Nano OCR** pipeline:
```
Camera Frame → YOLO26n Finder (5-class, single pass)
  → num_id_zone, num_dob_zone → Numeric Reader
  → text_eng_zone → English Reader
  → text_thai_zone → Thai Reader
  → Validator (Mod11 checksum, date logic)
```
Total pipeline: <15 MB, 30 fps on mobile.
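
The routing step between the Finder and the Readers can be sketched as a lookup from zone label to reader. The zone names come from the diagram above. The reader keys, the `route` function and the `(zone, crop)` detection format are hypothetical, and zones without a reader (such as the unlisted fifth Finder class) are simply skipped.

```python
from collections import defaultdict

# Zone labels from the pipeline diagram, mapped to illustrative reader keys.
ZONE_TO_READER = {
    "num_id_zone": "numeric",
    "num_dob_zone": "numeric",
    "text_eng_zone": "english",
    "text_thai_zone": "thai",
}


def route(detections):
    """Group (zone, crop) pairs by the reader that should process them."""
    batches = defaultdict(list)
    for zone, crop in detections:
        reader = ZONE_TO_READER.get(zone)
        if reader is None:
            continue  # no reader is assigned to this zone
        batches[reader].append((zone, crop))
    return dict(batches)


print(route([("num_id_zone", "crop_a"), ("text_thai_zone", "crop_b")]))
```

Grouping by reader lets both numeric zones go through this model in a single batched forward pass.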
## Files
| File | Description |
|------|-------------|
| `model.pt` | PyTorch `state_dict` (~12 MB) |
| `vocab.txt` | Character vocabulary, one per line (`<space>` = space). CTC blank is implicit at index 0. |
| `config.json` | Architecture params, training metadata, charset |
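
Reading `vocab.txt` into the list that `ctc_decode` expects can be sketched as follows, based on the format described in the table (one character per line, `<space>` standing for a literal space, blank not listed). The function names `parse_vocab` and `load_vocab` are hypothetical, and skipping empty lines is an assumption about the file.

```python
def parse_vocab(lines):
    """Turn vocab.txt lines into a list of characters (CTC blank excluded)."""
    chars = []
    for line in lines:
        token = line.rstrip("\r\n")
        if not token:
            continue  # assumption: empty lines carry no entry
        chars.append(" " if token == "<space>" else token)
    return chars


def load_vocab(vocab_path):
    with open(vocab_path, encoding="utf-8") as f:
        return parse_vocab(f)


print(parse_vocab(["0\n", "1\n", "<space>\n", ".\n"]))
```

Because the blank occupies index 0 implicitly, position `i` in the returned list corresponds to class index `i + 1` in the model output.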
## License
MIT