---
library_name: pytorch
license: mit
language:
- th
- en
tags:
- ocr
- text-recognition
- thai-id-card
- crnn
- ctc
- on-device
- mobile
- numeric-ocr
- citizen-id
pipeline_tag: image-to-text
---

# Thai ID Nano OCR: Numeric OCR Reader (SimpleCRNN (MVP))

> **MVP model.** Production upgrade: swap to the `ppocrv5` variant (same interface,
> better accuracy). See `config.json` → `architecture_variant` for programmatic detection.
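
The variant check mentioned in the note could look roughly like the sketch below. Only the `architecture_variant` key and the `crnn` / `ppocrv5` values come from this card; the sample JSON string and the `detect_variant` helper are illustrative assumptions, and the real `config.json` contains more fields.

```python
import json

# Hypothetical, trimmed-down config contents for illustration only.
SAMPLE_CONFIG = '{"architecture_variant": "crnn"}'


def detect_variant(config_text):
    """Return the architecture variant named in a config.json string."""
    config = json.loads(config_text)
    return config.get("architecture_variant")


variant = detect_variant(SAMPLE_CONFIG)
is_mvp = variant == "crnn"  # the production upgrade is expected to report "ppocrv5"
print(variant, is_mvp)
```

With the real file, the same helper can be fed the text read from the downloaded `config.json` path.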

CTC-based text recognition model for Thai National ID card **numeric** fields,
designed for on-device inference at 30 fps on mobile.

| Metric | Value |
|--------|-------|
| Architecture | SimpleCRNN (MVP) |
| Variant | `crnn` |
| ExactMatch | 98.6% |
| CharAccuracy | 99.4% |
| Parameters | 3,026,703 |
| Vocab size | 15 |
| Best epoch | 10 |

## Quick Start

```python
from huggingface_hub import hf_hub_download

REPO_ID = "chayuto/thai-id-ocr-crnn-numeric-reader"

# Each call returns the local path of the cached file
model_path = hf_hub_download(REPO_ID, "model.pt")
vocab_path = hf_hub_download(REPO_ID, "vocab.txt")
config_path = hf_hub_download(REPO_ID, "config.json")
```

## Architecture

**SimpleCRNN**: CNN (4-layer) + BiLSTM (2-layer) + CTC decoder.

```
Input: [B, 3, 48, 320] (RGB, normalized to [-1, 1])
  → CNN: 32→64→128→256 channels, BatchNorm+ReLU, MaxPool(2,2)×3
  → AdaptiveAvgPool2d((1, None)) → T=40 time steps
  → BiLSTM: hidden=256, layers=2, dropout=0.1
  → Linear(512 → 15)
  → CTC decode (blank=0, collapse repeats)
Output: Unicode string
```
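
The time-step count and the parameter total reported in this card can be cross-checked with plain arithmetic. The sketch below assumes the layer layout in the diagram above, 3×3 convolutions each followed by BatchNorm, and PyTorch's LSTM parameterization (separate input and hidden bias vectors). The helper names are illustrative and not part of the released code.

```python
def conv_bn_params(c_in, c_out, k=3):
    """Conv2d weights + bias, plus BatchNorm2d scale + shift."""
    return c_in * c_out * k * k + c_out + 2 * c_out


def lstm_params(input_size, hidden, layers=2, directions=2):
    """PyTorch-style LSTM: 4 gates, with separate input and hidden biases."""
    total = 0
    for layer in range(layers):
        in_size = input_size if layer == 0 else hidden * directions
        per_direction = 4 * hidden * (in_size + hidden) + 2 * 4 * hidden
        total += per_direction * directions
    return total


def linear_params(n_in, n_out):
    return n_in * n_out + n_out


# Spatial trace: three MaxPool(2,2) stages halve height and width each time.
height, width = 48, 320
for _ in range(3):
    height, width = height // 2, width // 2
time_steps = width  # AdaptiveAvgPool2d((1, None)) then collapses height to 1

channels = [3, 32, 64, 128, 256]
cnn = sum(conv_bn_params(a, b) for a, b in zip(channels, channels[1:]))
total = cnn + lstm_params(256, 256) + linear_params(512, 15)

print(time_steps)                 # 40
print(total)                      # 3026703
print(round(total * 4 / 1e6, 1))  # 12.1 (MB as float32)
```

The total matches the 3,026,703 parameters listed in the metrics table, and at 4 bytes per float32 weight it is consistent with a roughly 12 MB `state_dict`.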

## Field Details

- **Zones:** `num_id_zone` (13-digit CID), `num_dob_zone` (DD/MM/YYYY)
- **Charset:** `0123456789/- .` (14 chars + CTC blank)
- **Post-validation:** CID Modulo 11 checksum on digit 13
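
As an illustration of the post-validation step, here is a minimal sketch of the commonly documented Modulo 11 rule for Thai citizen IDs: the first 12 digits are weighted 13 down to 2, and the 13th digit must equal `(11 - sum % 11) % 10`. It is an assumption that the pipeline's validator applies exactly this rule. The function name is hypothetical and the sample numbers are synthetic.

```python
def cid_checksum_ok(text):
    """Check the 13th digit of a Thai citizen ID against its Mod 11 checksum."""
    # Keep digits only, so spaces and dashes from the OCR charset are ignored
    digits = [int(ch) for ch in text if ch.isdigit()]
    if len(digits) != 13:
        return False
    # Weights 13, 12, ..., 2 applied to the first 12 digits
    weighted = sum(d * w for d, w in zip(digits[:12], range(13, 1, -1)))
    return (11 - weighted % 11) % 10 == digits[12]


print(cid_checksum_ok("1234567890121"))      # True
print(cid_checksum_ok("1 2345 67890 12 1"))  # True (separators ignored)
print(cid_checksum_ok("1234567890123"))      # False
```

A failed check is a useful signal to re-read the frame rather than accept a misrecognized digit.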

## Input Preprocessing

```python
import cv2
import numpy as np


def preprocess(img_path, height=48, max_width=320):
    """Load a field crop and return a float32 CHW array in [-1, 1]."""
    # cv2.imread returns HWC uint8 in BGR order; the input spec above says RGB,
    # so a cv2.cvtColor(img, cv2.COLOR_BGR2RGB) step may be needed depending on
    # how the model was trained.
    img = cv2.imread(img_path)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {img_path}")
    h, w = img.shape[:2]
    ratio = height / h
    # Scale to the target height; cap the width at max_width
    new_w = max(1, min(int(w * ratio), max_width))
    img = cv2.resize(img, (new_w, height))
    # Pad to max_width with white
    if new_w < max_width:
        pad = np.full((height, max_width - new_w, 3), 255, dtype=np.uint8)
        img = np.concatenate([img, pad], axis=1)
    # Normalize to [-1, 1]
    img = img.astype(np.float32) / 255.0
    img = (img - 0.5) / 0.5
    return np.transpose(img, (2, 0, 1))  # CHW
```
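
To make the resize-and-pad arithmetic concrete, the sketch below repeats only the width computation from `preprocess` so it can be tried without an image. `resize_plan` is a hypothetical helper, not part of the released code.

```python
def resize_plan(h, w, height=48, max_width=320):
    """Return (new_w, pad_w) using the same arithmetic as preprocess()."""
    new_w = min(int(w * (height / h)), max_width)
    return new_w, max_width - new_w


print(resize_plan(96, 500))  # (250, 70): scaled to 250 px wide, 70 px of white padding
print(resize_plan(60, 600))  # (320, 0): width capped at 320, no padding
```

In the second case the scaled width would be 480 px, so capping it at 320 compresses the crop horizontally instead of preserving its aspect ratio.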

## CTC Decoding

```python
def ctc_decode(indices, vocab_chars, blank_idx=0):
    """Greedy CTC decode: collapse repeats, drop blanks, map indices to characters."""
    chars, prev = [], -1
    for idx in indices:
        if idx != blank_idx and idx != prev:
            # Index 0 is the blank, so class index i maps to vocab_chars[i - 1]
            if 1 <= idx <= len(vocab_chars):
                chars.append(vocab_chars[idx - 1])
        prev = idx
    return "".join(chars)
```
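
A short worked example may help show why the blank matters: a repeated character survives only when a blank separates the two runs. The sketch below restates `ctc_decode` so it runs on its own. The vocabulary order is an assumption (the real order comes from `vocab.txt`), and `greedy_indices` is a hypothetical pure-Python stand-in for an argmax over the model's per-step outputs.

```python
def ctc_decode(indices, vocab_chars, blank_idx=0):
    chars, prev = [], -1
    for idx in indices:
        if idx != blank_idx and idx != prev:
            if 1 <= idx <= len(vocab_chars):
                chars.append(vocab_chars[idx - 1])
        prev = idx
    return "".join(chars)


def greedy_indices(scores):
    """Pick the highest-scoring class at each time step (T lists of C scores)."""
    return [max(range(len(step)), key=step.__getitem__) for step in scores]


# Assumed vocabulary order; class 0 is the CTC blank.
vocab_chars = list("0123456789/- .")

# Class 2 is "1", class 3 is "2", class 11 is "/".
indices = [0, 2, 2, 0, 2, 11, 11, 0, 3, 3, 3, 0]
print(ctc_decode(indices, vocab_chars))  # 11/2
```

The two runs of class 2 are separated by a blank, so both "1" characters are kept, while the unbroken run of class 3 collapses to a single "2".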

## Loading the Model

```python
import torch
import torch.nn as nn


class SimpleCRNN(nn.Module):
    def __init__(self, num_classes, img_h=48):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.BatchNorm2d(256), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),
        )
        self.rnn = nn.LSTM(256, 256, num_layers=2, bidirectional=True, batch_first=True, dropout=0.1)
        self.fc = nn.Linear(512, num_classes)

    def forward(self, x):
        # (B, 256, 1, T) -> (B, T, 256)
        features = self.cnn(x).squeeze(2).permute(0, 2, 1)
        rnn_out, _ = self.rnn(features)
        return self.fc(rnn_out).permute(1, 0, 2)  # (T, B, C) for CTC


# model_path is the path returned by hf_hub_download in Quick Start
model = SimpleCRNN(num_classes=15)  # 14 characters + CTC blank
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()
```

## Pipeline Context

This model is one of three Reader experts in the **Thai ID Nano OCR** pipeline:

```
Camera Frame → YOLO26n Finder (5-class, single pass)
    → num_id_zone, num_dob_zone → Numeric Reader
    → text_eng_zone → English Reader
    → text_thai_zone → Thai Reader
    → Validator (Mod11 checksum, date logic)
```

Total pipeline: <15 MB, 30 fps on mobile.
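
The routing in the diagram can be pictured as a lookup from zone class to reader. The sketch below is only an illustration of that idea. The zone names come from the diagram, while `ZONE_TO_READER`, `route_zones`, and the stub readers are hypothetical and do not reflect the pipeline's actual code.

```python
# Zone names are taken from the diagram; reader keys are placeholders.
ZONE_TO_READER = {
    "num_id_zone": "numeric",
    "num_dob_zone": "numeric",
    "text_eng_zone": "english",
    "text_thai_zone": "thai",
}


def route_zones(detections, readers):
    """Send each detected crop to the reader registered for its zone class.

    detections: list of (zone_name, crop) pairs from the finder.
    readers: dict mapping reader key -> callable(crop) -> str.
    """
    results = {}
    for zone, crop in detections:
        reader_key = ZONE_TO_READER.get(zone)
        if reader_key is None:
            continue  # zone class without a reader in this sketch
        results[zone] = readers[reader_key](crop)
    return results


# Stub readers standing in for the three recognition models.
stub_readers = {
    "numeric": lambda crop: f"numeric({crop})",
    "english": lambda crop: f"english({crop})",
    "thai": lambda crop: f"thai({crop})",
}
detections = [("num_id_zone", "crop_a"), ("text_eng_zone", "crop_b")]
print(route_zones(detections, stub_readers))
```

Both numeric zones share one reader, which is why a single model covers the citizen ID and the date of birth.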

## Files

| File | Description |
|------|-------------|
| `model.pt` | PyTorch `state_dict` (~12 MB) |
| `vocab.txt` | Character vocabulary, one per line (`<space>` = space). CTC blank is implicit at index 0. |
| `config.json` | Architecture params, training metadata, charset |
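
Reading `vocab.txt` into the `vocab_chars` list used by the decoder might look like the sketch below. It assumes one token per line with the literal `<space>` token standing for a space, as the table describes. The `parse_vocab` name and the shortened sample contents are hypothetical.

```python
def parse_vocab(lines):
    """Turn vocab.txt lines into a character list (list index i is class i + 1)."""
    chars = []
    for line in lines:
        token = line.rstrip("\r\n")
        if token == "":
            continue  # skip empty lines, e.g. a trailing newline at end of file
        chars.append(" " if token == "<space>" else token)
    return chars


# Shortened, hypothetical file contents; the real order comes from vocab.txt.
sample = "0\n1\n2\n/\n-\n<space>\n.\n".splitlines()
vocab_chars = parse_vocab(sample)
num_classes = len(vocab_chars) + 1  # +1 for the implicit CTC blank at index 0
print(vocab_chars)
print(num_classes)
```

With the downloaded file, the same function can be given an open file handle, for example `parse_vocab(open(vocab_path, encoding="utf-8"))`. The full 14-character vocabulary then yields the 15 classes the model expects.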

## License

MIT