---
library_name: pytorch
license: mit
language:
- th
- en
tags:
- ocr
- text-recognition
- thai-id-card
- crnn
- ctc
- on-device
- mobile
- numeric-ocr
- citizen-id
pipeline_tag: image-to-text
---

# Thai ID Nano OCR: Numeric OCR Reader (SimpleCRNN (MVP))

> **MVP model.** Production upgrade: swap to `ppocrv5` variant (same interface,
> better accuracy). See `config.json` → `architecture_variant` for programmatic detection.

CTC-based text recognition model for Thai National ID card **numeric** fields, designed for on-device inference at 30 fps on mobile.

| Metric | Value |
|--------|-------|
| Architecture | SimpleCRNN (MVP) |
| Variant | `crnn` |
| ExactMatch | 98.6% |
| CharAccuracy | 99.4% |
| Parameters | 3,026,703 |
| Vocab size | 15 |
| Best epoch | 10 |

## Quick Start

```python
from huggingface_hub import hf_hub_download

model_path = hf_hub_download("chayuto/thai-id-ocr-crnn-numeric-reader", "model.pt")
vocab_path = hf_hub_download("chayuto/thai-id-ocr-crnn-numeric-reader", "vocab.txt")
config = hf_hub_download("chayuto/thai-id-ocr-crnn-numeric-reader", "config.json")
```

## Architecture

**SimpleCRNN**: CNN (4-layer) + BiLSTM (2-layer) + CTC decoder.
```
Input: [B, 3, 48, 320]  (RGB, normalized to [-1, 1])
  → CNN: 32→64→128→256 channels, BatchNorm+ReLU, MaxPool(2,2)×3
  → AdaptiveAvgPool2d((1, None)) → T=40 time steps
  → BiLSTM: hidden=256, layers=2, dropout=0.1
  → Linear(512 → 15)
  → CTC decode (blank=0, collapse repeats)
Output: Unicode string
```

## Field Details

- **Zones:** `num_id_zone` (13-digit CID), `num_dob_zone` (DD/MM/YYYY)
- **Charset:** `0123456789/- .` (14 chars + CTC blank)
- **Post-validation:** CID Modulo 11 checksum on digit 13

## Input Preprocessing

```python
import cv2
import numpy as np

def preprocess(img_path, height=48, max_width=320):
    img = cv2.imread(img_path)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {img_path}")
    # cv2.imread returns BGR; convert to match the RGB input stated under Architecture
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    h, w = img.shape[:2]
    # Scale to the target height, capping the width at max_width
    ratio = height / h
    new_w = max(1, min(int(w * ratio), max_width))
    img = cv2.resize(img, (new_w, height))
    # Pad to max_width with white
    if new_w < max_width:
        pad = np.full((height, max_width - new_w, 3), 255, dtype=np.uint8)
        img = np.concatenate([img, pad], axis=1)
    # Normalize to [-1, 1]
    img = img.astype(np.float32) / 255.0
    img = (img - 0.5) / 0.5
    return np.transpose(img, (2, 0, 1))  # CHW
```

## CTC Decoding

```python
def ctc_decode(indices, vocab_chars, blank_idx=0):
    """Greedy CTC decode: collapse repeated indices, then drop blanks."""
    chars, prev = [], -1
    for idx in indices:
        if idx != blank_idx and idx != prev:
            # Index 0 is the blank, so class i maps to vocab_chars[i - 1]
            if 1 <= idx <= len(vocab_chars):
                chars.append(vocab_chars[idx - 1])
        prev = idx
    return "".join(chars)
```

## Loading the Model

```python
import torch
import torch.nn as nn

class SimpleCRNN(nn.Module):
    def __init__(self, num_classes, img_h=48):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.BatchNorm2d(256), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),  # collapse height to 1, keep width
        )
        self.rnn = nn.LSTM(256, 256, num_layers=2, bidirectional=True,
                           batch_first=True, dropout=0.1)
        self.fc = nn.Linear(512, num_classes)

    def forward(self, x):
        # [B, 256, 1, T] -> [B, T, 256]
        features = self.cnn(x).squeeze(2).permute(0, 2, 1)
        rnn_out, _ = self.rnn(features)
        return self.fc(rnn_out).permute(1, 0, 2)  # (T, B, C) for CTC

# model_path comes from the Quick Start download
model = SimpleCRNN(num_classes=15)
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()
```

## Pipeline Context

This model is one of three Reader experts in the **Thai ID Nano OCR** pipeline:

```
Camera Frame
  → YOLO26n Finder (5-class, single pass)
    → num_id_zone, num_dob_zone → Numeric Reader
    → text_eng_zone → English Reader
    → text_thai_zone → Thai Reader
  → Validator (Mod11 checksum, date logic)
```

Total pipeline: <15 MB, 30 fps on mobile.

## Files

| File | Description |
|------|-------------|
| `model.pt` | PyTorch `state_dict` (~12 MB) |
| `vocab.txt` | Character vocabulary, one per line (the space character is stored as a placeholder token). CTC blank is implicit at index 0. |
| `config.json` | Architecture params, training metadata, charset |

## License

MIT
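
## Appendix: CID Checksum Sketch

The Modulo 11 post-validation listed under Field Details can be sketched as below. This is a minimal illustration, not the pipeline's actual Validator: the helper name `is_valid_cid` is hypothetical, and stripping spaces and dashes before checking is an assumption based on the charset. The check itself is the standard Thai citizen ID rule: weight the first 12 digits by 13 down to 2, sum them, and compare `(11 - sum % 11) % 10` with digit 13.

```python
def is_valid_cid(text: str) -> bool:
    """Hypothetical helper: Mod 11 check for a 13-digit Thai citizen ID."""
    # Assumption: the reader may emit separators from its charset, so drop them.
    cleaned = text.replace(" ", "").replace("-", "")
    if len(cleaned) != 13 or not (cleaned.isascii() and cleaned.isdigit()):
        return False
    # Weights 13, 12, ..., 2 apply to digits 1 through 12.
    total = sum(int(d) * w for d, w in zip(cleaned[:12], range(13, 1, -1)))
    check_digit = (11 - total % 11) % 10
    return check_digit == int(cleaned[12])


print(is_valid_cid("1 2345 67890 12 1"))
print(is_valid_cid("1234567890123"))
```

A failed check is a useful signal that at least one digit was misread, so a caller could retry on the next camera frame instead of accepting the string.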