---
library_name: pytorch
license: mit
language:
  - th
  - en
tags:
  - ocr
  - text-recognition
  - thai-id-card
  - crnn
  - ctc
  - on-device
  - mobile
  - numeric-ocr
  - citizen-id
pipeline_tag: image-to-text
---

# Thai ID Nano OCR — Numeric OCR Reader (SimpleCRNN, MVP)

> **MVP model.** Production upgrade: swap to the `ppocrv5` variant (same interface,
> better accuracy). See `config.json` → `architecture_variant` for programmatic detection.

CTC-based text recognition model for Thai National ID card **numeric** fields,
designed for on-device inference at 30fps on mobile.

| Metric | Value |
|--------|-------|
| Architecture | SimpleCRNN (MVP) |
| Variant | `crnn` |
| ExactMatch | 98.6% |
| CharAccuracy | 99.4% |
| Parameters | 3,026,703 |
| Vocab size | 15 |
| Best epoch | 10 |

## Quick Start

```python
from huggingface_hub import hf_hub_download

model_path = hf_hub_download("chayuto/thai-id-ocr-crnn-numeric-reader", "model.pt")
vocab_path = hf_hub_download("chayuto/thai-id-ocr-crnn-numeric-reader", "vocab.txt")
config = hf_hub_download("chayuto/thai-id-ocr-crnn-numeric-reader", "config.json")
```
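
The MVP note above points at `config.json` → `architecture_variant` for programmatic variant detection. A minimal sketch of that check (the field name comes from the note; the `"crnn"` default assumed here matches this card's variant):

```python
import json

def read_variant(config_path):
    """Return the architecture_variant declared in config.json ('crnn' for this MVP)."""
    with open(config_path, encoding="utf-8") as f:
        return json.load(f).get("architecture_variant", "crnn")
```

Callers can branch on the returned string to pick the matching model class before loading weights.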

## Architecture

**SimpleCRNN** — CNN (4-layer) + BiLSTM (2-layer) + CTC decoder.

```
Input: [B, 3, 48, 320]  (RGB, normalized to [-1, 1])
  → CNN: 32→64→128→256 channels, BatchNorm+ReLU, MaxPool(2,2)×3
  → AdaptiveAvgPool2d((1, None))  → T=40 time steps
  → BiLSTM: hidden=256, layers=2, dropout=0.1
  → Linear(512 → 15)
  → CTC decode (blank=0, collapse repeats)
Output: Unicode string
```

## Field Details

- **Zones:** `num_id_zone` (13-digit CID), `num_dob_zone` (DD/MM/YYYY)
- **Charset:** `0123456789/- .` (14 chars + CTC blank)
- **Post-validation:** CID modulo-11 checksum on the 13th digit
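
The mod-11 check above can be sketched with the standard Thai citizen-ID rule (weights 13 down to 2 over the first 12 digits; the helper name is illustrative):

```python
def validate_cid(cid: str) -> bool:
    """Thai CID mod-11 checksum: digit 13 must equal (11 - weighted_sum % 11) % 10."""
    if len(cid) != 13 or not cid.isdigit():
        return False
    s = sum(int(d) * (13 - i) for i, d in enumerate(cid[:12]))
    return (11 - s % 11) % 10 == int(cid[12])
```

Running this after CTC decoding rejects most single-character recognition errors in the ID field.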

## Input Preprocessing

```python
import cv2
import numpy as np

def preprocess(img_path, height=48, max_width=320):
    img = cv2.imread(img_path)
    if img is None:
        raise FileNotFoundError(img_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; the model expects RGB
    h, w = img.shape[:2]
    ratio = height / h
    new_w = min(int(w * ratio), max_width)
    img = cv2.resize(img, (new_w, height))
    # Pad to max_width with white
    if new_w < max_width:
        pad = np.full((height, max_width - new_w, 3), 255, dtype=np.uint8)
        img = np.concatenate([img, pad], axis=1)
    # Normalize to [-1, 1]
    img = img.astype(np.float32) / 255.0
    img = (img - 0.5) / 0.5
    return np.transpose(img, (2, 0, 1))  # CHW
```

## CTC Decoding

```python
def ctc_decode(indices, vocab_chars, blank_idx=0):
    chars, prev = [], -1
    for idx in indices:
        if idx != blank_idx and idx != prev:
            if 1 <= idx <= len(vocab_chars):
                chars.append(vocab_chars[idx - 1])
        prev = idx
    return "".join(chars)
```

## Loading the Model

```python
import torch
import torch.nn as nn

class SimpleCRNN(nn.Module):
    def __init__(self, num_classes, img_h=48):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.BatchNorm2d(256), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),
        )
        self.rnn = nn.LSTM(256, 256, num_layers=2, bidirectional=True, batch_first=True, dropout=0.1)
        self.fc = nn.Linear(512, num_classes)

    def forward(self, x):
        features = self.cnn(x).squeeze(2).permute(0, 2, 1)
        rnn_out, _ = self.rnn(features)
        return self.fc(rnn_out).permute(1, 0, 2)  # (T, B, C) for CTC

model = SimpleCRNN(num_classes=15)
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()
```
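
With the model loaded, its `(T, B, C)` output can be greedily decoded into strings. A numpy-only sketch (the charset below is the one listed under Field Details; in practice, build it from `vocab.txt` and feed `model(x).detach().numpy()` as `logits`):

```python
import numpy as np

VOCAB = list("0123456789/- .")  # index 0 is the CTC blank; characters start at index 1

def decode_batch(logits):
    """Greedy CTC decode of (T, B, C) logits into one string per batch item."""
    texts = []
    for b in range(logits.shape[1]):
        idxs = logits[:, b, :].argmax(axis=-1)  # best class per time step
        chars, prev = [], -1
        for i in idxs:
            # skip blanks and collapse consecutive repeats
            if i != 0 and i != prev and 1 <= i <= len(VOCAB):
                chars.append(VOCAB[i - 1])
            prev = i
        texts.append("".join(chars))
    return texts
```

This is the same collapse rule as `ctc_decode` above, applied per batch item directly on raw logits.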

## Pipeline Context

This model is one of three Reader experts in the **Thai ID Nano OCR** pipeline:

```
Camera Frame → YOLO26n Finder (5-class, single pass)
  → num_id_zone, num_dob_zone    → Numeric Reader
  → text_eng_zone                → English Reader
  → text_thai_zone               → Thai Reader
  → Validator (Mod11 checksum, date logic)
```

Total pipeline: <15 MB, 30fps on mobile.

## Files

| File | Description |
|------|-------------|
| `model.pt` | PyTorch `state_dict` (~12 MB) |
| `vocab.txt` | Character vocabulary, one per line (`<space>` = space). CTC blank is implicit at index 0. |
| `config.json` | Architecture params, training metadata, charset |
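
`vocab.txt` uses a `<space>` placeholder for the space character (see the table above). A small loader, assuming one token per line:

```python
def load_vocab(path="vocab.txt"):
    """Read the character vocabulary; '<space>' maps to a literal space.
    The CTC blank is not listed -- it stays implicit at index 0."""
    chars = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            tok = line.rstrip("\n")
            chars.append(" " if tok == "<space>" else tok)
    return chars
```

The returned list is what `ctc_decode(indices, vocab_chars)` above expects as `vocab_chars`.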

## License

MIT