---
language:
- en
- km
tags:
- ocr
- text-recognition
- pytorch
- transformer
- handwritten
- khmer
- multilingual
license: apache-2.0
datasets:
- mrrtmob/km_en_image_line
- mrrtmob/khmer_english_ocr_image_line
pipeline_tag: image-to-text
library_name: kiri-ocr
---

# Kiri OCR Model

**Kiri OCR** is a lightweight OCR library for **English and Khmer** documents. It provides document-level text detection, recognition, and rendering capabilities in a compact package.

## ✨ Key Features

- **Lightweight**: Compact model optimized for speed and efficiency
- **Bilingual**: Native support for English and Khmer (including mixed text)
- **Document Processing**: Automatic text line and word detection
- **Hybrid Decoding**: CTC + Attention decoder with language model fusion

## 🏗️ Architecture

| Component | Details |
|-----------|---------|
| **Type** | Transformer Encoder-Decoder with CTC |
| **Encoder** | 4 layers, 8 heads, 256 dim, 1024 FFN |
| **Decoder** | 3 layers, 8 heads, 256 dim, 1024 FFN |
| **CNN Backbone** | ConvStem (4 conv layers with BatchNorm + SiLU) |
| **Decoding** | Beam search with CTC fusion + LM fusion |
| **Input Size** | 48 × 640 px (height × width) |
| **Framework** | PyTorch |

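The table fixes the input at 48 × 640 px but does not say how a text-line image is brought to that size. The sketch below assumes one common approach: scale to the target height while preserving aspect ratio, cap the width at 640, and pad the remaining columns on the right. `fit_to_input` is a hypothetical helper for illustration, not part of the kiri-ocr API.

```python
TARGET_H, TARGET_W = 48, 640  # input size from the architecture table


def fit_to_input(height: int, width: int) -> tuple[int, int, int]:
    """Return (new_height, new_width, right_padding) for a text-line image.

    Assumption: aspect-preserving resize to the target height, width capped
    at the target width, remaining columns filled with padding.
    """
    if height <= 0 or width <= 0:
        raise ValueError("image dimensions must be positive")
    scale = TARGET_H / height
    new_w = max(1, min(TARGET_W, round(width * scale)))
    return TARGET_H, new_w, TARGET_W - new_w
```

Under this assumption, a 96 × 800 line image would be resized to 48 × 400 and padded with 240 columns, while very long lines would be squeezed to the 640 px cap.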
### Model Diagram

```
Input Image (48×640)
       ↓
   ConvStem (CNN)
       ↓
  2D Positional Encoding
       ↓
  Transformer Encoder (4L)
       ↓
   ┌───┴───┐
   ↓       ↓
CTC Head   Transformer Decoder (3L)
   ↓       ↓
   └───┬───┘
       ↓
  Beam Search + CTC Fusion + LM Fusion
       ↓
    Output Text
```
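To make the CTC branch of the diagram more concrete, here is a minimal sketch of greedy CTC decoding: take the most likely symbol per frame, merge consecutive repeats, then drop blanks. This is the standard CTC collapse rule, not kiri-ocr's decoder, which uses beam search with fusion. The blank index `0` and the toy vocabulary are assumptions made for illustration.

```python
def ctc_greedy_collapse(frame_ids, blank_id=0):
    """Collapse per-frame argmax ids: merge repeats, then drop blanks."""
    out = []
    prev = None
    for idx in frame_ids:
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank.
        if idx != prev and idx != blank_id:
            out.append(idx)
        prev = idx
    return out


def ids_to_text(ids, id_to_char):
    """Map collapsed ids to characters using an id -> char table."""
    return "".join(id_to_char[i] for i in ids)


# Toy vocabulary (hypothetical); id 0 is reserved for the CTC blank.
toy_vocab = {1: "k", 2: "i", 3: "r"}
frames = [0, 1, 1, 0, 2, 2, 3, 0, 2]
print(ids_to_text(ctc_greedy_collapse(frames), toy_vocab))
```

A blank between two identical ids keeps them as separate characters, which is how CTC can emit doubled letters.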

## 📊 Dataset

The model was trained on the [mrrtmob/khmer_english_ocr_image_line](https://huggingface.co/datasets/mrrtmob/khmer_english_ocr_image_line) dataset, which contains **12 million** synthetic images of Khmer and English text lines.

## 💻 Usage

### Installation

```bash
pip install kiri-ocr
```

### Python API

```python
from kiri_ocr import OCR

# Initialize (downloads from Hugging Face automatically)
ocr = OCR()

# Extract text from document
text, results = ocr.extract_text("document.jpg")
print(text)

# Access detailed results
for result in results:
    print(f"Text: {result.text}")
    print(f"Confidence: {result.confidence:.2%}")
```
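Because each result carries a `confidence` value, low-confidence lines can be filtered out before further processing. The sketch below relies only on the `text` and `confidence` attributes shown above; `LineResult` is a stand-in class for testing, and the `0.80` threshold is an arbitrary choice rather than a library default.

```python
from dataclasses import dataclass


@dataclass
class LineResult:
    """Stand-in for the objects returned by ocr.extract_text (assumed shape)."""
    text: str
    confidence: float


def confident_text(results, min_confidence=0.80):
    """Join the text of results at or above min_confidence, one per line."""
    kept = [r.text for r in results if r.confidence >= min_confidence]
    return "\n".join(kept)


sample = [
    LineResult("Hello", 0.97),
    LineResult("w0r1d", 0.41),
    LineResult("Kiri", 0.88),
]
print(confident_text(sample))
```

With real output, `confident_text(results)` could be called directly on the `results` list from `ocr.extract_text`, assuming the objects expose the same two attributes.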

### CLI Tool

```bash
# Basic usage
kiri-ocr predict path/to/document.jpg

# With output directory
kiri-ocr predict path/to/document.jpg --output results/
```

## 📈 Benchmarks

Results on synthetic test images (10 popular fonts):

![Benchmark Table](benchmark_table.png)

![Benchmark Graph](benchmark_graph.png)

## ⚙️ Configuration

Default inference parameters:

| Parameter | Value | Description |
|-----------|-------|-------------|
| `beam_width` | 4 | Beam search width |
| `ctc_fusion_alpha` | 0.5 | CTC score fusion weight |
| `lm_fusion_alpha` | 0.35 | Language model fusion weight |
| `max_length` | 260 | Maximum output sequence length |

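The table names the fusion weights but not the formula that combines them. A common formulation in joint CTC/attention decoding interpolates the decoder and CTC log-probabilities and adds a weighted language-model term. The sketch below assumes that form, so the scoring actually used by kiri-ocr may differ. `fused_score` and `top_beams` are hypothetical helpers.

```python
def fused_score(attn_logp, ctc_logp, lm_logp,
                ctc_fusion_alpha=0.5, lm_fusion_alpha=0.35):
    """Combine hypothesis log-probabilities (assumed formula).

    score = (1 - a) * attention + a * CTC + b * LM
    """
    return ((1.0 - ctc_fusion_alpha) * attn_logp
            + ctc_fusion_alpha * ctc_logp
            + lm_fusion_alpha * lm_logp)


def top_beams(hypotheses, beam_width=4):
    """Keep the beam_width best hypotheses by fused score.

    Each hypothesis is a tuple: (text, attn_logp, ctc_logp, lm_logp).
    """
    ranked = sorted(hypotheses, key=lambda h: fused_score(*h[1:]), reverse=True)
    return ranked[:beam_width]
```

In this form, raising `ctc_fusion_alpha` shifts weight from the attention decoder toward the CTC head, and raising `lm_fusion_alpha` gives the language model more influence over ranking.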
## 📁 Model Files

```
kiri-ocr/
├── config.json          # Model configuration
├── vocab.json           # Character vocabulary
├── model.safetensors    # Model weights
└── README.md            # This file
```
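When working from a local copy of the model, it can help to confirm that the files listed above are present before loading. The following is a minimal sketch using only the standard library; `missing_model_files` is a hypothetical helper, and the internal structure of `config.json` and `vocab.json` is not assumed.

```python
from pathlib import Path

# File names taken from the layout above (README.md is not needed for inference).
EXPECTED_FILES = ("config.json", "vocab.json", "model.safetensors")


def missing_model_files(model_dir):
    """Return the expected files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means all three files were found in the given directory.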

## 🔗 Links

- **GitHub**: [github.com/mrrtmob/kiri-ocr](https://github.com/mrrtmob/kiri-ocr)
- **Dataset**: [mrrtmob/khmer_english_ocr_image_line](https://huggingface.co/datasets/mrrtmob/khmer_english_ocr_image_line)
- **PyPI**: [pypi.org/project/kiri-ocr](https://pypi.org/project/kiri-ocr)

## 📝 Citation

```bibtex
@software{kiri_ocr,
  author = {mrrtmob},
  title = {Kiri OCR: Lightweight OCR for English and Khmer},
  year = {2026},
  url = {https://huggingface.co/mrrtmob/kiri-ocr}
}
```

## 📄 License

This model is released under the [Apache 2.0 License](LICENSE).