---
language:
- en
- km
tags:
- ocr
- text-recognition
- pytorch
- transformer
- handwritten
- khmer
- multilingual
license: apache-2.0
datasets:
- mrrtmob/km_en_image_line
- mrrtmob/khmer_english_ocr_image_line
pipeline_tag: image-to-text
library_name: kiri-ocr
---
# Kiri OCR Model
**Kiri OCR** is a lightweight OCR library for **English and Khmer** documents. It provides document-level text detection, recognition, and rendering capabilities in a compact package.
## Key Features
- **Lightweight**: Compact model optimized for speed and efficiency
- **Bilingual**: Native support for English and Khmer (including mixed text)
- **Document Processing**: Automatic text line and word detection
- **Hybrid Decoding**: CTC + Attention decoder with language model fusion
## Architecture
| Component | Details |
|-----------|---------|
| **Type** | Transformer Encoder-Decoder with CTC |
| **Encoder** | 4 layers, 8 heads, 256 dim, 1024 FFN |
| **Decoder** | 3 layers, 8 heads, 256 dim, 1024 FFN |
| **CNN Backbone** | ConvStem (4 conv layers with BatchNorm + SiLU) |
| **Decoding** | Beam search with CTC fusion + LM fusion |
| **Input Size** | 48 × 640 px (height × width) |
| **Framework** | PyTorch |
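To make the fixed 48 × 640 input size concrete, here is a minimal sketch of how a cropped text line could be scaled to that shape. The library's actual preprocessing is not described in this card, so the aspect-preserving resize with right-padding, and the helper name `fit_to_input`, are assumptions for illustration only.

```python
def fit_to_input(width: int, height: int,
                 target_h: int = 48, max_w: int = 640) -> tuple[int, int]:
    """Return (new_width, pad_right) for a line image scaled to target_h.

    Hypothetical helper: assumes the line is resized to the model height
    with its aspect ratio preserved, clipped to max_w, then right-padded.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = target_h / height
    # Clamp the scaled width to the range [1, max_w].
    new_w = max(1, min(max_w, round(width * scale)))
    return new_w, max_w - new_w


# A 400 x 96 px crop is halved in size and padded out to 640 px wide.
new_w, pad = fit_to_input(400, 96)
print(new_w, pad)
```

Under this assumption, very long lines are squeezed into 640 px, which suggests splitting them before recognition.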
### Model Diagram
```
Input Image (48×640)
        ↓
  ConvStem (CNN)
        ↓
2D Positional Encoding
        ↓
Transformer Encoder (4L)
        ↓
    ┌───┴───┐
    ↓       ↓
CTC Head  Transformer Decoder (3L)
    ↓       ↓
    └───┬───┘
        ↓
Beam Search + CTC Fusion + LM Fusion
        ↓
Output Text
```
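The CTC head in the diagram emits one label per encoder frame. As a rough illustration of how such frame-level output maps to text, the sketch below shows standard greedy CTC collapsing (merge repeated labels, then drop blanks). This is a simpler stand-in, not the beam search with fusion that the model uses, and the blank index `0` is an assumption.

```python
def ctc_greedy_collapse(frame_ids: list[int], blank_id: int = 0) -> list[int]:
    """Collapse per-frame CTC labels into an output label sequence.

    Repeated labels are merged, and blanks are removed. A blank between
    two identical labels keeps them as two separate output symbols.
    """
    out: list[int] = []
    prev = None
    for idx in frame_ids:
        if idx != prev and idx != blank_id:
            out.append(idx)
        prev = idx
    return out


# Frames: blank, 3, 3, blank, 3, 5, 5, blank
print(ctc_greedy_collapse([0, 3, 3, 0, 3, 5, 5, 0]))
```

The resulting ids would then be mapped to characters through the vocabulary.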
## Dataset
The model is trained on the [mrrtmob/khmer_english_ocr_image_line](https://huggingface.co/datasets/mrrtmob/khmer_english_ocr_image_line) dataset, containing **12 million** synthetic images of Khmer and English text lines.
## Usage
### Installation
```bash
pip install kiri-ocr
```
### Python API
```python
from kiri_ocr import OCR

# Initialize (downloads the model from Hugging Face automatically)
ocr = OCR()

# Extract text from a document image
text, results = ocr.extract_text("document.jpg")
print(text)

# Access detailed results
for result in results:
    print(f"Text: {result.text}")
    print(f"Confidence: {result.confidence:.2%}")
```
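Since each result exposes `text` and `confidence`, low-confidence lines can be filtered out before further processing. The sketch below uses a stand-in `FakeResult` class so that it runs without the library installed; the helper name `keep_confident` and the 0.8 threshold are illustrative assumptions, not part of the Kiri OCR API.

```python
from dataclasses import dataclass


@dataclass
class FakeResult:
    """Stand-in for an OCR result; only mirrors the two fields used above."""
    text: str
    confidence: float


def keep_confident(results, threshold: float = 0.8):
    """Return the results whose confidence is at or above the threshold."""
    return [r for r in results if r.confidence >= threshold]


sample = [FakeResult("Hello", 0.97), FakeResult("w0rld", 0.41)]
print(" ".join(r.text for r in keep_confident(sample)))
```

With the real library, the `results` list returned by `ocr.extract_text(...)` could be passed in place of `sample`.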
### CLI Tool
```bash
# Basic usage
kiri-ocr predict path/to/document.jpg
# With output directory
kiri-ocr predict path/to/document.jpg --output results/
```
## Benchmarks
Results on synthetic test images (10 popular fonts):


## Configuration
Default inference parameters:
| Parameter | Value | Description |
|-----------|-------|-------------|
| `beam_width` | 4 | Beam search width |
| `ctc_fusion_alpha` | 0.5 | CTC score fusion weight |
| `lm_fusion_alpha` | 0.35 | Language model fusion weight |
| `max_length` | 260 | Maximum output sequence length |
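To show how the two fusion weights might interact during beam search, here is a minimal sketch of a fused hypothesis score. The exact formula used by Kiri OCR is not given in this card; the interpolation below (attention and CTC log-probabilities blended by `ctc_fusion_alpha`, plus a language model term scaled by `lm_fusion_alpha`) is a common convention and is assumed here. The function names are hypothetical.

```python
def fused_score(attn_logp: float, ctc_logp: float, lm_logp: float,
                ctc_fusion_alpha: float = 0.5,
                lm_fusion_alpha: float = 0.35) -> float:
    """Combine decoder, CTC, and LM log-probabilities into one score.

    Assumed form: (1 - a) * attn + a * ctc + b * lm.
    """
    return ((1.0 - ctc_fusion_alpha) * attn_logp
            + ctc_fusion_alpha * ctc_logp
            + lm_fusion_alpha * lm_logp)


def top_beams(candidates, beam_width: int = 4):
    """Keep the beam_width best hypotheses by fused score.

    Each candidate is a tuple: (text, attn_logp, ctc_logp, lm_logp).
    """
    ranked = sorted(candidates, key=lambda c: fused_score(*c[1:]), reverse=True)
    return [c[0] for c in ranked[:beam_width]]


hyps = [("a", -1.0, -1.0, -1.0), ("b", -0.5, -0.5, -0.5), ("c", -3.0, -3.0, -3.0)]
print(top_beams(hyps, beam_width=2))
```

Under this form, raising `ctc_fusion_alpha` shifts weight from the attention decoder to the CTC head, while `lm_fusion_alpha` controls how strongly the language model influences the ranking.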
## Model Files
```
kiri-ocr/
├── config.json        # Model configuration
├── vocab.json         # Character vocabulary
├── model.safetensors  # Model weights
└── README.md          # This file
```
## Links
- **GitHub**: [github.com/mrrtmob/kiri-ocr](https://github.com/mrrtmob/kiri-ocr)
- **Dataset**: [mrrtmob/khmer_english_ocr_image_line](https://huggingface.co/datasets/mrrtmob/khmer_english_ocr_image_line)
- **PyPI**: [pypi.org/project/kiri-ocr](https://pypi.org/project/kiri-ocr)
## Citation
```bibtex
@software{kiri_ocr,
author = {mrrtmob},
title = {Kiri OCR: Lightweight OCR for English and Khmer},
year = {2026},
url = {https://huggingface.co/mrrtmob/kiri-ocr}
}
```
## License
This model is released under the [Apache 2.0 License](LICENSE).