---
language:
- en
- km
tags:
- ocr
- text-recognition
- pytorch
- transformer
- handwritten
- khmer
- multilingual
license: apache-2.0
datasets:
- mrrtmob/km_en_image_line
- mrrtmob/khmer_english_ocr_image_line
pipeline_tag: image-to-text
library_name: kiri-ocr
---

# Kiri OCR Model

**Kiri OCR** is a lightweight OCR library for **English and Khmer** documents. It provides document-level text detection, recognition, and rendering capabilities in a compact package.

## ✨ Key Features

- **Lightweight**: Compact model (4-layer encoder, 3-layer decoder, 256-dim) optimized for fast inference
- **Bilingual**: Native support for English and Khmer (including mixed text)
- **Document Processing**: Automatic text line and word detection
- **Hybrid Decoding**: CTC + Attention decoder with language model fusion

## 🏗️ Architecture

| Component | Details |
|-----------|---------|
| **Type** | Transformer Encoder-Decoder with CTC |
| **Encoder** | 4 layers, 8 heads, 256 dim, 1024 FFN |
| **Decoder** | 3 layers, 8 heads, 256 dim, 1024 FFN |
| **CNN Backbone** | ConvStem (4 conv layers with BatchNorm + SiLU) |
| **Decoding** | Beam search with CTC fusion + LM fusion |
| **Input Size** | 48 × 640 px (height × width) |
| **Framework** | PyTorch |
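
A fixed 48 × 640 input means each line image has to be scaled and padded before it reaches the model. The sketch below shows one common way to compute those dimensions: scale to the target height, cap the width, and pad the remainder on the right. The function name `fit_to_input` and the right-padding strategy are assumptions made for illustration; the library's actual preprocessing may differ.

```python
from typing import Tuple


def fit_to_input(width: int, height: int,
                 target_h: int = 48, max_w: int = 640) -> Tuple[int, int, int]:
    """Return (new_width, new_height, right_padding) for a text-line image.

    Hypothetical helper: keeps the aspect ratio, caps the width at max_w,
    and reports how many columns of padding are needed to reach max_w.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Scale so the height becomes target_h, preserving the aspect ratio.
    new_w = round(width * target_h / height)
    # Clamp to [1, max_w]; very long lines are squeezed to max_w.
    new_w = max(1, min(max_w, new_w))
    return new_w, target_h, max_w - new_w


print(fit_to_input(1000, 100))  # a 1000x100 crop
print(fit_to_input(4000, 100))  # a very long line
```

Lines that would exceed 640 px after scaling get squeezed horizontally in this sketch, which can hurt accuracy; splitting long lines before recognition is an alternative.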

### Model Diagram

```
Input Image (48×640)
       ↓
   ConvStem (CNN)
       ↓
  2D Positional Encoding
       ↓
  Transformer Encoder (4L)
       ↓
   ┌───┴───┐
   ↓       ↓
CTC Head   Transformer Decoder (3L)
   ↓       ↓
   └───┬───┘
       ↓
  Beam Search + CTC Fusion + LM Fusion
       ↓
    Output Text
```
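
To illustrate what the CTC head in the diagram produces, here is a minimal sketch of greedy CTC decoding: take the most likely class per frame, collapse consecutive repeats, then drop the blank symbol. This is the textbook CTC rule, not the library's decoder (which uses beam search with fusion). The blank index `0` and the toy vocabulary are assumptions.

```python
from itertools import groupby
from typing import Dict, Sequence


def ctc_greedy_decode(frame_ids: Sequence[int],
                      id_to_char: Dict[int, str],
                      blank_id: int = 0) -> str:
    """Collapse repeated frame labels, then remove blanks.

    frame_ids holds the argmax class index for each encoder frame.
    blank_id=0 is an assumption; check vocab.json for the real index.
    """
    # groupby merges runs of identical consecutive labels into one.
    collapsed = [label for label, _ in groupby(frame_ids)]
    return "".join(id_to_char[i] for i in collapsed if i != blank_id)


toy_vocab = {1: "a", 2: "b"}  # hypothetical vocabulary
# A blank between the two runs of 1 keeps the doubled "a".
print(ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0], toy_vocab))
```

The blank between the two `1` runs is what allows a genuinely doubled character to survive the collapse step.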

## 📊 Dataset

The model is trained on the [mrrtmob/khmer_english_ocr_image_line](https://huggingface.co/datasets/mrrtmob/khmer_english_ocr_image_line) dataset, containing **12 million** synthetic images of Khmer and English text lines.

## 💻 Usage

### Installation

```bash
pip install kiri-ocr
```

### Python API

```python
from kiri_ocr import OCR

# Initialize (downloads the model from Hugging Face automatically)
ocr = OCR()

# Extract text from document
text, results = ocr.extract_text("document.jpg")
print(text)

# Access detailed results
for result in results:
    print(f"Text: {result.text}")
    print(f"Confidence: {result.confidence:.2%}")
```
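
Since each result exposes `text` and `confidence`, a simple post-processing step is to flag low-confidence lines for manual review. The sketch below uses a stand-in `LineResult` dataclass so it runs without the library; that class, the helper name `flag_low_confidence`, and the 0.80 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class LineResult:
    """Hypothetical stand-in with the two attributes used above."""
    text: str
    confidence: float


def flag_low_confidence(results: Iterable[LineResult],
                        threshold: float = 0.80) -> List[LineResult]:
    """Return the results whose confidence falls below the threshold."""
    return [r for r in results if r.confidence < threshold]


sample = [LineResult("Hello", 0.97), LineResult("សួស្តី", 0.62)]
for r in flag_low_confidence(sample):
    print(f"Review: {r.text} ({r.confidence:.2%})")
```

With real output, pass the `results` list returned by `ocr.extract_text(...)` in place of `sample`.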

### CLI Tool

```bash
# Basic usage
kiri-ocr predict path/to/document.jpg

# With output directory
kiri-ocr predict path/to/document.jpg --output results/
```

## 📈 Benchmarks

Results on synthetic test images rendered in 10 popular fonts:

![Benchmark Table](benchmark_table.png)

![Benchmark Graph](benchmark_graph.png)

## ⚙️ Configuration

Default inference parameters:

| Parameter | Value | Description |
|-----------|-------|-------------|
| `beam_width` | 4 | Beam search width |
| `ctc_fusion_alpha` | 0.5 | CTC score fusion weight |
| `lm_fusion_alpha` | 0.35 | Language model fusion weight |
| `max_length` | 260 | Maximum output sequence length |
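
To make the roles of these parameters concrete, here is a sketch of how hybrid CTC/attention beam search commonly combines scores: the attention and CTC log-probabilities are interpolated, and the language-model log-probability is added with its own weight. The exact formula used by Kiri OCR is not stated here, so treat the interpolation below, along with the `Hypothesis`, `fused_score`, and `prune` names, as assumptions.

```python
from typing import List, NamedTuple


class Hypothesis(NamedTuple):
    """Hypothetical beam entry holding per-model log-probabilities."""
    text: str
    attn_logp: float
    ctc_logp: float
    lm_logp: float


def fused_score(h: Hypothesis,
                ctc_fusion_alpha: float = 0.5,
                lm_fusion_alpha: float = 0.35) -> float:
    """Assumed formula: (1 - a) * attention + a * CTC + b * LM."""
    return ((1.0 - ctc_fusion_alpha) * h.attn_logp
            + ctc_fusion_alpha * h.ctc_logp
            + lm_fusion_alpha * h.lm_logp)


def prune(hyps: List[Hypothesis], beam_width: int = 4) -> List[Hypothesis]:
    """Keep the beam_width best hypotheses by fused score."""
    return sorted(hyps, key=fused_score, reverse=True)[:beam_width]


beam = [
    Hypothesis("cat", -1.0, -2.0, -3.0),
    Hypothesis("cot", -1.2, -1.1, -2.0),
]
for h in prune(beam):
    print(h.text, round(fused_score(h), 3))
```

Under this formula, raising `ctc_fusion_alpha` shifts trust toward the CTC head, while raising `lm_fusion_alpha` favors outputs the language model finds likely.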

## 📁 Model Files

```
kiri-ocr/
├── config.json          # Model configuration
├── vocab.json           # Character vocabulary
├── model.safetensors    # Model weights
└── README.md            # This file
```
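
If you keep a local copy of the model, a quick sanity check is to confirm that the files listed above are present before loading. This is a minimal sketch using only the standard library; the helper name `missing_model_files` is hypothetical, and the directory path is whatever location you downloaded to.

```python
from pathlib import Path
from typing import List

# File names taken from the listing above (README.md is not needed to run).
EXPECTED_FILES = ("config.json", "vocab.json", "model.safetensors")


def missing_model_files(model_dir: str) -> List[str]:
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


missing = missing_model_files("kiri-ocr")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All model files found.")
```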

## 🔗 Links

- **GitHub**: [github.com/mrrtmob/kiri-ocr](https://github.com/mrrtmob/kiri-ocr)
- **Dataset**: [mrrtmob/khmer_english_ocr_image_line](https://huggingface.co/datasets/mrrtmob/khmer_english_ocr_image_line)
- **PyPI**: [pypi.org/project/kiri-ocr](https://pypi.org/project/kiri-ocr)

## 📝 Citation

```bibtex
@software{kiri_ocr,
  author = {mrrtmob},
  title = {Kiri OCR: Lightweight OCR for English and Khmer},
  year = {2026},
  url = {https://huggingface.co/mrrtmob/kiri-ocr}
}
```

## 📄 License

This model is released under the [Apache 2.0 License](LICENSE).