---
language:
- en
- km
tags:
- ocr
- text-recognition
- pytorch
- transformer
- handwritten
- khmer
- multilingual
license: apache-2.0
datasets:
- mrrtmob/km_en_image_line
- mrrtmob/khmer_english_ocr_image_line
pipeline_tag: image-to-text
library_name: kiri-ocr
---
# Kiri OCR Model
**Kiri OCR** is a lightweight OCR library for **English and Khmer** documents. It provides document-level text detection, recognition, and rendering capabilities in a compact package.
## ✨ Key Features
- **Lightweight**: Compact model optimized for speed and efficiency
- **Bilingual**: Native support for English and Khmer (including mixed text)
- **Document Processing**: Automatic text line and word detection
- **Hybrid Decoding**: CTC + Attention decoder with language model fusion
## 🏗️ Architecture
| Component | Details |
|-----------|---------|
| **Type** | Transformer Encoder-Decoder with CTC |
| **Encoder** | 4 layers, 8 heads, 256 dim, 1024 FFN |
| **Decoder** | 3 layers, 8 heads, 256 dim, 1024 FFN |
| **CNN Backbone** | ConvStem (4 conv layers with BatchNorm + SiLU) |
| **Decoding** | Beam search with CTC fusion + LM fusion |
| **Input Size** | 48 × 640 px (height × width) |
| **Framework** | PyTorch |
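
As a rough sanity check on the "lightweight" claim, the Transformer dimensions in the table can be turned into a parameter estimate. The sketch below assumes the standard layer layout used by PyTorch's `nn.TransformerEncoderLayer` and `nn.TransformerDecoderLayer` (biased linear projections, two or three LayerNorms per layer); the actual Kiri OCR layers may differ, and the ConvStem, embeddings, and output heads are not counted. All function names are illustrative.

```python
D_MODEL = 256   # model width, from the table
FFN_DIM = 1024  # feed-forward width, from the table
ENC_LAYERS = 4
DEC_LAYERS = 3


def attention_params(d: int) -> int:
    """Q, K, V and output projections, each a d x d weight plus a bias."""
    return 4 * (d * d + d)


def ffn_params(d: int, ffn: int) -> int:
    """Two linear layers (d -> ffn and ffn -> d), both with biases."""
    return (d * ffn + ffn) + (ffn * d + d)


def layer_norm_params(d: int) -> int:
    """One scale vector and one shift vector."""
    return 2 * d


def encoder_layer_params(d: int, ffn: int) -> int:
    # Self-attention + FFN + 2 LayerNorms (assumed standard layout).
    return attention_params(d) + ffn_params(d, ffn) + 2 * layer_norm_params(d)


def decoder_layer_params(d: int, ffn: int) -> int:
    # Self-attention + cross-attention + FFN + 3 LayerNorms (assumed).
    return 2 * attention_params(d) + ffn_params(d, ffn) + 3 * layer_norm_params(d)


encoder_total = ENC_LAYERS * encoder_layer_params(D_MODEL, FFN_DIM)
decoder_total = DEC_LAYERS * decoder_layer_params(D_MODEL, FFN_DIM)
print(f"encoder: {encoder_total:,}")
print(f"decoder: {decoder_total:,}")
print(f"total:   {encoder_total + decoder_total:,}")
```

Under these assumptions the encoder and decoder stacks together come to roughly 6.3 million parameters.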
### Model Diagram
```
Input Image (48×640)
        ↓
  ConvStem (CNN)
        ↓
2D Positional Encoding
        ↓
Transformer Encoder (4L)
        ↓
    ┌───┴───┐
    ↓       ↓
CTC Head   Transformer Decoder (3L)
    ↓       ↓
    └───┬───┘
        ↓
Beam Search + CTC Fusion + LM Fusion
        ↓
Output Text
```
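
The pipeline starts from a fixed 48 × 640 input, so a text-line crop of arbitrary size has to be brought to that shape first. A common approach, assumed here and not confirmed for Kiri OCR, is to scale the crop to the target height while preserving its aspect ratio, then pad the width on the right (or cap it if the line is too long). The helper name `fit_line` is hypothetical, and it only computes the geometry.

```python
from typing import Tuple

TARGET_H = 48   # input height from the architecture table
TARGET_W = 640  # input width from the architecture table


def fit_line(width: int, height: int) -> Tuple[int, int]:
    """Return (resized_width, right_padding) for a crop of the given size.

    The crop is scaled so its height becomes TARGET_H. The resulting width
    is capped at TARGET_W; any remaining space is reported as padding.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scaled = round(width * TARGET_H / height)
    resized = max(1, min(TARGET_W, scaled))
    return resized, TARGET_W - resized


print(fit_line(1000, 100))  # short line: resized to 480 px wide, 160 px of padding
print(fit_line(4000, 100))  # long line: capped at 640 px, no padding
```

Capping the width squeezes very long lines horizontally, which is one reason a detector that splits documents into individual lines matters for accuracy.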
## 📊 Dataset
The model is trained on the [mrrtmob/khmer_english_ocr_image_line](https://huggingface.co/datasets/mrrtmob/khmer_english_ocr_image_line) dataset, containing **12 million** synthetic images of Khmer and English text lines.
## 💻 Usage
### Installation
```bash
pip install kiri-ocr
```
### Python API
```python
from kiri_ocr import OCR
# Initialize (downloads from Hugging Face automatically)
ocr = OCR()
# Extract text from document
text, results = ocr.extract_text("document.jpg")
print(text)
# Access detailed results
for result in results:
    print(f"Text: {result.text}")
    print(f"Confidence: {result.confidence:.2%}")
```
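
Because each result exposes `text` and `confidence`, low-confidence lines can be filtered out before further processing. The sketch below uses a stand-in `LineResult` dataclass so that it runs without the library installed; the real result objects may carry more fields, and the 0.80 threshold is an arbitrary illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class LineResult:
    """Stand-in for a kiri-ocr result; only the two fields used above."""
    text: str
    confidence: float


def confident_text(results: Iterable[LineResult], threshold: float = 0.80) -> List[str]:
    """Keep the text of results whose confidence meets the threshold."""
    return [r.text for r in results if r.confidence >= threshold]


sample = [
    LineResult("Hello world", 0.97),
    LineResult("សួស្តី", 0.91),
    LineResult("~~~", 0.32),
]
print(confident_text(sample))
```

With real output, the same function should work on the `results` list returned by `ocr.extract_text(...)`, provided the attribute names match those shown above.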
### CLI Tool
```bash
# Basic usage
kiri-ocr predict path/to/document.jpg
# With output directory
kiri-ocr predict path/to/document.jpg --output results/
```
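
To process a whole folder, the `predict` command can be invoked once per file. The sketch below assumes `kiri-ocr` is on the `PATH` and relies only on the `predict` subcommand and `--output` flag shown above; the folder names and the `*.jpg` pattern are placeholders.

```python
import subprocess
from pathlib import Path
from typing import List


def predict_commands(image_dir: str, output_dir: str, pattern: str = "*.jpg") -> List[List[str]]:
    """Build one `kiri-ocr predict` command per matching image, sorted by name."""
    return [
        ["kiri-ocr", "predict", str(path), "--output", output_dir]
        for path in sorted(Path(image_dir).glob(pattern))
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run the commands one by one, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `run_all(predict_commands("scans", "results/"))` would process every `.jpg` file in a `scans` folder.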
## 📈 Benchmarks
Results on synthetic test images (10 popular fonts):
![Benchmark Table](benchmark_table.png)
![Benchmark Graph](benchmark_graph.png)
## ⚙️ Configuration
Default inference parameters:
| Parameter | Value | Description |
|-----------|-------|-------------|
| `beam_width` | 4 | Beam search width |
| `ctc_fusion_alpha` | 0.5 | CTC score fusion weight |
| `lm_fusion_alpha` | 0.35 | Language model fusion weight |
| `max_length` | 260 | Maximum output sequence length |
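
To make the role of the two fusion weights concrete, here is one common way such scores are combined during beam search: interpolate the attention and CTC log-probabilities, then add a weighted language-model term. This formula is an assumption borrowed from typical hybrid CTC/attention decoders, not taken from the Kiri OCR source; `DecodeConfig`, `Hypothesis`, `fused_score`, and `prune` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DecodeConfig:
    """Defaults copied from the table above."""
    beam_width: int = 4
    ctc_fusion_alpha: float = 0.5
    lm_fusion_alpha: float = 0.35
    max_length: int = 260


@dataclass(frozen=True)
class Hypothesis:
    text: str
    attn_logp: float  # log-probability from the attention decoder
    ctc_logp: float   # log-probability from the CTC head
    lm_logp: float    # log-probability from the language model


def fused_score(h: Hypothesis, cfg: DecodeConfig) -> float:
    """Assumed fusion rule: (1 - a) * attn + a * ctc + b * lm."""
    a, b = cfg.ctc_fusion_alpha, cfg.lm_fusion_alpha
    return (1 - a) * h.attn_logp + a * h.ctc_logp + b * h.lm_logp


def prune(beam: List[Hypothesis], cfg: DecodeConfig) -> List[Hypothesis]:
    """Keep the `beam_width` best hypotheses no longer than `max_length`.

    Character count stands in for sequence length in this sketch.
    """
    fitting = [h for h in beam if len(h.text) <= cfg.max_length]
    ranked = sorted(fitting, key=lambda h: fused_score(h, cfg), reverse=True)
    return ranked[: cfg.beam_width]


cfg = DecodeConfig()
beam = [
    Hypothesis("cot", attn_logp=-1.2, ctc_logp=-1.0, lm_logp=-9.0),
    Hypothesis("cat", attn_logp=-1.0, ctc_logp=-3.0, lm_logp=-4.0),
]
print(prune(beam, cfg)[0].text)
```

In this toy beam the language-model term outweighs the better CTC score of "cot", so "cat" ranks first. Raising `ctc_fusion_alpha` or lowering `lm_fusion_alpha` shifts that balance.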
## 📁 Model Files
```
kiri-ocr/
├── config.json # Model configuration
├── vocab.json # Character vocabulary
├── model.safetensors # Model weights
└── README.md # This file
```
## 🔗 Links
- **GitHub**: [github.com/mrrtmob/kiri-ocr](https://github.com/mrrtmob/kiri-ocr)
- **Dataset**: [mrrtmob/khmer_english_ocr_image_line](https://huggingface.co/datasets/mrrtmob/khmer_english_ocr_image_line)
- **PyPI**: [pypi.org/project/kiri-ocr](https://pypi.org/project/kiri-ocr)
## 📝 Citation
```bibtex
@software{kiri_ocr,
author = {mrrtmob},
title = {Kiri OCR: Lightweight OCR for English and Khmer},
year = {2026},
url = {https://huggingface.co/mrrtmob/kiri-ocr}
}
```
## 📄 License
This model is released under the [Apache 2.0 License](LICENSE).