---
language:
- en
- km
tags:
- ocr
- text-recognition
- pytorch
- transformer
- handwritten
- khmer
- multilingual
license: apache-2.0
datasets:
- mrrtmob/km_en_image_line
- mrrtmob/khmer_english_ocr_image_line
pipeline_tag: image-to-text
library_name: kiri-ocr
---

# Kiri OCR Model

**Kiri OCR** is a lightweight OCR library for **English and Khmer** documents. It provides document-level text detection, recognition, and rendering in a compact package.

## ✨ Key Features

- **Lightweight**: Compact model optimized for speed and efficiency
- **Bilingual**: Native support for English and Khmer (including mixed text)
- **Document Processing**: Automatic text line and word detection
- **Hybrid Decoding**: CTC + Attention decoder with language model fusion

## 🏗️ Architecture

| Component | Details |
|-----------|---------|
| **Type** | Transformer Encoder-Decoder with CTC |
| **Encoder** | 4 layers, 8 heads, 256 dim, 1024 FFN |
| **Decoder** | 3 layers, 8 heads, 256 dim, 1024 FFN |
| **CNN Backbone** | ConvStem (4 conv layers with BatchNorm + SiLU) |
| **Decoding** | Beam search with CTC fusion + LM fusion |
| **Input Size** | 48 × 640 px (height × width) |
| **Framework** | PyTorch |
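
The table fixes the input at 48 × 640 px, so arbitrary line crops have to be fitted to that size before recognition. A common approach is to scale the crop to the target height while preserving its aspect ratio, cap the width at the target, and pad the remainder. The sketch below computes only that geometry. Whether Kiri OCR preprocesses images this way is an assumption, and `fit_line_geometry` is a hypothetical helper, not part of the library.

```python
TARGET_H, TARGET_W = 48, 640  # input size from the architecture table


def fit_line_geometry(width, height, target_h=TARGET_H, target_w=TARGET_W):
    """Return (new_width, pad_right) for an aspect-preserving resize.

    Hypothetical helper: the crop is scaled so its height equals target_h,
    the resulting width is capped at target_w, and whatever is left of
    target_w is reported as right padding.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scaled_w = max(1, round(width * target_h / height))
    new_w = min(scaled_w, target_w)
    return new_w, target_w - new_w


# A 1000x100 px crop scales to 480 px wide and needs 160 px of padding.
print(fit_line_geometry(1000, 100))
```

Very wide crops hit the 640 px cap and are squeezed horizontally under this scheme, which is one reason to split long lines before recognition.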

### Model Diagram

```
Input Image (48×640)
        ↓
  ConvStem (CNN)
        ↓
2D Positional Encoding
        ↓
Transformer Encoder (4L)
        ↓
    ┌───┴───┐
    ↓       ↓
CTC Head  Transformer Decoder (3L)
    ↓       ↓
    └───┬───┘
        ↓
Beam Search + CTC Fusion + LM Fusion
        ↓
   Output Text
```
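
The CTC head in the diagram emits one label per encoder frame, and those frame labels are turned into text by collapsing consecutive repeats and then dropping the blank symbol. The sketch below shows that standard CTC collapse rule in its simplest (greedy) form. It illustrates the rule only: the model itself combines CTC scores with the attention decoder in beam search, and the blank index of 0 is an assumption.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_ids, blank=BLANK):
    """Collapse consecutive repeated labels, then remove blanks."""
    return [label for label, _ in groupby(frame_ids) if label != blank]


# Per-frame argmax labels: a blank between the two runs of 3 keeps both.
frames = [0, 3, 3, 0, 3, 5, 5, 0]
print(ctc_greedy_decode(frames))  # [3, 3, 5]
```

The blank between the two runs of `3` is what lets CTC emit a doubled character. Without it, the repeats would collapse into one.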

## Dataset

The model is trained on the [mrrtmob/khmer_english_ocr_image_line](https://huggingface.co/datasets/mrrtmob/khmer_english_ocr_image_line) dataset, which contains **12 million** synthetic images of Khmer and English text lines.

## 💻 Usage

### Installation

```bash
pip install kiri-ocr
```

### Python API

```python
from kiri_ocr import OCR

# Initialize (downloads from Hugging Face automatically)
ocr = OCR()

# Extract text from document
text, results = ocr.extract_text("document.jpg")
print(text)

# Access detailed results
for result in results:
    print(f"Text: {result.text}")
    print(f"Confidence: {result.confidence:.2%}")
```
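
Because each result carries a confidence score, low-confidence lines can be dropped or flagged for review before the text is used downstream. The sketch below assumes only the `text` and `confidence` attributes used in the example above. `LineResult` is a hypothetical stand-in so the snippet runs without the library, and the 0.80 threshold is an arbitrary illustration, not a recommended value.

```python
from dataclasses import dataclass


@dataclass
class LineResult:
    """Hypothetical stand-in for one item of the `results` list."""
    text: str
    confidence: float


def keep_confident(results, threshold=0.80):
    """Return only the results whose confidence meets the threshold."""
    return [r for r in results if r.confidence >= threshold]


sample = [LineResult("Hello", 0.97), LineResult("w0rld", 0.41)]
for r in keep_confident(sample):
    print(f"{r.text} ({r.confidence:.2%})")
```

With real output, the list returned by `ocr.extract_text(...)` would take the place of `sample`, provided its items expose the same two attributes.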

### CLI Tool

```bash
# Basic usage
kiri-ocr predict path/to/document.jpg

# With output directory
kiri-ocr predict path/to/document.jpg --output results/
```

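## Benchmarks

Results on synthetic test images (10 popular fonts):

![Benchmark Graph](https://raw.githubusercontent.com/mrrtmob/kiri-ocr/main/benchmark/benchmark_graph.png)

![Benchmark Table](https://raw.githubusercontent.com/mrrtmob/kiri-ocr/main/benchmark/benchmark_table.png)
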
## ⚙️ Configuration

Default inference parameters:

| Parameter | Value | Description |
|-----------|-------|-------------|
| `beam_width` | 4 | Beam search width |
| `ctc_fusion_alpha` | 0.5 | CTC score fusion weight |
| `lm_fusion_alpha` | 0.35 | Language model fusion weight |
| `max_length` | 260 | Maximum output sequence length |
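
To show how the two fusion weights and the beam width interact, the sketch below scores candidate hypotheses with a common joint CTC/attention formulation: the CTC weight interpolates between the attention and CTC log-probabilities, and the language model log-probability is added with its own weight. The exact formula Kiri OCR uses is not stated here, so this combination is an assumption, and `fused_score` and `top_beams` are hypothetical helpers.

```python
CTC_FUSION_ALPHA = 0.5   # default from the table above
LM_FUSION_ALPHA = 0.35   # default from the table above
BEAM_WIDTH = 4           # default from the table above


def fused_score(log_p_att, log_p_ctc, log_p_lm,
                ctc_alpha=CTC_FUSION_ALPHA, lm_alpha=LM_FUSION_ALPHA):
    """Assumed fusion rule: interpolate attention and CTC, then add the LM."""
    return ((1.0 - ctc_alpha) * log_p_att
            + ctc_alpha * log_p_ctc
            + lm_alpha * log_p_lm)


def top_beams(candidates, beam_width=BEAM_WIDTH):
    """Keep the beam_width best hypotheses by fused score.

    candidates: iterable of (text, log_p_att, log_p_ctc, log_p_lm).
    """
    scored = sorted(
        ((fused_score(att, ctc, lm), text) for text, att, ctc, lm in candidates),
        reverse=True,
    )
    return [text for _, text in scored[:beam_width]]


candidates = [
    ("cat", -1.0, -1.0, -1.0),
    ("cot", -0.5, -0.5, -0.5),
    ("cxt", -3.0, -3.0, -3.0),
]
print(top_beams(candidates, beam_width=2))  # ['cot', 'cat']
```

Under this rule, raising `ctc_fusion_alpha` gives more weight to the CTC head's frame-level evidence, while raising `lm_fusion_alpha` favors hypotheses the language model considers likely.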

## Model Files

```
kiri-ocr/
├── config.json          # Model configuration
├── vocab.json           # Character vocabulary
├── model.safetensors    # Model weights
└── README.md            # This file
```

## Links

- **GitHub**: [github.com/mrrtmob/kiri-ocr](https://github.com/mrrtmob/kiri-ocr)
- **Dataset**: [mrrtmob/khmer_english_ocr_image_line](https://huggingface.co/datasets/mrrtmob/khmer_english_ocr_image_line)
- **PyPI**: [pypi.org/project/kiri-ocr](https://pypi.org/project/kiri-ocr)

## Citation

```bibtex
@software{kiri_ocr,
  author = {mrrtmob},
  title = {Kiri OCR: Lightweight OCR for English and Khmer},
  year = {2026},
  url = {https://huggingface.co/mrrtmob/kiri-ocr}
}
```

## License

This model is released under the [Apache 2.0 License](LICENSE).