---
language:
- en
- km
tags:
- ocr
- text-recognition
- pytorch
- transformer
- handwritten
- khmer
- multilingual
license: apache-2.0
datasets:
- mrrtmob/km_en_image_line
- mrrtmob/khmer_english_ocr_image_line
pipeline_tag: image-to-text
library_name: kiri-ocr
---

# Kiri OCR Model

**Kiri OCR** is a lightweight OCR library for **English and Khmer** documents. It provides document-level text detection, recognition, and rendering capabilities in a compact package.

## ✨ Key Features

- **Lightweight**: Compact model optimized for speed and efficiency
- **Bilingual**: Native support for English and Khmer (including mixed text)
- **Document Processing**: Automatic text line and word detection
- **Hybrid Decoding**: CTC + Attention decoder with language model fusion

## 🏗️ Architecture

| Component | Details |
|-----------|---------|
| **Type** | Transformer Encoder-Decoder with CTC |
| **Encoder** | 4 layers, 8 heads, 256 dim, 1024 FFN |
| **Decoder** | 3 layers, 8 heads, 256 dim, 1024 FFN |
| **CNN Backbone** | ConvStem (4 conv layers with BatchNorm + SiLU) |
| **Decoding** | Beam search with CTC fusion + LM fusion |
| **Input Size** | 48 × 640 px (height × width) |
| **Framework** | PyTorch |

### Model Diagram

```
        Input Image (48×640)
                  ↓
           ConvStem (CNN)
                  ↓
       2D Positional Encoding
                  ↓
      Transformer Encoder (4L)
                  ↓
              ┌───┴───┐
              ↓       ↓
          CTC Head  Transformer Decoder (3L)
              ↓       ↓
              └───┬───┘
                  ↓
Beam Search + CTC Fusion + LM Fusion
                  ↓
             Output Text
```

## 📊 Dataset

The model is trained on the [mrrtmob/khmer_english_ocr_image_line](https://huggingface.co/datasets/mrrtmob/khmer_english_ocr_image_line) dataset, containing **12 million** synthetic images of Khmer and English text lines.
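Each training sample is a single text line, while the encoder takes a fixed 48 × 640 px input (see the Architecture table). The minimal sketch below shows one plausible way to map an arbitrary line crop onto that size: an aspect-preserving "letterbox" resize followed by right and bottom padding. The function `fit_line` and the `LineGeometry` record are hypothetical names used only for illustration, and this card does not document how `kiri-ocr` actually resizes or pads its inputs, so treat the scheme itself as an assumption.

```python
from dataclasses import dataclass

# Encoder input size from the Architecture table (height x width, in px).
TARGET_H = 48
TARGET_W = 640


@dataclass(frozen=True)
class LineGeometry:
    """Hypothetical record: resized crop size plus padding to reach the input size."""
    new_w: int
    new_h: int
    pad_right: int
    pad_bottom: int


def fit_line(width: int, height: int,
             target_h: int = TARGET_H, target_w: int = TARGET_W) -> LineGeometry:
    """Hypothetical letterbox step (an assumption, not kiri-ocr's confirmed method).

    Scales a width x height crop so it fits inside target_h x target_w without
    distorting its aspect ratio, then reports how much padding fills the rest.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Take the smaller scale factor so neither dimension overflows the target.
    scale = min(target_h / height, target_w / width)
    new_w = max(1, min(target_w, round(width * scale)))
    new_h = max(1, min(target_h, round(height * scale)))
    return LineGeometry(new_w, new_h, target_w - new_w, target_h - new_h)


if __name__ == "__main__":
    # Short 400x96 line: height-bound, scaled by 0.5 to 200x48, padded on the right.
    print(fit_line(400, 96))
    # Long 3000x60 line: width-bound, scaled to 640x13, padded at the bottom.
    print(fit_line(3000, 60))
```

Under this assumed scheme, a very long line becomes width-bound and shrinks to only a few pixels in height, so splitting text into shorter lines or words before recognition helps keep the glyphs legible.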
## 💻 Usage

### Installation

```bash
pip install kiri-ocr
```

### Python API

```python
from kiri_ocr import OCR

# Initialize (downloads from Hugging Face automatically)
ocr = OCR()

# Extract text from document
text, results = ocr.extract_text("document.jpg")
print(text)

# Access detailed results
for result in results:
    print(f"Text: {result.text}")
    print(f"Confidence: {result.confidence:.2%}")
```

### CLI Tool

```bash
# Basic usage
kiri-ocr predict path/to/document.jpg

# With output directory
kiri-ocr predict path/to/document.jpg --output results/
```

## 📈 Benchmarks

Results on synthetic test images (10 popular fonts):

![Benchmark Table](benchmark_table.png)

![Benchmark Graph](benchmark_graph.png)

## ⚙️ Configuration

Default inference parameters:

| Parameter | Value | Description |
|-----------|-------|-------------|
| `beam_width` | 4 | Beam search width |
| `ctc_fusion_alpha` | 0.5 | CTC score fusion weight |
| `lm_fusion_alpha` | 0.35 | Language model fusion weight |
| `max_length` | 260 | Maximum output sequence length |

## 📁 Model Files

```
kiri-ocr/
├── config.json          # Model configuration
├── vocab.json           # Character vocabulary
├── model.safetensors    # Model weights
└── README.md            # This file
```

## 🔗 Links

- **GitHub**: [github.com/mrrtmob/kiri-ocr](https://github.com/mrrtmob/kiri-ocr)
- **Dataset**: [mrrtmob/khmer_english_ocr_image_line](https://huggingface.co/datasets/mrrtmob/khmer_english_ocr_image_line)
- **PyPI**: [pypi.org/project/kiri-ocr](https://pypi.org/project/kiri-ocr)

## 📝 Citation

```bibtex
@software{kiri_ocr,
  author = {mrrtmob},
  title = {Kiri OCR: Lightweight OCR for English and Khmer},
  year = {2026},
  url = {https://huggingface.co/mrrtmob/kiri-ocr}
}
```

## 📄 License

This model is released under the [Apache 2.0 License](LICENSE).