mrrtmob/kiri-ocr

@@ -4,27 +4,68 @@ language:
 - km
 tags:
 - ocr
 - pytorch
 - handwritten
 license: apache-2.0
 datasets:
 - mrrtmob/km_en_image_line
-- mrrtmob/khmer_english_ocr_image_line
 ---
 # Kiri OCR Model
-**Kiri OCR** is a lightweight, OCR library for **English and Khmer** documents. It provides document-level text detection, recognition, and rendering capabilities in a compact package.
 ## ✨ Key Features
-- **Lightweight**: Compact model optimized for speed and efficiency.
-- **Bi-lingual**: Native support for English and Khmer (and mixed).
-- **Document Processing**: Automatic text line and word detection.
 ## 📊 Dataset
-The model is trained on the [mrrtmob/khmer_english_ocr_image_line](https://huggingface.co/datasets/mrrtmob/khmer_english_ocr_image_line) dataset, which contains **12 million** synthetic images of Khmer and English text lines.
 ## 💻 Usage
@@ -39,29 +80,76 @@ pip install kiri-ocr
 ```python
 from kiri_ocr import OCR
-# Initialize (loads from Hugging Face automatically)
 ocr = OCR()
-# Extract text
-text, results = ocr.extract_text('document.jpg')
 print(text)
 ```
 ### CLI Tool
 ```bash
 kiri-ocr predict path/to/document.jpg --output results/
 ```
-## Model Details
-- **Architecture**: CRNN (CNN + LSTM + CTC)
-- **Framework**: PyTorch
-- **Input Size**: Height 32px (width variable)
 ## 📈 Benchmarks
 Results on synthetic test images (10 popular fonts):
-![benchmark_table.png](benchmark_table.png)
-![benchmark_graph.png](benchmark_graph.png)

 - km
 tags:
 - ocr
+- text-recognition
 - pytorch
+- transformer
 - handwritten
+- khmer
+- multilingual
 license: apache-2.0
 datasets:
 - mrrtmob/km_en_image_line
+- mrrtmob/khmer_english_ocr_image_line
+pipeline_tag: image-to-text
+library_name: kiri-ocr
 ---
 # Kiri OCR Model
+**Kiri OCR** is a lightweight OCR library for **English and Khmer** documents. It provides document-level text detection, recognition, and rendering capabilities in a compact package.
 ## ✨ Key Features
+- **Lightweight**: Compact model optimized for speed and efficiency
+- **Bilingual**: Native support for English and Khmer (including mixed text)
+- **Document Processing**: Automatic text line and word detection
+- **Hybrid Decoding**: CTC + Attention decoder with language model fusion
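The CTC branch of the hybrid decoder produces one score distribution per output frame, typically a horizontal slice of the image. The simplest way to turn that into text is best-path (greedy) decoding: take the arg-max at each frame, merge consecutive repeats, then drop blanks. Kiri OCR itself decodes with beam search and fusion, so this sketch only illustrates what the CTC head's output means. The blank index (0) and the `id_to_char` mapping are assumptions made for the example.

```python
from typing import Dict, List, Optional, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank token


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      id_to_char: Dict[int, str], blank: int = BLANK) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best_path = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    chars: List[str] = []
    prev: Optional[int] = None
    for idx in best_path:
        # A blank between two identical labels lets the label appear twice.
        if idx != prev and idx != blank:
            chars.append(id_to_char[idx])
        prev = idx
    return "".join(chars)


if __name__ == "__main__":
    frames = [
        [0.1, 0.8, 0.1],    # "a"
        [0.1, 0.8, 0.1],    # "a" again: merged with the previous frame
        [0.9, 0.05, 0.05],  # blank: separates the two "a"s
        [0.1, 0.8, 0.1],    # "a"
        [0.1, 0.1, 0.8],    # "b"
    ]
    print(ctc_greedy_decode(frames, {1: "a", 2: "b"}))  # -> aab
```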
+## 🏗️ Architecture
+| Component | Details |
+|-----------|---------|
+| **Type** | Transformer Encoder-Decoder with CTC |
+| **Encoder** | 4 layers, 8 heads, model dim 256, FFN dim 1024 |
+| **Decoder** | 3 layers, 8 heads, model dim 256, FFN dim 1024 |
+| **CNN Backbone** | ConvStem (4 conv layers with BatchNorm + SiLU) |
+| **Decoding** | Beam search with CTC fusion + LM fusion |
+| **Input Size** | 48 × 640 px (height × width) |
+| **Framework** | PyTorch |
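The fixed 48 × 640 input means every text-line crop has to be mapped onto that canvas before encoding. This card does not say how kiri-ocr does it; a common policy, assumed here, is to scale to the target height while keeping the aspect ratio, squeeze lines that would be wider than 640 px, and right-pad the rest. `fit_line_image` is a hypothetical helper that only computes the geometry:

```python
from typing import Tuple

# From the architecture table: 48 x 640 (height x width).
TARGET_H, TARGET_W = 48, 640


def fit_line_image(width: int, height: int,
                   target_h: int = TARGET_H,
                   target_w: int = TARGET_W) -> Tuple[int, int, int]:
    """Return (new_width, new_height, pad_right) for one text-line crop.

    Assumed policy (not documented in this card): scale to the target height
    keeping the aspect ratio, squeeze lines that would exceed the target
    width, and right-pad shorter lines up to it.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scaled_w = round(width * target_h / height)
    new_w = min(target_w, max(1, scaled_w))
    return new_w, target_h, target_w - new_w


if __name__ == "__main__":
    print(fit_line_image(300, 60))   # -> (240, 48, 400): padded on the right
    print(fit_line_image(2000, 40))  # -> (640, 48, 0): squeezed to 640 px
```

Under this policy a very long line loses horizontal resolution, which is one reason detecting individual lines before recognition matters for a fixed-width model.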
+### Model Diagram
+```
+Input Image (48×640)
+       ↓
+   ConvStem (CNN)
+       ↓
+  2D Positional Encoding
+       ↓
+  Transformer Encoder (4L)
+       ↓
+   ┌───┴───┐
+   ↓       ↓
+CTC Head   Transformer Decoder (3L)
+   ↓       ↓
+   └───┬───┘
+       ↓
+  Beam Search + CTC Fusion + LM Fusion
+       ↓
+    Output Text
+```
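The diagram applies a 2D positional encoding to the ConvStem feature map before the Transformer encoder, so each feature vector carries both its row and its column position. The card does not give the exact form; this sketch assumes the common split-channel sinusoidal variant, where half of the 256 channels encode the row index and the other half the column index, and the result would be added elementwise to the feature map. The grid size depends on ConvStem's strides, which are not documented here, so the usage line uses an arbitrary grid.

```python
import math
from typing import List


def sinusoid(pos: int, dim: int) -> List[float]:
    """1D sinusoidal encoding: interleaved sin/cos pairs; `dim` must be even."""
    values: List[float] = []
    for i in range(0, dim, 2):
        angle = pos / (10000 ** (i / dim))
        values.extend([math.sin(angle), math.cos(angle)])
    return values


def positional_encoding_2d(height: int, width: int,
                           d_model: int = 256) -> List[List[List[float]]]:
    """Return a height x width grid of d_model-dim vectors.

    Channels [0, d_model/2) encode the row index, the rest the column index.
    """
    if d_model % 4:
        raise ValueError("d_model must be divisible by 4")
    half = d_model // 2
    return [[sinusoid(y, half) + sinusoid(x, half) for x in range(width)]
            for y in range(height)]


if __name__ == "__main__":
    pe = positional_encoding_2d(height=3, width=40)
    print(len(pe), len(pe[0]), len(pe[0][0]))  # -> 3 40 256
```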
 ## 📊 Dataset
+The model is trained on the [mrrtmob/khmer_english_ocr_image_line](https://huggingface.co/datasets/mrrtmob/khmer_english_ocr_image_line) dataset, containing **12 million** synthetic images of Khmer and English text lines.
 ## 💻 Usage
 ```python
 from kiri_ocr import OCR
+# Initialize (downloads from Hugging Face automatically)
 ocr = OCR()
+# Extract text from document
+text, results = ocr.extract_text("document.jpg")
 print(text)
+# Access detailed results
+for result in results:
+    print(f"Text: {result.text}")
+    print(f"Confidence: {result.confidence:.2%}")
 ```
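When `extract_text` is used in a pipeline, a common follow-up is to drop low-confidence lines. This sketch relies only on the `.text` and `.confidence` attributes used in the example above, and assumes `confidence` is a fraction in [0, 1] (the `:.2%` format suggests so). `LineResult` is a hypothetical stand-in so the snippet runs without the model, and the 0.8 threshold is arbitrary:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class LineResult:
    """Hypothetical stand-in for one item of `results`; only the two fields
    used in the usage example (.text, .confidence) are modelled."""
    text: str
    confidence: float  # assumed to be a fraction in [0, 1]


def confident_lines(results: Iterable, threshold: float = 0.8) -> List[str]:
    """Return the text of every result whose confidence >= threshold."""
    return [r.text for r in results if r.confidence >= threshold]


if __name__ == "__main__":
    demo = [LineResult("Hello", 0.97), LineResult("សួស្តី", 0.91), LineResult("~~", 0.42)]
    print(confident_lines(demo))  # -> ['Hello', 'សួស្តី']
```

With real output, `results` from `ocr.extract_text(...)` would be passed straight to `confident_lines`.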
 ### CLI Tool
 ```bash
+# Basic usage
+kiri-ocr predict path/to/document.jpg
+# With output directory
 kiri-ocr predict path/to/document.jpg --output results/
 ```
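To process a folder of scans, the documented CLI form can be called once per image. This is a hypothetical wrapper: it uses only `kiri-ocr predict <image> --output <dir>` as shown above, does not assume the CLI accepts several images or globs in one call, and the set of accepted file extensions is a guess:

```python
import subprocess
from pathlib import Path
from typing import Iterable, List

# Assumption: the image formats worth sending to the CLI.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def select_images(paths: Iterable[str]) -> List[str]:
    """Keep image files only, in a stable (sorted) order."""
    return sorted(p for p in paths if Path(p).suffix.lower() in IMAGE_SUFFIXES)


def build_commands(image_paths: Iterable[str], output_dir: str) -> List[List[str]]:
    """One `kiri-ocr predict <image> --output <dir>` invocation per image."""
    return [["kiri-ocr", "predict", p, "--output", output_dir] for p in image_paths]


def run_all(input_dir: str, output_dir: str) -> None:
    """Run the CLI for every image in input_dir, stopping at the first failure."""
    images = select_images(str(p) for p in Path(input_dir).iterdir())
    for cmd in build_commands(images, output_dir):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

For example, `run_all("scans/", "results/")` would invoke the CLI for each JPEG or PNG in a hypothetical `scans/` folder. Whether outputs written to a shared `--output` directory can collide depends on how the CLI names its files, which this card does not document.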
 ## 📈 Benchmarks
 Results on synthetic test images (10 popular fonts):
+![Benchmark Table](benchmark_table.png)
+![Benchmark Graph](benchmark_graph.png)
+## ⚙️ Configuration
+Default inference parameters:
+| Parameter | Value | Description |
+|-----------|-------|-------------|
+| `beam_width` | 4 | Beam search width |
+| `ctc_fusion_alpha` | 0.5 | CTC score fusion weight |
+| `lm_fusion_alpha` | 0.35 | Language model fusion weight |
+| `max_length` | 260 | Maximum output sequence length |
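The table gives the fusion weights but not the formula that combines them. This sketch assumes a common shape for hybrid decoding: interpolate attention and CTC log-probabilities with `ctc_fusion_alpha`, add the language model's log-probability scaled by `lm_fusion_alpha` (shallow fusion), and keep the `beam_width` best prefixes for up to `max_length` steps. It is deliberately simplified: real CTC fusion needs CTC prefix scores computed over the whole frame sequence, whereas here a hypothetical `step_fn` returns all three per-token scores directly. `EOS` and `toy_step` are likewise hypothetical, for demonstration only:

```python
import heapq
import math
from typing import Callable, Dict, List, Tuple

# Defaults taken from the configuration table; the fusion formula is assumed.
BEAM_WIDTH = 4
CTC_FUSION_ALPHA = 0.5
LM_FUSION_ALPHA = 0.35
MAX_LENGTH = 260
EOS = "</s>"  # hypothetical end-of-sequence token

# Hypothetical scorer: prefix -> {token: (attn_logp, ctc_logp, lm_logp)}
StepFn = Callable[[Tuple[str, ...]], Dict[str, Tuple[float, float, float]]]
Hyp = Tuple[float, Tuple[str, ...]]


def fused_score(attn: float, ctc: float, lm: float,
                ctc_alpha: float = CTC_FUSION_ALPHA,
                lm_alpha: float = LM_FUSION_ALPHA) -> float:
    """Assumed fusion: interpolate attention/CTC, then add a weighted LM term."""
    return (1.0 - ctc_alpha) * attn + ctc_alpha * ctc + lm_alpha * lm


def beam_search(step_fn: StepFn, beam_width: int = BEAM_WIDTH,
                max_length: int = MAX_LENGTH) -> Tuple[str, ...]:
    beams: List[Hyp] = [(0.0, ())]
    finished: List[Hyp] = []
    for _ in range(max_length):
        candidates: List[Hyp] = []
        for score, prefix in beams:
            for token, (attn, ctc, lm) in step_fn(prefix).items():
                hyp = (score + fused_score(attn, ctc, lm), prefix + (token,))
                if token == EOS:
                    finished.append(hyp)
                else:
                    candidates.append(hyp)
        if not candidates:
            break
        beams = heapq.nlargest(beam_width, candidates, key=lambda h: h[0])
        # Scores only decrease (log-probs <= 0, weights in [0, 1]), so stop
        # once the best finished hypothesis beats every live beam.
        if finished and max(h[0] for h in finished) >= beams[0][0]:
            break
    best = max(finished or beams, key=lambda h: h[0])[1]
    return tuple(t for t in best if t != EOS)


def toy_step(prefix: Tuple[str, ...]) -> Dict[str, Tuple[float, float, float]]:
    """Toy scorer that prefers the sequence "ab", then end-of-sequence."""
    target = ("a", "b")
    if len(prefix) < len(target):
        want = target[len(prefix)]
        return {t: (math.log(0.8 if t == want else 0.1),) * 3 for t in ("a", "b", EOS)}
    return {EOS: (0.0, 0.0, 0.0), "a": (math.log(0.1),) * 3}


if __name__ == "__main__":
    print(beam_search(toy_step))  # -> ('a', 'b')
```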
+## 📁 Model Files
+```
+kiri-ocr/
+├── config.json          # Model configuration
+├── vocab.json           # Character vocabulary
+├── model.safetensors    # Model weights
+└── README.md            # This file
+```
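If you download the repository files directly, `config.json` and `vocab.json` are plain JSON and can be inspected with the standard library. Their schemas are not documented in this card, so the helper below assumes `vocab.json` is either a `{character: id}` object or a list of characters indexed by id; both function names are hypothetical, and `model.safetensors` is not touched:

```python
import json
from pathlib import Path
from typing import Dict, Tuple, Union

VocabJson = Union[Dict[str, int], list]


def invert_vocab(vocab: VocabJson) -> Dict[int, str]:
    """Map token id -> character for either a {char: id} dict or a list."""
    if isinstance(vocab, dict):
        return {int(idx): ch for ch, idx in vocab.items()}
    if isinstance(vocab, list):
        return dict(enumerate(vocab))
    raise ValueError(f"unexpected vocab.json type: {type(vocab).__name__}")


def load_model_metadata(model_dir: str) -> Tuple[dict, Dict[int, str]]:
    """Read config.json and vocab.json from a local copy of the repository."""
    root = Path(model_dir)
    config = json.loads((root / "config.json").read_text(encoding="utf-8"))
    vocab = json.loads((root / "vocab.json").read_text(encoding="utf-8"))
    return config, invert_vocab(vocab)
```

The directory could come from `huggingface_hub.snapshot_download(repo_id="mrrtmob/kiri-ocr")`, which returns the local path of the downloaded snapshot. If `vocab.json` instead maps ids to characters, the dictionary branch of `invert_vocab` would need its keys and values swapped.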
+## 🔗 Links
+- **GitHub**: [github.com/mrrtmob/kiri-ocr](https://github.com/mrrtmob/kiri-ocr)
+- **Dataset**: [mrrtmob/khmer_english_ocr_image_line](https://huggingface.co/datasets/mrrtmob/khmer_english_ocr_image_line)
+- **PyPI**: [pypi.org/project/kiri-ocr](https://pypi.org/project/kiri-ocr)
+## 📝 Citation
+```bibtex
+@software{kiri_ocr,
+  author = {mrrtmob},
+  title = {Kiri OCR: Lightweight OCR for English and Khmer},
+  year = {2026},
+  url = {https://huggingface.co/mrrtmob/kiri-ocr}
+}
+```
+## 📄 License
+This model is released under the [Apache 2.0 License](LICENSE).