	---
	language:
	- en
	- km
	tags:
	- ocr
	- text-recognition
	- pytorch
	- transformer
	- handwritten
	- khmer
	- multilingual
	license: apache-2.0
	datasets:
	- mrrtmob/km_en_image_line
	- mrrtmob/khmer_english_ocr_image_line
	pipeline_tag: image-to-text
	library_name: kiri-ocr
	---

	# Kiri OCR Model

	Kiri OCR is a lightweight OCR library for English and Khmer documents. It provides document-level text detection, recognition, and rendering capabilities in a compact package.

	## ✨ Key Features

	- Lightweight: Compact model optimized for speed and efficiency
	- Bilingual: Native support for English and Khmer (including mixed text)
	- Document Processing: Automatic text line and word detection
	- Hybrid Decoding: CTC + Attention decoder with language model fusion

	## 🏗️ Architecture

	| Component | Details |
	|-----------|---------|
	| Type | Transformer Encoder-Decoder with CTC |
	| Encoder | 4 layers, 8 heads, 256 dim, 1024 FFN |
	| Decoder | 3 layers, 8 heads, 256 dim, 1024 FFN |
	| CNN Backbone | ConvStem (4 conv layers with BatchNorm + SiLU) |
	| Decoding | Beam search with CTC fusion + LM fusion |
	| Input Size | 48 × 640 px (height × width) |
	| Framework | PyTorch |
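
As a rough check on the "lightweight" claim, the Transformer layers in the table can be sized by hand. The sketch below assumes standard PyTorch-style layers (biased linear projections, two LayerNorms per encoder layer and three per decoder layer, as in `nn.TransformerEncoderLayer` and `nn.TransformerDecoderLayer`). It deliberately ignores the ConvStem, embeddings, positional encodings, and output heads, so the result is a lower bound for the Transformer part only, not the checkpoint's exact size. The helper names are illustrative.

```python
def attention_params(d_model: int) -> int:
    # Q, K, V and output projections: four d_model x d_model weights plus biases
    return 4 * (d_model * d_model + d_model)


def ffn_params(d_model: int, d_ff: int) -> int:
    # Two biased linear layers: d_model -> d_ff -> d_model
    return (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)


def layernorm_params(d_model: int) -> int:
    # One scale vector and one shift vector
    return 2 * d_model


def encoder_layer_params(d_model: int, d_ff: int) -> int:
    # Self-attention + feed-forward + 2 LayerNorms
    return attention_params(d_model) + ffn_params(d_model, d_ff) + 2 * layernorm_params(d_model)


def decoder_layer_params(d_model: int, d_ff: int) -> int:
    # Self-attention + cross-attention + feed-forward + 3 LayerNorms
    return 2 * attention_params(d_model) + ffn_params(d_model, d_ff) + 3 * layernorm_params(d_model)


d_model, d_ff = 256, 1024  # values from the architecture table
encoder_total = 4 * encoder_layer_params(d_model, d_ff)
decoder_total = 3 * decoder_layer_params(d_model, d_ff)
print(f"encoder: {encoder_total:,}")  # encoder: 3,159,040
print(f"decoder: {decoder_total:,}")  # decoder: 3,160,320
print(f"total:   {encoder_total + decoder_total:,}")  # total:   6,319,360
```

The head count (8) does not appear in the formula: the heads split the same 256-wide projections rather than adding weights.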

	### Model Diagram

	```
	             Input Image (48×640)
	                       ↓
	                ConvStem (CNN)
	                       ↓
	            2D Positional Encoding
	                       ↓
	           Transformer Encoder (4L)
	                       ↓
	          ┌────────────┴────────────┐
	          ↓                         ↓
	      CTC Head          Transformer Decoder (3L)
	          ↓                         ↓
	          └────────────┬────────────┘
	                       ↓
	     Beam Search + CTC Fusion + LM Fusion
	                       ↓
	                  Output Text
	```
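
In standard CTC, the head emits one score distribution over the vocabulary plus a special blank symbol for every encoder time step. Frame-level predictions become text through the CTC collapse rule: merge consecutive repeats, then drop blanks. The sketch below shows plain greedy (best-path) decoding only, assuming the blank is index 0 and using a hypothetical `id_to_char` mapping. It is not the beam search with CTC and LM fusion that Kiri OCR uses by default.

```python
from typing import Dict, List, Optional, Sequence

BLANK = 0  # assumption: blank token at index 0


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    id_to_char: Dict[int, str],
    blank: int = BLANK,
) -> str:
    """Best-path CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    # Highest-scoring token at every time step
    best_path: List[int] = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]

    chars: List[str] = []
    previous: Optional[int] = None
    for token in best_path:
        # A repeated token counts once unless a blank separates the repeats
        if token != previous and token != blank:
            chars.append(id_to_char[token])
        previous = token
    return "".join(chars)


vocab = {1: "a", 2: "b"}  # hypothetical two-character vocabulary
frames = [
    [0.1, 0.8, 0.1],    # a
    [0.2, 0.7, 0.1],    # a again: merged with the previous frame
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.8, 0.1],    # a after a blank: a new character
    [0.1, 0.1, 0.8],    # b
]
print(ctc_greedy_decode(frames, vocab))  # aab
```

The blank is what lets CTC output genuinely doubled letters: without the blank frame in the middle, the three `a` frames would collapse to a single `a`.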

	## 📊 Dataset

	The model is trained on the [mrrtmob/khmer_english_ocr_image_line](https://huggingface.co/datasets/mrrtmob/khmer_english_ocr_image_line) dataset, containing 12 million synthetic images of Khmer and English text lines.

	## 💻 Usage

	### Installation

	```bash
	pip install kiri-ocr
	```

	### Python API

	```python
	from kiri_ocr import OCR

	# Initialize (downloads from Hugging Face automatically)
	ocr = OCR()

	# Extract text from document
	text, results = ocr.extract_text("document.jpg")
	print(text)

	# Access detailed results
	for result in results:
	    print(f"Text: {result.text}")
	    print(f"Confidence: {result.confidence:.2%}")
	```
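
Because each result exposes `text` and `confidence` (a fraction, given the `:.2%` formatting), low-confidence lines can be filtered out before further processing. The helper below is a hypothetical convenience function, not part of kiri-ocr. It relies only on those two attributes, and the 0.8 threshold is an arbitrary example value.

```python
from types import SimpleNamespace
from typing import Iterable, List


def confident_lines(results: Iterable, threshold: float = 0.8) -> List[str]:
    """Return the text of results whose confidence is at least `threshold`.

    Hypothetical helper: it only assumes each result has `.text` and a
    `.confidence` score between 0 and 1.
    """
    return [r.text for r in results if r.confidence >= threshold]


# Stand-in objects so the helper can be tried without running OCR
fake_results = [
    SimpleNamespace(text="Hello", confidence=0.97),
    SimpleNamespace(text="សួស្តី", confidence=0.91),
    SimpleNamespace(text="???", confidence=0.42),
]
print(confident_lines(fake_results))  # ['Hello', 'សួស្តី']
```

With the real library, pass the `results` returned by `ocr.extract_text(...)` instead of the stand-ins.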

	### CLI Tool

	```bash
	# Basic usage
	kiri-ocr predict path/to/document.jpg

	# With output directory
	kiri-ocr predict path/to/document.jpg --output results/
	```
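
To run the CLI over a whole folder, one option is to call it once per image from Python. The sketch below uses only the `predict` subcommand and the `--output` flag shown above; the image-extension filter and the function names are illustrative assumptions.

```python
import subprocess
from pathlib import Path
from typing import List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumption: adjust to your inputs


def build_predict_commands(image_dir: str, output_dir: str) -> List[List[str]]:
    """Build one `kiri-ocr predict` command per image file, in sorted order."""
    images = sorted(
        p for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    return [["kiri-ocr", "predict", str(p), "--output", output_dir] for p in images]


def run_all(image_dir: str, output_dir: str) -> None:
    for cmd in build_predict_commands(image_dir, output_dir):
        # check=True raises CalledProcessError on the first image the CLI fails on
        subprocess.run(cmd, check=True)
```

For example, `run_all("scans/", "results/")`. Each invocation is a separate process (and presumably reloads the model), so for large batches the Python API avoids that repeated start-up cost.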

	## 📈 Benchmarks

	Results on synthetic test images rendered in 10 popular fonts:

	![Benchmark Table](benchmark_table.png)

	![Benchmark Graph](benchmark_graph.png)
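
The benchmark figures are images, so the metric definitions are not spelled out in the text. OCR benchmarks commonly report character error rate (CER): the Levenshtein edit distance between prediction and reference, divided by the reference length. The sketch below computes CER over Unicode code points. That unit is an assumption; a Khmer evaluation could reasonably count grapheme clusters instead, which would give different numbers.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]


def character_error_rate(prediction: str, reference: str) -> float:
    """CER = edit distance / reference length, measured in code points."""
    if not reference:
        # Convention chosen here for an empty reference
        return 0.0 if not prediction else 1.0
    return levenshtein(prediction, reference) / len(reference)


print(character_error_rate("Kiri 0CR", "Kiri OCR"))  # 0.125
```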

	## ⚙️ Configuration

	Default inference parameters:

	| Parameter | Value | Description |
	|-----------|-------|-------------|
	| `beam_width` | 4 | Beam search width |
	| `ctc_fusion_alpha` | 0.5 | CTC score fusion weight |
	| `lm_fusion_alpha` | 0.35 | Language model fusion weight |
	| `max_length` | 260 | Maximum output sequence length |
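
The fusion weights combine several log-probabilities into one score per beam hypothesis. The exact formula Kiri OCR uses is not documented here. One common form in joint CTC/attention decoding with shallow LM fusion is `score = (1 - α_ctc) · log p_att + α_ctc · log p_ctc + α_lm · log p_lm`. The sketch below assumes that form, with `ctc_fusion_alpha` and `lm_fusion_alpha` in the roles of α_ctc and α_lm. The `Hypothesis` class and the log-probabilities are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    text: str
    log_p_att: float  # attention decoder log-probability
    log_p_ctc: float  # CTC prefix log-probability
    log_p_lm: float   # language model log-probability


def fused_score(h: Hypothesis, ctc_fusion_alpha: float = 0.5, lm_fusion_alpha: float = 0.35) -> float:
    # Assumed rule: interpolate attention and CTC scores, then add a weighted LM term
    return ((1 - ctc_fusion_alpha) * h.log_p_att
            + ctc_fusion_alpha * h.log_p_ctc
            + lm_fusion_alpha * h.log_p_lm)


def prune(beam: List[Hypothesis], beam_width: int = 4) -> List[Hypothesis]:
    """Keep the `beam_width` best hypotheses, as one beam-search step would."""
    return sorted(beam, key=fused_score, reverse=True)[:beam_width]


candidates = [
    Hypothesis("Kiri", -0.5, -0.7, -1.2),
    Hypothesis("Kirl", -0.4, -2.0, -3.0),  # best attention score, weak CTC and LM
    Hypothesis("Kin", -1.0, -1.1, -1.5),
    Hypothesis("Kiri.", -1.6, -1.8, -2.0),
    Hypothesis("Kirri", -2.5, -3.0, -4.0),
]
print([h.text for h in prune(candidates)])  # ['Kiri', 'Kin', 'Kirl', 'Kiri.']
```

With both weights set to 0, ranking falls back to the attention decoder alone and "Kirl" would win; the CTC and LM terms are what push it below "Kiri".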

	## 📁 Model Files

	```
	kiri-ocr/
	├── config.json # Model configuration
	├── vocab.json # Character vocabulary
	├── model.safetensors # Model weights
	└── README.md # This file
	```
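
`model.safetensors` uses the safetensors format: the file starts with an 8-byte little-endian unsigned integer giving the header length, followed by a JSON header that maps each tensor name to its dtype, shape, and byte offsets. The stdlib-only sketch below reads that header to list tensor shapes and count parameters without loading PyTorch. It relies only on the published format, not on anything specific to Kiri OCR; the tensor names it returns are whatever the checkpoint contains.

```python
import json
import math
import struct
from typing import Dict, List


def read_safetensors_header(path: str) -> Dict[str, dict]:
    """Return the JSON header of a .safetensors file (tensor name -> metadata)."""
    with open(path, "rb") as f:
        # First 8 bytes: header size as an unsigned little-endian 64-bit integer
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    return header


def tensor_shapes(path: str) -> Dict[str, List[int]]:
    return {name: info["shape"] for name, info in read_safetensors_header(path).items()}


def count_parameters(path: str) -> int:
    # math.prod([]) == 1, so scalar tensors count as one parameter
    return sum(math.prod(shape) for shape in tensor_shapes(path).values())
```

For example, `count_parameters("model.safetensors")` gives the total number of stored weights.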

	## 🔗 Links

	- GitHub: [github.com/mrrtmob/kiri-ocr](https://github.com/mrrtmob/kiri-ocr)
	- Dataset: [mrrtmob/khmer_english_ocr_image_line](https://huggingface.co/datasets/mrrtmob/khmer_english_ocr_image_line)
	- PyPI: [pypi.org/project/kiri-ocr](https://pypi.org/project/kiri-ocr)

	## 📝 Citation

	```bibtex
	@software{kiri_ocr,
	  author = {mrrtmob},
	  title  = {Kiri OCR: Lightweight OCR for English and Khmer},
	  year   = {2026},
	  url    = {https://huggingface.co/mrrtmob/kiri-ocr}
	}
	```

	## 📄 License

	This model is released under the [Apache 2.0 License](LICENSE).