mrrtmob committed on
Commit 56c5d62 · verified
1 Parent(s): 85c3e5c

Update README.md

Files changed (1)
  1. README.md +104 -16
README.md CHANGED
@@ -4,27 +4,68 @@ language:
4
  - km
5
  tags:
6
  - ocr
 
7
  - pytorch
 
8
  - handwritten
 
 
9
  license: apache-2.0
10
  datasets:
11
  - mrrtmob/km_en_image_line
12
- - mrrtmob/khmer_english_ocr_image_line
 
 
13
  ---
14
 
15
  # Kiri OCR Model
16
 
17
- **Kiri OCR** is a lightweight, OCR library for **English and Khmer** documents. It provides document-level text detection, recognition, and rendering capabilities in a compact package.
18
 
19
  ## ✨ Key Features
20
 
21
- - **Lightweight**: Compact model optimized for speed and efficiency.
22
- - **Bi-lingual**: Native support for English and Khmer (and mixed).
23
- - **Document Processing**: Automatic text line and word detection.
24
 
25
  ## 📊 Dataset
26
 
27
- The model is trained on the [mrrtmob/khmer_english_ocr_image_line](https://huggingface.co/datasets/mrrtmob/khmer_english_ocr_image_line) dataset, which contains **12 million** synthetic images of Khmer and English text lines.
28
 
29
  ## 💻 Usage
30
 
@@ -39,29 +80,76 @@ pip install kiri-ocr
39
  ```python
40
  from kiri_ocr import OCR
41
 
42
- # Initialize (loads from Hugging Face automatically)
43
  ocr = OCR()
44
 
45
- # Extract text
46
- text, results = ocr.extract_text('document.jpg')
47
  print(text)
48
  ```
49
 
50
  ### CLI Tool
51
 
52
  ```bash
53
  kiri-ocr predict path/to/document.jpg --output results/
54
  ```
55
 
56
- ## Model Details
57
- - **Architecture**: CRNN (CNN + LSTM + CTC)
58
- - **Framework**: PyTorch
59
- - **Input Size**: Height 32px (width variable)
60
-
61
  ## 📈 Benchmarks
62
 
63
  Results on synthetic test images (10 popular fonts):
64
 
65
- ![benchmark_table.png](benchmark_table.png)
66
 
67
- ![benchmark_graph.png](benchmark_graph.png)
 
 
4
  - km
5
  tags:
6
  - ocr
7
+ - text-recognition
8
  - pytorch
9
+ - transformer
10
  - handwritten
11
+ - khmer
12
+ - multilingual
13
  license: apache-2.0
14
  datasets:
15
  - mrrtmob/km_en_image_line
16
+ - mrrtmob/khmer_english_ocr_image_line
17
+ pipeline_tag: image-to-text
18
+ library_name: kiri-ocr
19
  ---
20
 
21
  # Kiri OCR Model
22
 
23
+ **Kiri OCR** is a lightweight OCR library for **English and Khmer** documents. It provides document-level text detection, recognition, and rendering capabilities in a compact package.
24
 
25
  ## ✨ Key Features
26
 
27
+ - **Lightweight**: Compact model optimized for speed and efficiency
28
+ - **Bilingual**: Native support for English and Khmer (including mixed text)
29
+ - **Document Processing**: Automatic text line and word detection
30
+ - **Hybrid Decoding**: CTC + Attention decoder with language model fusion
31
+
32
+ ## πŸ—οΈ Architecture
33
+
34
+ | Component | Details |
35
+ |-----------|---------|
36
+ | **Type** | Transformer Encoder-Decoder with CTC |
37
+ | **Encoder** | 4 layers, 8 heads, 256 dim, 1024 FFN |
38
+ | **Decoder** | 3 layers, 8 heads, 256 dim, 1024 FFN |
39
+ | **CNN Backbone** | ConvStem (4 conv layers with BatchNorm + SiLU) |
40
+ | **Decoding** | Beam search with CTC fusion + LM fusion |
41
+ | **Input Size** | 48 × 640 px (height × width) |
42
+ | **Framework** | PyTorch |
43
+
44
+ ### Model Diagram
45
+
46
+ ```
47
+ Input Image (48×640)
48
+         ↓
49
+    ConvStem (CNN)
50
+         ↓
51
+ 2D Positional Encoding
52
+         ↓
53
+ Transformer Encoder (4L)
54
+         ↓
55
+     ┌───┴───┐
56
+     ↓       ↓
57
+ CTC Head  Transformer Decoder (3L)
58
+     ↓       ↓
59
+     └───┬───┘
60
+         ↓
61
+ Beam Search + CTC Fusion + LM Fusion
62
+         ↓
63
+    Output Text
64
+ ```
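The layer counts and dimensions in the architecture table allow a rough size estimate for the Transformer stack. The sketch below is not taken from the kiri-ocr source. It assumes each layer follows the standard PyTorch `nn.TransformerEncoderLayer` / `nn.TransformerDecoderLayer` layout (biased projections, two or three LayerNorms per layer), and it ignores the ConvStem, embeddings, and output heads, whose sizes the README does not give. All function names are hypothetical.

```python
# Rough parameter estimate for the Transformer stack described in the table.
# ASSUMPTION: standard PyTorch-style layers; the real model may differ.

D_MODEL, N_HEADS, D_FFN = 256, 8, 1024   # values from the architecture table
ENC_LAYERS, DEC_LAYERS = 4, 3            # values from the architecture table


def attention_params(d_model: int) -> int:
    """Q, K, V and output projections, each d_model x d_model plus bias."""
    return 4 * d_model * d_model + 4 * d_model


def ffn_params(d_model: int, d_ffn: int) -> int:
    """Two linear layers (d_model -> d_ffn -> d_model) with biases."""
    return (d_model * d_ffn + d_ffn) + (d_ffn * d_model + d_model)


def layer_norm_params(d_model: int) -> int:
    """Scale and shift vectors."""
    return 2 * d_model


def encoder_layer_params(d_model: int, d_ffn: int) -> int:
    # Self-attention + FFN + two LayerNorms.
    return attention_params(d_model) + ffn_params(d_model, d_ffn) + 2 * layer_norm_params(d_model)


def decoder_layer_params(d_model: int, d_ffn: int) -> int:
    # Self-attention + cross-attention + FFN + three LayerNorms.
    return 2 * attention_params(d_model) + ffn_params(d_model, d_ffn) + 3 * layer_norm_params(d_model)


head_dim = D_MODEL // N_HEADS
encoder_total = ENC_LAYERS * encoder_layer_params(D_MODEL, D_FFN)
decoder_total = DEC_LAYERS * decoder_layer_params(D_MODEL, D_FFN)

print(f"per-head dimension: {head_dim}")
print(f"encoder stack: {encoder_total:,} parameters")
print(f"decoder stack: {decoder_total:,} parameters")
```

Under these assumptions the encoder and decoder stacks together hold roughly 6.3 million parameters, which is consistent with the "lightweight" description, although the true total depends on the components left out here.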
65
 
66
  ## 📊 Dataset
67
 
68
+ The model is trained on the [mrrtmob/khmer_english_ocr_image_line](https://huggingface.co/datasets/mrrtmob/khmer_english_ocr_image_line) dataset, containing **12 million** synthetic images of Khmer and English text lines.
69
 
70
  ## 💻 Usage
71
 
 
80
  ```python
81
  from kiri_ocr import OCR
82
 
83
+ # Initialize (downloads from Hugging Face automatically)
84
  ocr = OCR()
85
 
86
+ # Extract text from document
87
+ text, results = ocr.extract_text("document.jpg")
88
  print(text)
89
+
90
+ # Access detailed results
91
+ for result in results:
92
+     print(f"Text: {result.text}")
93
+     print(f"Confidence: {result.confidence:.2%}")
94
  ```
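A common follow-up to the snippet above is to drop low-confidence lines before further processing. The sketch below is illustrative only: it uses stand-in objects that expose the same `text` and `confidence` attributes the README example reads, and the helper name `filter_confident` and the 0.8 threshold are assumptions, not part of kiri-ocr.

```python
from types import SimpleNamespace


def filter_confident(results, min_confidence=0.8):
    """Keep only results whose confidence meets the threshold (hypothetical helper)."""
    return [r for r in results if r.confidence >= min_confidence]


# Stand-ins for the objects returned by ocr.extract_text(); real results may
# carry additional fields.
sample_results = [
    SimpleNamespace(text="Hello", confidence=0.97),
    SimpleNamespace(text="???", confidence=0.41),
]

kept = filter_confident(sample_results)
for r in kept:
    print(f"{r.text} ({r.confidence:.2%})")
```

With real output, `sample_results` would be replaced by the `results` value returned from `ocr.extract_text(...)`.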
95
 
96
  ### CLI Tool
97
 
98
  ```bash
99
+ # Basic usage
100
+ kiri-ocr predict path/to/document.jpg
101
+
102
+ # With output directory
103
  kiri-ocr predict path/to/document.jpg --output results/
104
  ```
105
 
106
  ## 📈 Benchmarks
107
 
108
  Results on synthetic test images (10 popular fonts):
109
 
110
+ ![Benchmark Table](benchmark_table.png)
111
+
112
+ ![Benchmark Graph](benchmark_graph.png)
113
+
114
+ ## βš™οΈ Configuration
115
+
116
+ Default inference parameters:
117
+
118
+ | Parameter | Value | Description |
119
+ |-----------|-------|-------------|
120
+ | `beam_width` | 4 | Beam search width |
121
+ | `ctc_fusion_alpha` | 0.5 | CTC score fusion weight |
122
+ | `lm_fusion_alpha` | 0.35 | Language model fusion weight |
123
+ | `max_length` | 260 | Maximum output sequence length |
124
+
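To make the fusion weights in the table concrete, here is a minimal sketch of one common way such scores are combined during beam search: joint CTC/attention scoring with shallow language-model fusion. The README does not state the exact formula kiri-ocr uses, so the interpolation below, the function names, and the example log-probabilities are all assumptions.

```python
def fused_score(att_logp, ctc_logp, lm_logp,
                ctc_fusion_alpha=0.5, lm_fusion_alpha=0.35):
    """Combine log-probabilities for one hypothesis.

    ASSUMED formula: interpolate attention and CTC scores, then add a
    weighted language-model score. The library may weight them differently.
    """
    return ((1.0 - ctc_fusion_alpha) * att_logp
            + ctc_fusion_alpha * ctc_logp
            + lm_fusion_alpha * lm_logp)


def prune_beam(hypotheses, beam_width=4):
    """Keep the beam_width highest-scoring hypotheses."""
    return sorted(hypotheses, key=lambda h: h["score"], reverse=True)[:beam_width]


# Made-up candidates: (text, attention log-prob, CTC log-prob, LM log-prob).
candidates = [
    ("kiri", -0.9, -1.1, -2.0),
    ("kirl", -0.8, -2.5, -6.0),
    ("kin",  -1.5, -1.9, -3.0),
    ("klri", -2.2, -2.4, -7.0),
    ("kiri.", -1.0, -1.6, -2.5),
]

scored = [
    {"text": text, "score": fused_score(att, ctc, lm)}
    for text, att, ctc, lm in candidates
]
beam = prune_beam(scored, beam_width=4)
for hyp in beam:
    print(f"{hyp['text']!r}: {hyp['score']:.3f}")
```

In this formulation a larger `ctc_fusion_alpha` shifts trust from the attention decoder to the CTC head, and a larger `lm_fusion_alpha` penalizes hypotheses the language model finds unlikely.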
125
+ ## πŸ“ Model Files
126
+
127
+ ```
128
+ kiri-ocr/
129
+ ├── config.json          # Model configuration
130
+ ├── vocab.json           # Character vocabulary
131
+ ├── model.safetensors    # Model weights
132
+ └── README.md # This file
133
+ ```
134
+
135
+ ## πŸ”— Links
136
+
137
+ - **GitHub**: [github.com/mrrtmob/kiri-ocr](https://github.com/mrrtmob/kiri-ocr)
138
+ - **Dataset**: [mrrtmob/khmer_english_ocr_image_line](https://huggingface.co/datasets/mrrtmob/khmer_english_ocr_image_line)
139
+ - **PyPI**: [pypi.org/project/kiri-ocr](https://pypi.org/project/kiri-ocr)
140
+
141
+ ## πŸ“ Citation
142
+
143
+ ```bibtex
144
+ @software{kiri_ocr,
145
+ author = {mrrtmob},
146
+ title = {Kiri OCR: Lightweight OCR for English and Khmer},
147
+ year = {2026},
148
+ url = {https://huggingface.co/mrrtmob/kiri-ocr}
149
+ }
150
+ ```
151
+
152
+ ## 📄 License
153
 
154
+ This model is released under the [Apache 2.0 License](LICENSE).
155
+ | Formatting | Inconsistent | Consistent tables and code blocks |