Thareah committed
Commit 37795b9 · verified · 1 parent: 826c9ce

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ inference.pdiparams filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,165 @@
- ---
- license: mit
- ---
+ # Khmer OCR Recognition Model
+
+ 🇰🇭 **High-accuracy OCR model for Khmer text recognition using the PaddleOCR framework**
+
+ ## Model Overview
+
+ This CRNN-based OCR model is trained specifically for Khmer (Cambodian) text recognition and achieves **98.45% accuracy** on validation data. It is optimized for short text segments (3-5 words) commonly found in documents, signs, and printed materials.
+
+ ## 🏗️ Model Architecture
+
+ - **Framework**: PaddleOCR 2.7+
+ - **Algorithm**: CRNN (Convolutional Recurrent Neural Network)
+ - **Backbone**: ResNet34
+ - **Neck**: SequenceEncoder with RNN (hidden_size: 256)
+ - **Head**: CTCHead with CTC Loss
+ - **Input Shape**: `[3, 32, 320]` (channels, height, width)
+ - **Max Text Length**: 25 characters
+
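The `[3, 32, 320]` input shape means each text crop is scaled to a height of 32 pixels and a width of at most 320. The following is a minimal sketch of the aspect-preserving width calculation commonly used for CRNN-style recognizers. The function name `rec_resize_width` is hypothetical, and PaddleOCR's own `RecResizeImg` may differ in details such as rounding and padding.

```python
import math


def rec_resize_width(src_w: int, src_h: int, target_h: int = 32, max_w: int = 320) -> int:
    """Return the width of a crop after scaling it to target_h, capped at max_w."""
    if src_w <= 0 or src_h <= 0:
        raise ValueError("image dimensions must be positive")
    # Scale the width by the same factor that maps src_h to target_h.
    scaled = math.ceil(target_h * src_w / src_h)
    return min(max_w, scaled)


print(rec_resize_width(600, 80))   # a 600x80 crop scales to width 240
print(rec_resize_width(2000, 80))  # a very wide crop is capped at 320
```

Under these assumptions a 600×80 image occupies 240 of the 320 available columns, while much wider crops are squeezed to fit, which is one reason long lines are better split before recognition.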
+ ## 📝 Supported Characters
+
+ The model recognizes **168 characters** (one per line of `khmer_char_dict.txt`), including:
+
+ - **Khmer Consonants**: ក ខ គ ឃ ង ច ឆ ជ ឈ ញ ដ ឋ ឌ ឍ ណ ត ថ ទ ធ ន ប ផ ព ភ ម យ រ ល វ ស ហ ឡ អ
+ - **Khmer Independent Vowels**: ឥ ឧ ឫ ឬ ឭ ឯ ឱ ឲ
+ - **Khmer Vowels**: ា ិ ី ឹ ឺ ុ ូ ួ ើ ឿ ៀ េ ែ ៃ ោ ៅ ំ ះ ៈ
+ - **Khmer Numerals**: ០ ១ ២ ៣ ៤ ៥ ៦ ៧ ៨ ៩
+ - **Latin Characters**: A-Z (except Q), a-z, 0-9
+ - **Punctuation**: . , ! ? - ( ) [ ] « » ™ ® etc.
+ - **Khmer Symbols**: ។ ៕ ៖ ៗ ៉ ៊ ់ ៌ ៍ ៏ ័ ្
+
+ ## 🚀 Quick Start
+
+ ### Installation
+
+ ```bash
+ pip install paddlepaddle paddleocr opencv-python
+ ```
+
+ ### Basic Usage
+
+ ```python
+ from paddleocr import PaddleOCR
+ import cv2
+
+ # Initialize OCR with custom Khmer model
+ ocr = PaddleOCR(
+     use_angle_cls=True,
+     lang='ch',  # Use Chinese as base language
+     rec_model_dir='path/to/model',  # Directory containing inference files
+     rec_char_dict_path='khmer_char_dict.txt',
+     show_log=False
+ )
+
+ # Process image
+ result = ocr.ocr('khmer_text_image.jpg', cls=True)
+
+ # Extract results
+ for idx in range(len(result)):
+     res = result[idx]
+     if res is None:
+         continue
+     for line in res:
+         text = line[1][0]  # Recognized text
+         confidence = line[1][1]  # Confidence score
+         print(f'Text: {text}, Confidence: {confidence:.3f}')
+ ```
+
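The result handling in the Basic Usage block can be factored into a small helper that also filters by confidence. This sketch assumes the nested layout used in that block (a list of pages, each page a list of `[box, (text, confidence)]` entries, or `None` when nothing is found). The helper name `flatten_results` and the sample data are hypothetical, and the code runs without PaddleOCR installed.

```python
from typing import Any, List, Optional, Tuple


def flatten_results(result: Optional[List[Any]], min_confidence: float = 0.0) -> List[Tuple[str, float]]:
    """Collect (text, confidence) pairs from a nested OCR result."""
    pairs = []
    for page in result or []:
        if page is None:  # a page with no detected text
            continue
        for _box, (text, confidence) in page:
            if confidence >= min_confidence:
                pairs.append((text, float(confidence)))
    return pairs


# Hand-written stand-in for the value returned by ocr.ocr(...).
sample = [[
    [[[0, 0], [100, 0], [100, 30], [0, 30]], ("សួស្តី", 0.98)],
    [[[0, 40], [100, 40], [100, 70], [0, 70]], ("Hello", 0.42)],
]]
print(flatten_results(sample, min_confidence=0.5))
```

A threshold like `0.5` is only an illustration; a suitable value depends on the images and should be chosen empirically.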
+ ### Command Line Usage
+
+ ```bash
+ # Download model files to a directory
+ # Then use PaddleOCR tools:
+
+ python tools/infer/predict_rec.py \
+     --image_dir="your_khmer_image.png" \
+     --rec_model_dir="path/to/model" \
+     --rec_char_dict_path="khmer_char_dict.txt"
+ ```
+
+ ## 📁 Files Included
+
+ | File | Size | Description |
+ |------|------|-------------|
+ | `inference.pdiparams` | ~98MB | Main model weights |
+ | `inference.yml` | ~2KB | Model configuration |
+ | `inference.json` | ~1KB | Model metadata |
+ | `khmer_char_dict.txt` | ~2KB | Character dictionary (168 characters) |
+ | `training_config.yml` | ~2KB | Original training configuration |
+
+ ## 🔧 Training Details
+
+ ### Dataset Characteristics
+ - **Text Length**: 3-5 words per image (optimized for short segments)
+ - **Image Size**: 600×80 pixels as generated, resized to 320×32 by `RecResizeImg` for both training and inference
+ - **Font**: KhmerOS TTF
+ - **Background**: White background with black text
+ - **Augmentation**: Clean, blurred, noisy, and noise+blur variants
+
+ ### Training Configuration
+ - **Epochs**: 30 (best model at epoch 29)
+ - **Optimizer**: Adam with β₁=0.9, β₂=0.999
+ - **Learning Rate**: Cosine scheduling (initial: 0.001)
+ - **Batch Size**: 32 per card
+ - **Loss Function**: CTC Loss
+ - **Regularization**: L2 (factor: 4e-05)
+
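The cosine schedule listed above can be illustrated with the standard cosine-annealing formula, lr(t) = 0.5 · lr₀ · (1 + cos(π·t / T)). This is a sketch only: it assumes no warmup and a minimum rate of zero, and it derives the step count from the 13,253 training images reported in `model_info.json` with `drop_last: true`. PaddleOCR's `Cosine` scheduler may differ in such details.

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float = 0.001) -> float:
    """Cosine-annealed learning rate at a given optimizer step."""
    step = min(max(step, 0), total_steps)  # clamp to the schedule range
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))


steps_per_epoch = 13253 // 32       # incomplete final batch is dropped
total_steps = steps_per_epoch * 30  # 30 epochs

for epoch in (0, 15, 30):
    print(epoch, cosine_lr(epoch * steps_per_epoch, total_steps))
```

The rate starts at the initial value, reaches half of it midway through training, and decays to zero at the final step.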
+ ## 💡 Usage Tips
+
+ ### Best Practices
+ 1. **Image Quality**: Use high-contrast images with clear text
+ 2. **Text Length**: Optimal for 3-5 word segments (model's training focus)
+ 3. **Resolution**: Keep text large enough to stay legible after the crop is resized to 32 pixels high
+ 4. **Preprocessing**: Consider using text detection for full documents
+
+ ### For Long Text Documents
+ Because this model is optimized for short segments, process full documents in three steps:
+
+ 1. **Use Text Detection**: Combine with PaddleOCR's detection model
+ 2. **Segment Text**: Break long lines into 3-5 word chunks
+ 3. **Post-process**: Combine results from multiple segments
+
+ ```python
+ # Example for full document processing
+ from paddleocr import PaddleOCR
+
+ ocr = PaddleOCR(
+     use_angle_cls=True,
+     lang='ch',
+     det_model_dir='path/to/detection/model',  # Add detection model
+     rec_model_dir='path/to/this/model',  # This Khmer recognition model
+     rec_char_dict_path='khmer_char_dict.txt'
+ )
+
+ # This will detect text regions AND recognize them
+ result = ocr.ocr('full_document.jpg', cls=True)
+ ```
+
+
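The post-processing step (combining results from multiple segments) can be sketched as a reading-order sort: group regions whose top edges are close into one line, then read each line left to right. The helper name `join_in_reading_order`, the `line_tolerance` value, and the sample boxes are hypothetical; real pages with skew or multiple columns would need a more careful layout analysis.

```python
from typing import List, Sequence, Tuple

Box = Sequence[Sequence[float]]  # four [x, y] corner points


def _top(box: Box) -> float:
    return min(point[1] for point in box)


def _left(box: Box) -> float:
    return min(point[0] for point in box)


def join_in_reading_order(regions: List[Tuple[Box, str]], line_tolerance: float = 10.0) -> str:
    """Group regions into lines by their top edge, then read each line left to right."""
    lines = []  # each entry: [reference_top, [(left, text), ...]]
    for box, text in sorted(regions, key=lambda region: _top(region[0])):
        if lines and abs(_top(box) - lines[-1][0]) <= line_tolerance:
            lines[-1][1].append((_left(box), text))
        else:
            lines.append([_top(box), [(_left(box), text)]])
    return "\n".join(
        " ".join(text for _, text in sorted(items, key=lambda item: item[0]))
        for _, items in lines
    )


regions = [
    ([[200, 2], [300, 2], [300, 32], [200, 32]], "world"),
    ([[0, 0], [100, 0], [100, 30], [0, 30]], "hello"),
    ([[0, 40], [100, 40], [100, 70], [0, 70]], "next"),
]
print(join_in_reading_order(regions))
```

The `(box, text)` pairs can be built from the detection and recognition output shown above by taking `line[0]` and `line[1][0]` for each region.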
+ ## 🔄 Model Conversion
+
+ This model was exported from PaddlePaddle training format to inference format:
+
+ ```bash
+ # Original export command used:
+ python tools/export_model.py \
+     -c pretrainoutput/config.yml \
+     -o Global.pretrained_model=pretrainoutput/best_accuracy.pdparams \
+     Global.save_inference_dir=pretrainoutput/inference
+ ```
+
+ ## 🛠️ Requirements
+
+ ```
+ paddlepaddle>=2.4.0
+ opencv-python>=4.5.0
+ numpy>=1.19.0
+ pillow>=8.0.0
+ ```
+
+ ```bibtex
+ @misc{khmer-ocr-2025,
+   title={Khmer OCR Recognition Model},
+   author={[Your Name]},
+   year={2025},
+   publisher={Hugging Face},
+   howpublished={\url{https://huggingface.co/[your-username]/khmer-ocr}}
+ }
+ ```
example_usage.py ADDED
@@ -0,0 +1,213 @@
+ #!/usr/bin/env python3
+ """
+ Example usage of the Khmer OCR Recognition Model
+ Demonstrates how to use the model for Khmer text recognition
+ """
+
+ from paddleocr import PaddleOCR
+ import cv2
+ import os
+ import json
+
+ def khmer_ocr_example(image_path, model_dir="."):
+     """
+     Example function showing how to use the Khmer OCR model
+
+     Args:
+         image_path (str): Path to the image containing Khmer text
+         model_dir (str): Directory containing the model files
+
+     Returns:
+         list: OCR results with text, confidence, and bounding boxes
+     """
+
+     print(f"🔍 Processing: {image_path}")
+     print("=" * 50)
+
+     # Initialize PaddleOCR with custom Khmer model
+     try:
+         ocr = PaddleOCR(
+             use_angle_cls=True,
+             lang='ch',  # Use Chinese as base language
+             rec_model_dir=model_dir,  # Directory with inference files
+             rec_char_dict_path=os.path.join(model_dir, 'khmer_char_dict.txt'),
+             show_log=False
+         )
+         print("✅ Model loaded successfully")
+     except Exception as e:
+         print(f"❌ Error loading model: {e}")
+         return None
+
+     # Check if image exists
+     if not os.path.exists(image_path):
+         print(f"❌ Image file not found: {image_path}")
+         return None
+
+     # Process the image
+     try:
+         result = ocr.ocr(image_path, cls=True)
+         print("✅ OCR processing completed")
+     except Exception as e:
+         print(f"❌ Error processing image: {e}")
+         return None
+
+     # Extract and display results
+     if result[0] is None:
+         print("⚠️ No text detected in the image.")
+         return []
+
+     all_results = []
+     total_confidence = 0
+
+     print(f"\n📝 Detected Text Regions: {len(result[0])}")
+     print("-" * 50)
+
+     for idx, line in enumerate(result[0]):
+         box = line[0]  # Bounding box coordinates [[x1,y1], [x2,y2], [x3,y3], [x4,y4]]
+         text = line[1][0]  # Recognized text
+         confidence = line[1][1]  # Confidence score
+
+         # Store result
+         result_item = {
+             'region_id': idx + 1,
+             'text': text,
+             'confidence': confidence,
+             'bounding_box': box
+         }
+         all_results.append(result_item)
+         total_confidence += confidence
+
+         # Display result
+         print(f"Region {idx + 1}:")
+         print(f" 📄 Text: {text}")
+         print(f" 🎯 Confidence: {confidence:.3f}")
+         print(f" 📍 Box: [{box[0][0]:.0f},{box[0][1]:.0f}] → [{box[2][0]:.0f},{box[2][1]:.0f}]")
+         print()
+
+     # Summary
+     avg_confidence = total_confidence / len(result[0]) if result[0] else 0
+     print("📊 Summary:")
+     print(f" Total regions: {len(result[0])}")
+     print(f" Average confidence: {avg_confidence:.3f}")
+
+     # Combine all text
+     full_text = " ".join([item['text'] for item in all_results])
+     print(f" 📝 Full text: {full_text}")
+
+     return all_results
+
+ def batch_process_images(image_dir, model_dir=".", output_file="ocr_results.json"):
+     """
+     Process multiple images in a directory
+
+     Args:
+         image_dir (str): Directory containing images
+         model_dir (str): Directory containing model files
+         output_file (str): Output JSON file for results
+     """
+
+     print(f"🔄 Batch processing images from: {image_dir}")
+
+     # Find image files
+     image_extensions = ['.jpg', '.jpeg', '.png', '.bmp', '.tiff']
+     image_files = []
+
+     if os.path.isdir(image_dir):
+         for file in os.listdir(image_dir):
+             if any(file.lower().endswith(ext) for ext in image_extensions):
+                 image_files.append(os.path.join(image_dir, file))
+
+     if not image_files:
+         print(f"❌ No image files found in {image_dir}")
+         return
+
+     print(f"📁 Found {len(image_files)} images")
+
+     all_results = {}
+
+     for image_path in image_files:
+         print(f"\n🖼️ Processing: {os.path.basename(image_path)}")
+         results = khmer_ocr_example(image_path, model_dir)
+         if results:
+             all_results[image_path] = results
+
+     # Save results to JSON
+     try:
+         with open(output_file, 'w', encoding='utf-8') as f:
+             json.dump(all_results, f, ensure_ascii=False, indent=2)
+         print(f"\n💾 Results saved to: {output_file}")
+     except Exception as e:
+         print(f"❌ Error saving results: {e}")
+
+ def main():
+     """Main function with example usage"""
+
+     print("🇰🇭 Khmer OCR Recognition Model - Example Usage")
+     print("=" * 60)
+
+     # Example 1: Single image processing
+     print("\n📖 Example 1: Single Image Processing")
+     print("-" * 40)
+
+     # You can replace this with your actual image path
+     example_image = "sample_khmer_image.jpg"
+
+     if os.path.exists(example_image):
+         results = khmer_ocr_example(example_image)
+         if results:
+             print("✅ Single image processing completed successfully!")
+     else:
+         print(f"ℹ️ Example image '{example_image}' not found.")
+         print(" Please provide your own Khmer text image.")
+
+     # Example 2: Batch processing
+     print("\n📖 Example 2: Batch Processing")
+     print("-" * 40)
+
+     sample_dir = "sample_images"
+     if os.path.exists(sample_dir):
+         batch_process_images(sample_dir)
+     else:
+         print(f"ℹ️ Sample directory '{sample_dir}' not found.")
+         print(" Create a directory with Khmer images to test batch processing.")
+
+     # Example 3: Model info
+     print("\n📖 Example 3: Model Information")
+     print("-" * 40)
+
+     model_files = [
+         'inference.pdiparams',
+         'inference.yml',
+         'inference.json',
+         'khmer_char_dict.txt'
+     ]
+
+     print("📁 Required model files:")
+     for file in model_files:
+         if os.path.exists(file):
+             size = os.path.getsize(file) / (1024*1024)  # MB
+             print(f" ✅ {file} ({size:.1f}MB)")
+         else:
+             print(f" ❌ {file} - Missing!")
+
+     # Load character dictionary info
+     char_dict_path = 'khmer_char_dict.txt'
+     if os.path.exists(char_dict_path):
+         try:
+             with open(char_dict_path, 'r', encoding='utf-8') as f:
+                 # splitlines() keeps the space character stored on the first line
+                 chars = f.read().splitlines()
+             print(f"\n📝 Character Dictionary: {len(chars)} characters supported")
+             print(f" Sample characters: {' '.join(chars[:20])}...")
+         except Exception as e:
+             print(f"❌ Error reading character dictionary: {e}")
+
+     print("\n🎯 Usage Tips:")
+     print(" • Best for 3-5 word text segments")
+     print(" • Use high-contrast, clear images")
+     print(" • Combine with text detection for full documents")
+     print(" • Model supports 168 Khmer and Latin characters")
+
+     print("\n✨ Happy OCR-ing with Khmer text!")
+
+ if __name__ == "__main__":
+     main()
inference.json ADDED
The diff for this file is too large to render. See raw diff
 
inference.pdiparams ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1fbdcb5dc3814d9253fd917a9b123ad36398c76906f86e34e63180109cb72aa5
+ size 98271715
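The three lines above are a Git LFS pointer rather than the weights themselves; the `size` field gives the real file size in bytes. A small sketch of reading such a pointer follows. The function name `parse_lfs_pointer` is hypothetical, and the pointer text is copied from this diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])  # size is stored in bytes
    return fields


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:1fbdcb5dc3814d9253fd917a9b123ad36398c76906f86e34e63180109cb72aa5
size 98271715
"""

info = parse_lfs_pointer(pointer)
print(f"{info['size'] / 1e6:.1f} MB, {info['size'] / 2**20:.1f} MiB")
```

The weights are therefore about 98 MB in decimal units, or roughly 94 MiB.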
inference.yml ADDED
@@ -0,0 +1,187 @@
+ PreProcess:
+   transform_ops:
+   - DecodeImage:
+       channel_first: false
+       img_mode: BGR
+   - CTCLabelEncode: null
+   - RecResizeImg:
+       image_shape:
+       - 3
+       - 32
+       - 320
+   - KeepKeys:
+       keep_keys:
+       - image
+       - label
+       - length
+ PostProcess:
+   name: CTCLabelDecode
+   character_dict:
+   - ' '
+   - '!'
+   - '%'
+   - '&'
+   - (
+   - )
+   - +
+   - ','
+   - '-'
+   - .
+   - /
+   - '0'
+   - '1'
+   - '2'
+   - '3'
+   - '4'
+   - '5'
+   - '6'
+   - '7'
+   - '8'
+   - '9'
+   - ':'
+   - '?'
+   - A
+   - B
+   - C
+   - D
+   - E
+   - F
+   - G
+   - H
+   - I
+   - J
+   - K
+   - L
+   - M
+   - N
+   - O
+   - P
+   - R
+   - S
+   - T
+   - U
+   - V
+   - W
+   - X
+   - Y
+   - Z
+   - '['
+   - ']'
+   - a
+   - b
+   - c
+   - d
+   - e
+   - f
+   - g
+   - h
+   - i
+   - j
+   - k
+   - l
+   - m
+   - n
+   - o
+   - p
+   - q
+   - r
+   - s
+   - t
+   - u
+   - v
+   - w
+   - x
+   - y
+   - z
+   - «
+   - ®
+   - »
+   - ក
+   - ខ
+   - គ
+   - ឃ
+   - ង
+   - ច
+   - ឆ
+   - ជ
+   - ឈ
+   - ញ
+   - ដ
+   - ឋ
+   - ឌ
+   - ឍ
+   - ណ
+   - ត
+   - ថ
+   - ទ
+   - ធ
+   - ន
+   - ប
+   - ផ
+   - ព
+   - ភ
+   - ម
+   - យ
+   - រ
+   - ល
+   - វ
+   - ស
+   - ហ
+   - ឡ
+   - អ
+   - ឥ
+   - ឧ
+   - ឫ
+   - ឬ
+   - ឭ
+   - ឯ
+   - ឱ
+   - ឲ
+   - ា
+   - ិ
+   - ី
+   - ឹ
+   - ឺ
+   - ុ
+   - ូ
+   - ួ
+   - ើ
+   - ឿ
+   - ៀ
+   - េ
+   - ែ
+   - ៃ
+   - ោ
+   - ៅ
+   - ំ
+   - ះ
+   - ៈ
+   - ៉
+   - ៊
+   - ់
+   - ៌
+   - ៍
+   - ៏
+   - ័
+   - ្
+   - ។
+   - ៕
+   - ៖
+   - ៗ
+   - ០
+   - ១
+   - ២
+   - ៣
+   - ៤
+   - ៥
+   - ៦
+   - ៧
+   - ៨
+   - ៩
+   - –
+   - —
+   - ‘
+   - ’
+   - “
+   - ”
+   - ™
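The `CTCLabelDecode` post-process turns per-timestep class indices into text using `character_dict`. A minimal sketch of greedy CTC decoding follows: consecutive repeats are collapsed and blanks are dropped. It assumes the blank label sits at index 0, so dictionary entry `k` corresponds to class `k + 1`; that layout is an assumption about PaddleOCR's convention and should be checked against the library. The function name and sample indices are hypothetical.

```python
from typing import List, Sequence

BLANK = 0  # assumed position of the CTC blank label


def ctc_greedy_decode(indices: Sequence[int], characters: List[str]) -> str:
    """Collapse repeated indices, drop blanks, and map the rest to characters."""
    decoded = []
    previous = None
    for idx in indices:
        if idx != previous and idx != BLANK:
            decoded.append(characters[idx - 1])  # shift by one for the blank slot
        previous = idx
    return "".join(decoded)


characters = [" ", "!", "%"]        # first three dictionary entries
indices = [0, 2, 2, 0, 2, 3, 3]     # argmax output per timestep
print(repr(ctc_greedy_decode(indices, characters)))
```

The blank between the two runs of `2` is what allows a doubled character to survive the collapse step.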
khmer_char_dict.txt ADDED
@@ -0,0 +1,168 @@
+  
+ !
+ %
+ &
+ (
+ )
+ +
+ ,
+ -
+ .
+ /
+ 0
+ 1
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+ 8
+ 9
+ :
+ ?
+ A
+ B
+ C
+ D
+ E
+ F
+ G
+ H
+ I
+ J
+ K
+ L
+ M
+ N
+ O
+ P
+ R
+ S
+ T
+ U
+ V
+ W
+ X
+ Y
+ Z
+ [
+ ]
+ a
+ b
+ c
+ d
+ e
+ f
+ g
+ h
+ i
+ j
+ k
+ l
+ m
+ n
+ o
+ p
+ q
+ r
+ s
+ t
+ u
+ v
+ w
+ x
+ y
+ z
+ «
+ ®
+ »
+ ក
+ ខ
+ គ
+ ឃ
+ ង
+ ច
+ ឆ
+ ជ
+ ឈ
+ ញ
+ ដ
+ ឋ
+ ឌ
+ ឍ
+ ណ
+ ត
+ ថ
+ ទ
+ ធ
+ ន
+ ប
+ ផ
+ ព
+ ភ
+ ម
+ យ
+ រ
+ ល
+ វ
+ ស
+ ហ
+ ឡ
+ អ
+ ឥ
+ ឧ
+ ឫ
+ ឬ
+ ឭ
+ ឯ
+ ឱ
+ ឲ
+ ា
+ ិ
+ ី
+ ឹ
+ ឺ
+ ុ
+ ូ
+ ួ
+ ើ
+ ឿ
+ ៀ
+ េ
+ ែ
+ ៃ
+ ោ
+ ៅ
+ ំ
+ ះ
+ ៈ
+ ៉
+ ៊
+ ់
+ ៌
+ ៍
+ ៏
+ ័
+ ្
+ ។
+ ៕
+ ៖
+ ៗ
+ ០
+ ១
+ ២
+ ៣
+ ៤
+ ៥
+ ៦
+ ៧
+ ៨
+ ៩
+ –
+ —
+ ‘
+ ’
+ “
+ ”
+ ™
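Because the dictionary stores one character per line and its first entry is a space, it should be read without stripping whitespace. The sketch below loads such a file and reports characters in a string that the dictionary does not cover. The helper names are hypothetical, and the demo writes a tiny stand-in file rather than the real `khmer_char_dict.txt`.

```python
import os
import tempfile


def load_char_dict(path: str) -> list:
    """Read one character per line, removing only the line terminator."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.rstrip("\r\n") for line in f]


def unsupported_chars(text: str, characters: list) -> set:
    """Return the characters of text that are missing from the dictionary."""
    known = set(characters)
    return {ch for ch in text if ch not in known}


# Demo on a four-entry stand-in dictionary (space, '!', and two consonants).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_dict.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write(" \n!\nក\nខ\n")
    chars = load_char_dict(demo_path)

missing = unsupported_chars("ក ខ Q", chars)
print(len(chars), missing)
```

A check like this is useful before evaluation, since the model cannot emit a character that is absent from the dictionary; in the file above, for example, the uppercase Latin run goes from `P` directly to `R`.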
model_info.json ADDED
@@ -0,0 +1,84 @@
+ {
+   "model_name": "Khmer OCR Recognition Model",
+   "description": "CRNN-based OCR model specifically trained for Khmer text recognition",
+   "framework": "PaddleOCR",
+   "architecture": {
+     "algorithm": "CRNN",
+     "backbone": "ResNet34",
+     "neck": "SequenceEncoder (RNN)",
+     "head": "CTCHead",
+     "loss": "CTCLoss"
+   },
+   "performance": {
+     "accuracy": 98.45,
+     "normalized_edit_distance": 99.90,
+     "inference_speed_fps": 326,
+     "best_epoch": 29,
+     "total_epochs": 30
+   },
+   "training_data": {
+     "training_images": 13253,
+     "validation_images": 4315,
+     "total_images": 17568,
+     "text_length_range": "3-5 words",
+     "image_size": "600x80 pixels (training), 320x32 (inference)",
+     "font": "KhmerOS",
+     "augmentation": ["clean", "blurred", "noisy", "noise_blur"]
+   },
+   "model_specifications": {
+     "input_shape": [3, 32, 320],
+     "max_text_length": 25,
+     "character_count": 168,
+     "supported_languages": ["Khmer", "Latin"],
+     "model_size_mb": 98
+   },
+   "character_set": {
+     "khmer_consonants": "ក ខ គ ឃ ង ច ឆ ជ ឈ ញ ដ ឋ ឌ ឍ ណ ត ថ ទ ធ ន ប ផ ព ភ ម យ រ ល វ ស ហ ឡ អ",
+     "khmer_vowels": "ា ិ ី ឹ ឺ ុ ូ ួ ើ ឿ ៀ េ ែ ៃ ោ ៅ ំ ះ ៈ",
+     "khmer_numerals": "០ ១ ២ ៣ ៤ ៥ ៦ ៧ ៨ ៩",
+     "latin_characters": "A-Z, a-z, 0-9",
+     "punctuation": ". , ! ? - ( ) [ ] « » ™ ® etc.",
+     "khmer_symbols": "។ ៕ ៖ ៗ ៉ ៊ ់ ៌ ៍ ៏ ័ ្"
+   },
+   "training_config": {
+     "optimizer": "Adam",
+     "learning_rate": "Cosine scheduling (initial: 0.001)",
+     "batch_size": 32,
+     "regularization": "L2 (4e-05)",
+     "image_augmentation": true,
+     "data_variants": 4
+   },
+   "usage_recommendations": {
+     "optimal_text_length": "3-5 words",
+     "image_quality": "High contrast, clear text",
+     "use_cases": ["Road signs", "Document snippets", "Menu items", "Form fields"],
+     "preprocessing": "Consider text detection for full documents"
+   },
+   "files": {
+     "inference.pdiparams": "Main model weights (98MB)",
+     "inference.yml": "Model configuration",
+     "inference.json": "Model metadata",
+     "khmer_char_dict.txt": "Character dictionary (168 characters)",
+     "training_config.yml": "Original training configuration"
+   },
+   "requirements": [
+     "paddlepaddle>=2.4.0",
+     "opencv-python>=4.5.0",
+     "numpy>=1.19.0",
+     "pillow>=8.0.0"
+   ],
+   "limitations": [
+     "Optimized for short text segments (3-5 words)",
+     "Best performance on clean, printed text",
+     "May need segmentation for longer text",
+     "Trained primarily on synthetic data"
+   ],
+   "license": "Specify your license",
+   "created_date": "2025-09-25",
+   "version": "1.0",
+   "contact": {
+     "author": "Your Name",
+     "email": "your.email@example.com",
+     "repository": "https://huggingface.co/your-username/khmer-ocr"
+   }
+ }
requirements.txt ADDED
@@ -0,0 +1,5 @@
+ paddlepaddle>=2.4.0
+ opencv-python>=4.5.0
+ numpy>=1.19.0
+ pillow>=8.0.0
+ pyclipper>=1.3.0
training_config.yml ADDED
@@ -0,0 +1,104 @@
+ Global:
+   use_gpu: true
+   epoch_num: 30
+   log_smooth_window: 20
+   print_batch_step: 10
+   save_model_dir: pretrainoutput
+   save_epoch_step: 5
+   eval_batch_step:
+   - 0
+   - 2000
+   cal_metric_during_train: true
+   pretrained_model: ../source/model/best_accuracy.pdparams
+   checkpoints: null
+   save_inference_dir: ../source/infer
+   use_visualdl: false
+   character_dict_path: ../OCR/output_images/khmer_char_dict.txt
+   character_type: ch
+   max_text_length: 25
+   infer_mode: false
+   use_space_char: true
+   save_res_path: ../output/predicts_khmer_lite.txt
+ Optimizer:
+   name: Adam
+   beta1: 0.9
+   beta2: 0.999
+   lr:
+     name: Cosine
+     learning_rate: 0.001
+   regularizer:
+     name: L2
+     factor: 4.0e-05
+ Architecture:
+   model_type: rec
+   algorithm: CRNN
+   Transform: null
+   Backbone:
+     name: ResNet
+     layers: 34
+   Neck:
+     name: SequenceEncoder
+     encoder_type: rnn
+     hidden_size: 256
+   Head:
+     name: CTCHead
+     fc_decay: 4.0e-05
+ Loss:
+   name: CTCLoss
+ PostProcess:
+   name: CTCLabelDecode
+ Metric:
+   name: RecMetric
+   main_indicator: acc
+ Train:
+   dataset:
+     name: SimpleDataSet
+     data_dir: ../OCR/output_images
+     label_file_list: ../OCR/output_images/train_rec.txt
+     transforms:
+     - DecodeImage:
+         img_mode: BGR
+         channel_first: false
+     - RecAug: null
+     - CTCLabelEncode: null
+     - RecResizeImg:
+         image_shape:
+         - 3
+         - 32
+         - 320
+     - KeepKeys:
+         keep_keys:
+         - image
+         - label
+         - length
+   loader:
+     shuffle: true
+     batch_size_per_card: 32
+     drop_last: true
+     num_workers: 8
+ Eval:
+   dataset:
+     name: SimpleDataSet
+     data_dir: ../OCR/output_images
+     label_file_list: ../OCR/output_images/val_rec.txt
+     transforms:
+     - DecodeImage:
+         img_mode: BGR
+         channel_first: false
+     - CTCLabelEncode: null
+     - RecResizeImg:
+         image_shape:
+         - 3
+         - 32
+         - 320
+     - KeepKeys:
+         keep_keys:
+         - image
+         - label
+         - length
+   loader:
+     shuffle: false
+     drop_last: false
+     batch_size_per_card: 32
+     num_workers: 8
+ profiler_options: null
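The configuration selects `RecMetric` with `acc` as the main indicator, and `model_info.json` reports both an accuracy and a normalized edit distance score. The sketch below shows one common way to compute the two figures: exact-match accuracy, and one minus the mean edit distance normalized by the longer string. It is an illustration under those assumptions; PaddleOCR's own metric may normalize text differently before comparing. The function names and sample strings are hypothetical.

```python
from typing import Dict, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def rec_metrics(predictions: Sequence[str], targets: Sequence[str]) -> Dict[str, float]:
    """Exact-match accuracy and 1 - mean normalized edit distance."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("predictions and targets must be non-empty and equal in length")
    n = len(targets)
    exact = sum(p == t for p, t in zip(predictions, targets))
    norm_dist = sum(
        edit_distance(p, t) / max(len(p), len(t), 1)
        for p, t in zip(predictions, targets)
    )
    return {"acc": exact / n, "norm_edit_dis": 1.0 - norm_dist / n}


metrics = rec_metrics(["abc", "abd"], ["abc", "abc"])
print(metrics)
```

The second score is more forgiving than the first: a prediction with a single wrong character counts as a complete miss for accuracy but loses only a fraction under the edit-distance measure.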