hasindu-k committed on
Commit 166a9f0 (verified) · 1 Parent(s): e0e6481

Update README.md

  1. README.md +219 -3
README.md CHANGED
@@ -1,3 +1,219 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ language:
3
+ - si
4
+ license: apache-2.0
5
+ library_name: transformers
6
+ pipeline_tag: image-to-text
7
+ tags:
8
+ - ocr
9
+ - sinhala
10
+ - handwritten-text-recognition
11
+ - trocr
12
+ - vision-encoder-decoder
13
+ - image-to-text
14
+ datasets:
15
+ - custom
16
+ metrics:
17
+ - cer
18
+ - wer
19
+ base_model: eshangj/TrOCR-Sinhala-finetuned
20
+ ---
21
+
22
+ # Sinhala Handwritten Notes OCR - TrOCR Fine-tuned Model
23
+
24
+ ## Model Description
25
+
26
+ This model is a fine-tuned TrOCR-based OCR model for recognizing Sinhala handwritten text from image inputs. It was developed to support Sinhala handwritten educational note recognition as part of a Sinhala educational assistant pipeline.
27
+
28
+ The model was fine-tuned from `eshangj/TrOCR-Sinhala-finetuned` using a custom Sinhala handwritten notes dataset.
29
+
30
+ The main goal of this model is to improve Sinhala handwritten text recognition, especially for cropped handwritten word images or short handwritten text regions. The extracted text can be used for downstream tasks such as document digitization, search, summarization, and question answering.
31
+
32
+ ## Intended Use
33
+
34
+ This model is intended for:
35
+
36
+ - Sinhala handwritten text recognition
37
+ - OCR for Sinhala educational notes
38
+ - Word-level or short-line handwritten text extraction
39
+ - Sinhala OCR research experiments
40
+ - Sinhala educational document processing pipelines
41
+
42
+ ## Not Intended For
43
+
44
+ This model is not currently optimized for:
45
+
46
+ - Long paragraph-level handwritten OCR
47
+ - Complex full-page layout understanding
48
+ - Printed Sinhala OCR
49
+ - Very noisy or low-resolution images
50
+ - Production use without further validation
51
+
52
+ For best results, the input should be a clearly cropped handwritten word or short handwritten text segment.
53
+
54
+ ## Training Configuration
55
+
56
+ | Parameter | Value |
57
+ |---|---:|
58
+ | Base model | `eshangj/TrOCR-Sinhala-finetuned` |
59
+ | Epochs | 20 |
60
+ | Train batch size | 8 |
61
+ | Evaluation batch size | 8 |
62
+ | Learning rate | 2e-5 |
63
+ | FP16 | True |
64
+ | Save strategy | Epoch |
65
+ | Evaluation strategy | Epoch |
66
+ | Logging steps | 20 |
67
+ | Generation enabled | True |
68
+ | Max sequence length | 64 |
69
+ | Evaluation split | 10% |
70
+ | Random state | 42 |
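
As a rough sketch, the table above could be written as keyword arguments for `Seq2SeqTrainingArguments` from `transformers`. The mapping of table rows to argument names is an assumption: "Generation enabled" is read as `predict_with_generate`, "Max sequence length" as `generation_max_length`, and the split settings are kept in a separate, hypothetical `split_config` dictionary. The original training script is not shown in this model card.

```python
# Hypothetical mapping of the training table onto Trainer-style argument names.
# The values come from the table; the argument names are an assumption.
training_config = {
    "num_train_epochs": 20,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "learning_rate": 2e-5,
    "fp16": True,                    # mixed precision; requires a CUDA GPU
    "save_strategy": "epoch",
    "eval_strategy": "epoch",        # called `evaluation_strategy` in older transformers releases
    "logging_steps": 20,
    "predict_with_generate": True,   # assumed meaning of "Generation enabled"
    "generation_max_length": 64,     # assumed meaning of "Max sequence length"
}

# Assumed train/evaluation split settings (10% held out, fixed seed).
split_config = {"test_size": 0.1, "random_state": 42}

for name, value in training_config.items():
    print(f"{name} = {value}")
```

With a recent `transformers` release, these could be passed as `Seq2SeqTrainingArguments(output_dir="...", **training_config)`, where the output directory is left to the user.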
71
+
72
+ ## Evaluation Results
73
+
74
+ Two fine-tuning runs were recorded during experimentation.
75
+
76
+ ### Run 1
77
+
78
+ | Metric | Value |
79
+ |---|---:|
80
+ | Best epoch | 6 |
81
+ | Best evaluation loss | 1.4947 |
82
+ | Best evaluation CER | 0.3741 |
83
+ | Best evaluation WER | 0.5283 |
84
+ | Final CER | 0.2878 |
85
+ | Final WER | 0.4906 |
86
+ | First training loss | 3.4820 |
87
+ | Last training loss | 0.0008 |
88
+ | Minimum training loss | 0.0008 |
89
+
90
+ ### Run 2
91
+
92
+ | Metric | Value |
93
+ |---|---:|
94
+ | Best epoch | 7 |
95
+ | Best evaluation loss | 1.4877 |
96
+ | Best evaluation CER | 0.3453 |
97
+ | Best evaluation WER | 0.4717 |
98
+ | Final CER | 0.3129 |
99
+ | Final WER | 0.4528 |
100
+ | First training loss | 3.4819 |
101
+ | Last training loss | 0.0009 |
102
+ | Minimum training loss | 0.0009 |
103
+
104
+ ## Result Interpretation
105
+
106
+ The model shows clear learning during training, as the training loss decreased from approximately `3.48` to below `0.001`.
107
+
108
+ However, the evaluation loss remained around `1.48`, while the training loss became almost zero. This indicates that the model may be overfitting to the training dataset.
109
+
110
+ The best validation performance was observed around epoch 6-7, rather than at the final epoch. Therefore, the best checkpoint should be selected based on validation CER and WER, not only based on the final training loss.
111
+
112
+ The best recorded validation metrics were:
113
+
114
+ | Metric | Best Observed Value |
115
+ |---|---:|
116
+ | Evaluation CER | 0.3453 |
117
+ | Evaluation WER | 0.4717 |
118
+ | Evaluation loss | 1.4877 |
119
+
120
+ The best final CER observed was:
121
+
122
+ | Metric | Value |
123
+ |---|---:|
124
+ | Final CER | 0.2878 |
125
+
126
+ ## Metrics Explanation
127
+
128
+ ### Character Error Rate - CER
129
+
130
+ Character Error Rate measures character-level mistakes between the predicted Sinhala text and the ground truth text.
131
+
132
+ Lower CER means better character-level recognition.
133
+
134
+ ### Word Error Rate - WER
135
+
136
+ Word Error Rate measures word-level mistakes between the predicted Sinhala text and the ground truth text.
137
+
138
+ Lower WER means better word-level recognition.
139
+
140
+ Since Sinhala handwritten OCR is challenging due to Sinhala character shapes, ligatures, spacing, and handwriting variations, both CER and WER are useful for evaluating this model.
141
+
142
+ ## Usage
143
+
144
+ ```python
145
+ from transformers import TrOCRProcessor, VisionEncoderDecoderModel
146
+ from PIL import Image
147
+ import torch
148
+
149
+ model_name = "hasindu-k/sinhala-handwritten-notes-v3"
150
+
151
+ processor = TrOCRProcessor.from_pretrained(model_name)
152
+ model = VisionEncoderDecoderModel.from_pretrained(model_name)
153
+
154
+ image = Image.open("sample_image.png").convert("RGB")
155
+
156
+ pixel_values = processor(images=image, return_tensors="pt").pixel_values
157
+
158
+ with torch.no_grad():
159
+ generated_ids = model.generate(pixel_values)
160
+
161
+ predicted_text = processor.batch_decode(
162
+ generated_ids,
163
+ skip_special_tokens=True
164
+ )[0]
165
+
166
+ print(predicted_text)
167
+ ```
168
+
169
+ ## Recommended Input Format
170
+
171
+ For better recognition accuracy, use:
172
+
173
+ - Cropped word images
174
+ - Cropped short-line images
175
+ - Clear Sinhala handwritten text
176
+ - High-contrast images
177
+ - Minimal background noise
178
+ - Preprocessed images where ruled lines, shadows, or unnecessary background areas are removed
179
+
180
+ ## Limitations
181
+
182
+ This model still has several limitations:
183
+
184
+ - It may produce incorrect characters for visually similar Sinhala letters.
185
+ - It may struggle with long handwritten sentences.
186
+ - It may perform poorly on unseen handwriting styles.
187
+ - It may be sensitive to image quality, skew, blur, shadows, and background noise.
188
+ - It may confuse word boundaries if the input image contains multiple words.
189
+ - Current WER values show that word-level accuracy still needs further improvement.
190
+
191
+ ## Future Improvements
192
+
193
+ Future versions of this model can be improved by:
194
+
195
+ - Increasing the Sinhala handwritten training dataset size
196
+ - Adding more diverse handwriting styles
197
+ - Training with better word-level cropped images
198
+ - Applying data augmentation
199
+ - Using early stopping based on validation CER and WER
200
+ - Evaluating using a separate unseen test dataset
201
+ - Improving preprocessing for noisy handwritten documents
202
+ - Comparing performance with other Sinhala OCR models
203
+
204
+ ## Ethical Considerations
205
+
206
+ This model is intended for educational and research use. It should not be used as the only source of truth for high-stakes document interpretation. OCR outputs should be reviewed by a human when accuracy is important.
207
+
208
+ ## Citation
209
+
210
+ If you use this model in research or academic work, please cite the model repository and mention that it is based on a fine-tuned TrOCR architecture for Sinhala handwritten OCR.
211
+
212
+ ## Model Version
213
+
214
+ ```text
215
+ Version: v3
216
+ Repository: hasindu-k/sinhala-handwritten-notes-v3
217
+ Fine-tuned from: eshangj/TrOCR-Sinhala-finetuned
218
+ Training timestamp: 2026-05-03
219
+ ```