+---
+language:
+  - si
+license: apache-2.0
+library_name: transformers
+pipeline_tag: image-to-text
+tags:
+  - ocr
+  - sinhala
+  - handwritten-text-recognition
+  - trocr
+  - vision-encoder-decoder
+  - image-to-text
+datasets:
+  - custom
+metrics:
+  - cer
+  - wer
+base_model: eshangj/TrOCR-Sinhala-finetuned
+---
+# Sinhala Handwritten Notes OCR - TrOCR Fine-tuned Model
+## Model Description
+This is a TrOCR-based OCR model fine-tuned to recognize Sinhala handwritten text in images. It was developed to recognize handwritten educational notes as part of a Sinhala educational assistant pipeline.
+The model was fine-tuned from `eshangj/TrOCR-Sinhala-finetuned` using a custom Sinhala handwritten notes dataset.
+The main goal of this model is to improve Sinhala handwritten text recognition, especially for cropped handwritten word images or short handwritten text regions. The extracted text can be used for downstream tasks such as document digitization, search, summarization, and question answering.
+## Intended Use
+This model is intended for:
+- Sinhala handwritten text recognition
+- OCR for Sinhala educational notes
+- Word-level or short-line handwritten text extraction
+- Sinhala OCR research experiments
+- Sinhala educational document processing pipelines
+## Not Intended For
+This model is not currently optimized for:
+- Long paragraph-level handwritten OCR
+- Complex full-page layout understanding
+- Printed Sinhala OCR
+- Very noisy or low-resolution images
+- Production use without further validation
+For best results, the input should be a clearly cropped handwritten word or short handwritten text segment.
+## Training Configuration
+| Parameter | Value |
+|---|---:|
+| Base model | `eshangj/TrOCR-Sinhala-finetuned` |
+| Epochs | 20 |
+| Train batch size | 8 |
+| Evaluation batch size | 8 |
+| Learning rate | 2e-5 |
+| FP16 | True |
+| Save strategy | Epoch |
+| Evaluation strategy | Epoch |
+| Logging steps | 20 |
+| Generation enabled | True |
+| Max sequence length | 64 |
+| Evaluation split | 10% |
+| Random state | 42 |
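The table above can be restated as a configuration sketch. The keyword names below follow Hugging Face `Seq2SeqTrainingArguments`, but mapping each table row to one of those names is an assumption, as is the helper `split_train_eval`; the author's actual training script is not shown in this card. The block uses only the standard library so it can be read and run without installing anything.

```python
import random

# Assumed mapping from the table rows to `Seq2SeqTrainingArguments` keywords.
# These could be passed as Seq2SeqTrainingArguments(output_dir=..., **training_kwargs).
training_kwargs = {
    "num_train_epochs": 20,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "learning_rate": 2e-5,
    "fp16": True,
    "save_strategy": "epoch",
    "eval_strategy": "epoch",  # called `evaluation_strategy` in older transformers releases
    "logging_steps": 20,
    "predict_with_generate": True,  # "Generation enabled" in the table
}

# "Max sequence length" from the table; whether it limits labels, generation,
# or both is not stated in the card.
MAX_SEQUENCE_LENGTH = 64


def split_train_eval(samples, eval_fraction=0.10, seed=42):
    """Hypothetical helper: reproducible shuffle, then hold out `eval_fraction` for evaluation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, round(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


train_items, eval_items = split_train_eval(range(100))
print(len(train_items), len(eval_items))
```

Fixing the seed keeps the 10% evaluation split identical across runs, which is what makes the two runs reported below comparable.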
+## Evaluation Results
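+Two fine-tuning runs were recorded during experimentation. In each table, the "Best" rows appear to report the values at the best epoch (the one with the lowest evaluation loss), and the "Final" rows report the last epoch; this is why a final CER can be lower than the "best" CER.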
+### Run 1
+| Metric | Value |
+|---|---:|
+| Best epoch | 6 |
+| Best evaluation loss | 1.4947 |
+| Best evaluation CER | 0.3741 |
+| Best evaluation WER | 0.5283 |
+| Final CER | 0.2878 |
+| Final WER | 0.4906 |
+| First training loss | 3.4820 |
+| Last training loss | 0.0008 |
+| Minimum training loss | 0.0008 |
+### Run 2
+| Metric | Value |
+|---|---:|
+| Best epoch | 7 |
+| Best evaluation loss | 1.4877 |
+| Best evaluation CER | 0.3453 |
+| Best evaluation WER | 0.4717 |
+| Final CER | 0.3129 |
+| Final WER | 0.4528 |
+| First training loss | 3.4819 |
+| Last training loss | 0.0009 |
+| Minimum training loss | 0.0009 |
+## Result Interpretation
+The model clearly learned during training: the training loss fell from approximately `3.48` to below `0.001`.
+However, the evaluation loss bottomed out near `1.49` (`1.4947` in Run 1, `1.4877` in Run 2) while the training loss approached zero. This gap suggests that the model is overfitting to the training dataset.
+The best epoch (6 in Run 1, 7 in Run 2), which appears to be the epoch with the lowest evaluation loss, came well before the final epoch. The final-epoch CER and WER, however, are lower than the values recorded at the best epoch in both runs, so evaluation loss and error rates disagree about which checkpoint is best. The checkpoint should therefore be selected on validation CER and WER, not on training loss or evaluation loss alone.
+The metrics recorded at the lowest evaluation loss (Run 2, epoch 7) were:
+| Metric | Value at Best Epoch |
+|---|---:|
+| Evaluation CER | 0.3453 |
+| Evaluation WER | 0.4717 |
+| Evaluation loss | 1.4877 |
+The lowest final-epoch error rates were:
+| Metric | Value |
+|---|---:|
+| Final CER | 0.2878 (Run 1) |
+| Final WER | 0.4528 (Run 2) |
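As a small illustration of selecting a checkpoint by error rate rather than by loss, the sketch below ranks the four checkpoints reported in the Run 1 and Run 2 tables. The record layout and the function name `select_checkpoint` are hypothetical; only the CER and WER values come from this card.

```python
# CER/WER values copied from the Run 1 and Run 2 tables in this card.
checkpoints = [
    {"run": 1, "checkpoint": "best epoch", "cer": 0.3741, "wer": 0.5283},
    {"run": 1, "checkpoint": "final epoch", "cer": 0.2878, "wer": 0.4906},
    {"run": 2, "checkpoint": "best epoch", "cer": 0.3453, "wer": 0.4717},
    {"run": 2, "checkpoint": "final epoch", "cer": 0.3129, "wer": 0.4528},
]


def select_checkpoint(records, primary="cer", secondary="wer"):
    """Return the record with the lowest primary metric, breaking ties on the secondary one."""
    return min(records, key=lambda record: (record[primary], record[secondary]))


best_by_cer = select_checkpoint(checkpoints)
best_by_wer = select_checkpoint(checkpoints, primary="wer", secondary="cer")
print(best_by_cer["run"], best_by_cer["checkpoint"], best_by_cer["cer"])
print(best_by_wer["run"], best_by_wer["checkpoint"], best_by_wer["wer"])
```

With these four records, the lowest CER and the lowest WER come from different runs, so the choice of primary metric matters and should be fixed before comparing checkpoints.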
+## Metrics Explanation
+### Character Error Rate - CER
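+Character Error Rate is the character-level edit distance (substitutions, deletions, and insertions) between the predicted Sinhala text and the ground truth, divided by the number of characters in the ground truth.
+Lower CER means better character-level recognition; `0` is a perfect match.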
+### Word Error Rate - WER
+Word Error Rate is the same edit distance computed over whitespace-separated words instead of characters, divided by the number of words in the ground truth.
+Lower WER means better word-level recognition; `0` is a perfect match.
+Since Sinhala handwritten OCR is challenging due to Sinhala character shapes, ligatures, spacing, and handwriting variations, both CER and WER are useful for evaluating this model.
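To make the two definitions concrete, here is a minimal, dependency-free sketch of CER and WER based on Levenshtein edit distance. It is not necessarily the implementation used to produce the numbers in this card, which does not name its metric library. One assumption to note: characters are counted as Unicode code points, and because a single visible Sinhala letter can span several code points, a grapheme-level CER would give different values.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def cer(reference, hypothesis):
    """Character error rate, counting Unicode code points."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate over whitespace-separated tokens."""
    reference_words = reference.split()
    if not reference_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(reference_words, hypothesis.split()) / len(reference_words)


# Example: the middle word differs between ground truth and prediction.
print(cer("මම පාසල් යමි", "මම පාසල යමි"))
print(wer("මම පාසල් යමි", "මම පාසල යමි"))
```

Because one wrong character makes the whole word count as an error, WER is typically higher than CER on the same predictions, which matches the pattern in the results tables.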
+## Usage
+```python
+from transformers import TrOCRProcessor, VisionEncoderDecoderModel
+from PIL import Image
+import torch
+
+model_name = "hasindu-k/sinhala-handwritten-notes-v3"
+# Load the image processor/tokenizer and the fine-tuned encoder-decoder weights.
+processor = TrOCRProcessor.from_pretrained(model_name)
+model = VisionEncoderDecoderModel.from_pretrained(model_name)
+
+# Load a cropped word or short-line image and convert it to 3-channel RGB.
+image = Image.open("sample_image.png").convert("RGB")
+pixel_values = processor(images=image, return_tensors="pt").pixel_values
+
+# Generate token IDs without tracking gradients, then decode them to text.
+with torch.no_grad():
+    generated_ids = model.generate(pixel_values)
+predicted_text = processor.batch_decode(
+    generated_ids,
+    skip_special_tokens=True
+)[0]
+print(predicted_text)
+```
+## Recommended Input Format
+For better recognition accuracy, use:
+- Cropped word images
+- Cropped short-line images
+- Clear Sinhala handwritten text
+- High-contrast images
+- Minimal background noise
+- Preprocessed images where ruled lines, shadows, or unnecessary background areas are removed
+## Limitations
+This model still has several limitations:
+- It may produce incorrect characters for visually similar Sinhala letters.
+- It may struggle with long handwritten sentences.
+- It may perform poorly on unseen handwriting styles.
+- It may be sensitive to image quality, skew, blur, shadows, and background noise.
+- It may confuse word boundaries if the input image contains multiple words.
+- Current WER values (roughly 0.45 to 0.53) show that word-level accuracy still needs further improvement.
+## Future Improvements
+Future versions of this model can be improved by:
+- Increasing the Sinhala handwritten training dataset size
+- Adding more diverse handwriting styles
+- Training with better word-level cropped images
+- Applying data augmentation
+- Using early stopping based on validation CER and WER
+- Evaluating using a separate unseen test dataset
+- Improving preprocessing for noisy handwritten documents
+- Comparing performance with other Sinhala OCR models
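Of the items above, early stopping on validation CER is the easiest to sketch. The function below is a hypothetical, library-free illustration of patience-based stopping; the CER curve in the example is invented for demonstration and is not taken from these training runs. In a `Trainer`-based setup, the `EarlyStoppingCallback` from `transformers` would normally play this role.

```python
def early_stopping_epoch(cer_by_epoch, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a list of per-epoch validation CERs.

    Training stops once CER has failed to improve by more than `min_delta`
    for `patience` consecutive epochs. Epochs are numbered from 1.
    """
    best_cer = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, value in enumerate(cer_by_epoch, start=1):
        if value < best_cer - min_delta:
            best_cer = value
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(cer_by_epoch)


# Invented curve for illustration only.
example_curve = [0.60, 0.50, 0.45, 0.46, 0.47, 0.48, 0.40]
best_epoch, stop_epoch = early_stopping_epoch(example_curve, patience=3)
print(best_epoch, stop_epoch)
```

The example also shows the trade-off: with a patience of 3 the run stops before the later improvement at epoch 7, so patience should be chosen with the noisiness of the validation CER in mind.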
+## Ethical Considerations
+This model is intended for educational and research use. It should not be used as the only source of truth for high-stakes document interpretation. OCR outputs should be reviewed by a human when accuracy is important.
+## Citation
+If you use this model in research or academic work, please cite the model repository and mention that it is based on a fine-tuned TrOCR architecture for Sinhala handwritten OCR.
+## Model Version
+```text
+Version: v3
+Repository: hasindu-k/sinhala-handwritten-notes-v3
+Fine-tuned from: eshangj/TrOCR-Sinhala-finetuned
+Training timestamp: 2026-05-03
+```