| --- |
| language: dv |
| license: apache-2.0 |
| tags: |
| - ocr |
| - trocr |
| - dhivehi |
| - maldives |
| - thaana |
| pipeline_tag: image-to-text |
| base_model: microsoft/trocr-base-handwritten |
| datasets: |
| - alakxender/dhivehi-image-text |
| - alakxender/dhivehi-vrd-batch-1-img-questions |
| --- |
| |
| # Dhivehi TrOCR Base V6 |
|
|
| A fine-tuned [TrOCR](https://huggingface.co/microsoft/trocr-base-handwritten) model for recognizing Dhivehi (Maldivian) text written in the Thaana script. |
|
|
| ## Model Details |
|
|
| - **Base model:** microsoft/trocr-base-handwritten |
| - **Parameters:** ~334M |
| - **Training data:** ~695K samples (315K dhivehi-image-text + 380K dhivehi-vrd) |
| - **Best CER:** 0.9% (checkpoint-20000) |
| - **Tokenizer:** character-level (WordLevel model with one token per character), with an EOS token appended to each sequence |
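
For reference, character error rate (CER) is conventionally the total edit distance between predictions and references divided by the total number of reference characters. The sketch below is a minimal pure-Python version of that definition; it counts Unicode code points (so a Thaana consonant and its vowel sign count as two characters), which is an assumption, since the card does not state how the reported figure was computed. The function names are illustrative.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(references, hypotheses) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars if total_chars else 0.0
```

Under this definition, a CER of 0.9% corresponds to roughly 9 character edits per 1,000 reference characters.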
|
|
| ## Usage |
|
|
| ```python |
| from transformers import TrOCRProcessor, VisionEncoderDecoderModel, PreTrainedTokenizerFast |
| from PIL import Image |
| import torch |
| |
| # Load the image processor, the model, and the character-level tokenizer |
| processor = TrOCRProcessor.from_pretrained("Serialtechlab/dhivehi-trocr-base-handwritten") |
| model = VisionEncoderDecoderModel.from_pretrained("Serialtechlab/dhivehi-trocr-base-handwritten") |
| tokenizer = PreTrainedTokenizerFast.from_pretrained("Serialtechlab/dhivehi-trocr-base-handwritten") |
| model.eval() |
| |
| # The model expects a single text line as an RGB image |
| image = Image.open("dhivehi_text.png").convert("RGB") |
| pixel_values = processor(image, return_tensors="pt").pixel_values |
| |
| with torch.no_grad(): |
|     generated_ids = model.generate(pixel_values, max_length=128, num_beams=4) |
| |
| # Join the character tokens, dropping special tokens (PAD, BOS, EOS, UNK) |
| tokens = tokenizer.convert_ids_to_tokens(generated_ids[0]) |
| special = {tokenizer.pad_token, tokenizer.bos_token, tokenizer.eos_token, tokenizer.unk_token} |
| text = "".join(t for t in tokens if t not in special) |
| print(text) |
| ``` |
|
|
| ## Training |
|
|
| Fine-tuned from `microsoft/trocr-base-handwritten` on Google Colab (A100) for 6 epochs with: |
|
|
| - Learning rate: 4e-5 |
| - Batch size: 16 |
| - EOS token appended to all labels |
| - PAD positions in the labels replaced with -100 so that the loss ignores them |
| - Character-level WordLevel tokenizer |
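
The label handling listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the actual training code: the special-token names (`<pad>`, `<s>`, `</s>`, `<unk>`), the vocabulary ordering, and the helper names are hypothetical. The one fixed detail is that -100 is the default index ignored by PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # ignored by PyTorch's cross-entropy loss by default


def build_char_vocab(texts, specials=("<pad>", "<s>", "</s>", "<unk>")):
    """Map each special token and each distinct character to an integer id.

    The special-token names are placeholders chosen for this sketch.
    """
    vocab = {tok: i for i, tok in enumerate(specials)}
    for ch in sorted({c for t in texts for c in t}):
        vocab[ch] = len(vocab)
    return vocab


def encode_labels(text, vocab, max_length):
    """Encode text one character per token, append EOS, mask padding with -100."""
    unk = vocab["<unk>"]
    # Leave room for EOS so every label sequence ends with it
    ids = [vocab.get(ch, unk) for ch in text][: max_length - 1]
    ids.append(vocab["</s>"])
    # Padding positions get IGNORE_INDEX so they do not contribute to the loss
    return ids + [IGNORE_INDEX] * (max_length - len(ids))
```

Appending EOS to every label is what teaches the decoder when to stop; masking the padding keeps short lines from being dominated by PAD predictions.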
|
|
| ## Limitations |
|
|
| - Optimized for single-line text images (for full pages, run a text-line detector such as Surya first) |
| - May truncate very long lines (generation is capped at `max_length=128` tokens, roughly one per character) |
| - Best results on printed Dhivehi text; handwritten accuracy varies by style |
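
One rough way to notice the truncation case is to check whether generation reached the length cap without producing an EOS token. The helper below is a hypothetical sketch, not part of the model's API; it assumes a single, unpadded output sequence and the same `max_length` that was passed to `generate`.

```python
def looks_truncated(token_ids, eos_token_id, max_length=128):
    """Return True if the sequence hit the length cap without emitting EOS."""
    ids = list(token_ids)
    return len(ids) >= max_length and eos_token_id not in ids
```

With the usage example above, this could be called as `looks_truncated(generated_ids[0].tolist(), tokenizer.eos_token_id)`; a `True` result suggests splitting the line image or raising `max_length`.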
| |