paligemma2-dhivehi-ocr-full

Model Description

This is a fine-tuned PaliGemma 2 model for Dhivehi (Thaana script) Optical Character Recognition (OCR). The LoRA adapter weights have been merged into the base model, so it loads as a standalone checkpoint with no separate adapter step.

Original adapter: alakxender/paligemma2-qlora-dhivehi-ocr-224-sl-md-16k
Base model: google/paligemma2-3b-pt-224
Merged on: 2025-06-29 09:02:20

Capabilities

  • Extract Dhivehi/Thaana text from images
  • Handle both single-line and multi-line text
  • Optimized for printed Dhivehi text recognition
  • Works with various image formats and qualities

Usage

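import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

# The repository is gated: accept its conditions on Hugging Face and
# authenticate (for example with `huggingface-cli login`) before downloading.

# Load the merged model (no separate base model or adapter loading required)
model_id = "Serialtechlab/paligemma2-dhivehi-ocr-full"
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Load your image (convert to RGB so grayscale or RGBA files are handled)
image = Image.open("your_image.png").convert("RGB")

# Prepare inputs
prompt = "<image>What text is written in this image?"
inputs = processor(text=prompt, images=image, return_tensors="pt")

# Move inputs to wherever device_map="auto" placed the model,
# instead of assuming a CUDA device is present
device = model.device
for k, v in inputs.items():
    if k == "pixel_values":
        # Image tensors must match the model's bfloat16 weights
        inputs[k] = v.to(device=device, dtype=torch.bfloat16)
    else:
        inputs[k] = v.to(device)

# Generate (greedy decoding)
with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=500,
        do_sample=False
    )

# Decode only the newly generated tokens. Stripping the prompt with
# str.replace is unreliable because "<image>" is removed as a special token.
input_len = inputs["input_ids"].shape[-1]
dhivehi_text = processor.decode(outputs[0][input_len:], skip_special_tokens=True).strip()
print(f"Extracted text: {dhivehi_text}")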

Model Details

  • Architecture: PaliGemma (Vision-Language Model)
  • Fine-tuning: LoRA (Low-Rank Adaptation)
  • Training data: Dhivehi text images
  • Language: Dhivehi (Thaana script)
  • Model size: ~5.9GB (merged weights; 3B parameters stored as BF16 Safetensors)
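
The reported size is roughly what one would expect from the parameter count and dtype. The sketch below is a back-of-the-envelope check, not a measurement: `weight_size_gb` is a hypothetical helper, and the figure of about 3 billion parameters is an assumption taken from the "3b" in the base model name. bfloat16 stores each parameter in 2 bytes.

```python
def weight_size_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the raw weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: about 3 billion parameters, inferred from the base model name.
approx_params = 3.0e9

size_bf16 = weight_size_gb(approx_params, bytes_per_param=2)  # bfloat16
size_fp32 = weight_size_gb(approx_params, bytes_per_param=4)  # float32, for comparison

print(f"bfloat16: ~{size_bf16:.1f} GB, float32: ~{size_fp32:.1f} GB")
```

The bfloat16 estimate is close to the ~5.9GB listed above. Loading the same weights in float32 would need about twice the memory, which is one reason the usage example passes `torch_dtype=torch.bfloat16`.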

Performance

This card reports no quantitative benchmarks. The model is described as performing well on:

  • Printed text
  • Various font sizes
  • Different image qualities
  • Single and multi-line text layouts
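
To put a number on accuracy for your own images, one common OCR metric is the character error rate (CER): the edit distance between the model output and a ground-truth transcription, divided by the length of the ground truth. The following is a minimal sketch of that metric, not the evaluation used by the model authors. `levenshtein` and `cer` are illustrative names, and the sample strings are made up.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


# Made-up example: the hypothesis drops the final vowel sign of the reference.
reference = "ދިވެހި"
hypothesis = "ދިވެހ"
print(f"CER: {cer(reference, hypothesis):.3f}")
```

This version counts Unicode code points, so each Thaana vowel sign (fili) counts as its own character. A lower CER is better, and 0.0 means an exact match.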

Limitations

  • Optimized for printed text (handwritten text may have lower accuracy)
  • Performance depends on image quality and text clarity
  • Best results with high-contrast, clear images

Training Details

  • Base model: google/paligemma2-3b-pt-224
  • Fine-tuning method: LoRA (Low-Rank Adaptation)
  • Target modules: Vision and language model layers
  • Rank: 16
  • Alpha: 32
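
With rank 16 and alpha 32, the standard LoRA formulation scales the low-rank update by alpha / rank = 2. Merging folds that update into the base weight, W' = W + (alpha / r) · B·A, after which the adapter is no longer needed at inference time. The sketch below illustrates this on toy matrices in pure Python. It is not the script used to produce this checkpoint; `matmul` and `lora_merge` are hypothetical helpers, and the toy example uses rank 2 with alpha 4 so that the scaling factor matches the card's ratio.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_merge(W, A, B, alpha):
    """Return W + (alpha / r) * B @ A, where r is the number of rows of A."""
    r = len(A)
    scaling = alpha / r
    delta = matmul(B, A)  # low-rank update with the same shape as W
    return [[w + scaling * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


# Scaling factor implied by the values listed above.
card_scaling = 32 / 16

# Toy example: W is 2x3, A is r x 3, B is 2 x r, with r = 2.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
B = [[0.5, 0.0],
     [0.0, 0.25]]

merged = lora_merge(W, A, B, alpha=4.0)  # alpha / r = 2, same ratio as the card
print(card_scaling, merged)
```

Because the merged matrix has the same shape as the original weight, the merged model has the same parameter count and inference cost as the base model.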

Citation

If you use this model, please cite:

@misc{dhivehi-ocr-paligemma,
  title={Dhivehi OCR with PaliGemma},
  author={Serialtechlab},
  year={2024},
  howpublished={\url{https://huggingface.co/Serialtechlab/paligemma2-dhivehi-ocr-full}}
}

License

This model is released under the Apache 2.0 license, following the base model's licensing terms.
