Libraries

How to use Serialtechlab/paligemma2-dhivehi-ocr-full with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Serialtechlab/paligemma2-dhivehi-ocr-full")

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("Serialtechlab/paligemma2-dhivehi-ocr-full")
model = AutoModelForMultimodalLM.from_pretrained("Serialtechlab/paligemma2-dhivehi-ocr-full")

Local Apps

vLLM

How to use Serialtechlab/paligemma2-dhivehi-ocr-full with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Serialtechlab/paligemma2-dhivehi-ocr-full"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Serialtechlab/paligemma2-dhivehi-ocr-full",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
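
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and the default `base_url` assumes the vLLM server started above is listening on port 8000. It mirrors the curl call as shown, including the plain-text prompt, and does not send an image.

```python
import json
import urllib.request

MODEL_ID = "Serialtechlab/paligemma2-dhivehi-ocr-full"


def build_completion_request(
    prompt: str,
    base_url: str = "http://localhost:8000",  # assumption: local vLLM server
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build the POST request for the OpenAI-compatible /v1/completions route."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    """Send the request and return the text of the first completion choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

Calling `complete("Once upon a time,")` requires the server to be running. Passing `base_url="http://localhost:30000"` would target a server listening on port 30000 instead.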

Use Docker

docker model run hf.co/Serialtechlab/paligemma2-dhivehi-ocr-full

SGLang

How to use Serialtechlab/paligemma2-dhivehi-ocr-full with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Serialtechlab/paligemma2-dhivehi-ocr-full" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Serialtechlab/paligemma2-dhivehi-ocr-full",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Serialtechlab/paligemma2-dhivehi-ocr-full" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Serialtechlab/paligemma2-dhivehi-ocr-full",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use Serialtechlab/paligemma2-dhivehi-ocr-full with Docker Model Runner:
```
docker model run hf.co/Serialtechlab/paligemma2-dhivehi-ocr-full
```

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

paligemma2-dhivehi-ocr-full

Model Description

This is a fine-tuned PaliGemma 2 model for Optical Character Recognition (OCR) of Dhivehi text written in Thaana script. The LoRA adapter has been merged into the base model weights, so the model loads as a standalone checkpoint with no separate adapter step.

Original adapter: alakxender/paligemma2-qlora-dhivehi-ocr-224-sl-md-16k
Base model: google/paligemma2-3b-pt-224
Merged on: 2025-06-29 09:02:20

Capabilities

Extract Dhivehi/Thaana text from images
Handle both single-line and multi-line text
Optimized for printed Dhivehi text recognition
Works with various image formats and qualities

Usage

import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

# Load the merged model (no separate base model or adapter loading required)
model_id = "Serialtechlab/paligemma2-dhivehi-ocr-full"
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Load your image and normalize it to RGB
image = Image.open("your_image.png").convert("RGB")

# Prepare inputs
prompt = "<image>What text is written in this image?"
inputs = processor(text=prompt, images=image, return_tensors="pt")

# Cast floating-point tensors (pixel_values) to bfloat16, then move everything
# to the device chosen by device_map instead of hard-coding "cuda"
inputs = inputs.to(torch.bfloat16).to(model.device)
input_len = inputs["input_ids"].shape[-1]

# Generate
with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=500,
        do_sample=False
    )

# Decode only the newly generated tokens. Removing the prompt with str.replace
# is unreliable: special tokens such as <image> are dropped during decoding,
# so the decoded text need not contain the prompt string verbatim.
generated_ids = outputs[0][input_len:]
dhivehi_text = processor.decode(generated_ids, skip_special_tokens=True).strip()
print(f"Extracted text: {dhivehi_text}")
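
Once text has been extracted, a quick sanity check is to measure how much of it falls in the Unicode Thaana block (U+0780 to U+07BF). The sketch below is an illustration only: `thaana_ratio` is a hypothetical helper, not part of the model or of Transformers, and any acceptance threshold applied to its result would be an assumption to tune on real data.

```python
# Unicode Thaana block boundaries (inclusive)
THAANA_START = 0x0780
THAANA_END = 0x07BF


def thaana_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that are Thaana."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    thaana_count = sum(THAANA_START <= ord(ch) <= THAANA_END for ch in chars)
    return thaana_count / len(chars)
```

A low ratio on `dhivehi_text` may indicate that the output is mostly Latin text, digits, or noise. Digits and punctuation inside genuine Dhivehi text also lower the ratio, so it is a rough signal rather than a correctness test.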

Model Details

Architecture: PaliGemma 2 (vision-language model)
Fine-tuning: LoRA (Low-Rank Adaptation)
Training data: Dhivehi text images
Language: Dhivehi (Thaana script)
Model size: ~5.9GB (merged weights)
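
The stated size is roughly what BF16 storage implies: each parameter takes 2 bytes. The back-of-the-envelope sketch below assumes a round 3 billion parameters, which is an approximation rather than the exact count, and uses decimal gigabytes.

```python
def estimate_weight_size_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate checkpoint size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: about 3e9 parameters stored in BF16 (2 bytes each)
print(f"{estimate_weight_size_gb(3e9):.1f} GB")  # prints "6.0 GB"
```

The small gap between this estimate and the listed ~5.9GB is expected, since the parameter count is rounded and file sizes may be reported in different units.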

Performance

The model is intended to extract Dhivehi text from images, and is reported to perform well on the following (no quantitative metrics are given):

Printed text
Various font sizes
Different image qualities
Single and multi-line text layouts

Limitations

Optimized for printed text (handwritten text may have lower accuracy)
Performance depends on image quality and text clarity
Best results with high-contrast, clear images
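
Because results are best with high-contrast, clear images, a light preprocessing pass may be worth trying. The sketch below assumes Pillow is installed (the Usage example already imports it). `prepare_for_ocr` is a hypothetical helper, not part of the released model code, and its effect on this model's accuracy has not been measured.

```python
from PIL import Image, ImageOps


def prepare_for_ocr(image: Image.Image, cutoff: float = 1) -> Image.Image:
    """Return a contrast-stretched RGB copy of `image`.

    `cutoff` is the percentage of the darkest and lightest pixels ignored
    when stretching; the default of 1 is an arbitrary starting point.
    """
    gray = ImageOps.grayscale(image)                        # drop color information
    stretched = ImageOps.autocontrast(gray, cutoff=cutoff)  # widen the tonal range
    return stretched.convert("RGB")                         # keep a 3-channel image
```

The result can be passed to the processor in place of the original image. Comparing outputs with and without this step on a few samples is the safest way to decide whether to keep it.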

Training Details

Base model: google/paligemma2-3b-pt-224
Fine-tuning method: LoRA (Low-Rank Adaptation)
Target modules: Vision and language model layers
Rank: 16
Alpha: 32
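
With rank 16 and alpha 32, standard LoRA scales the low-rank update by alpha / rank = 2 before it is added to a frozen weight, which is also what merging an adapter amounts to. The sketch below shows that arithmetic on toy matrices in plain Python to stay dependency-free. It assumes the standard alpha / rank scaling (variants such as rank-stabilized LoRA scale differently) and is not the procedure used to produce this checkpoint.

```python
def lora_scaling(rank: int, alpha: float) -> float:
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return alpha / rank


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, rank, alpha):
    """Return W' = W + (alpha / rank) * B @ A.

    weight: d_out x d_in, lora_b: d_out x rank, lora_a: rank x d_in.
    """
    scale = lora_scaling(rank, alpha)
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example with rank 1 and alpha 2 (scale 2.0, the same as 32 / 16)
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]
A = [[0.5, 0.5]]
print(merge_lora(W, A, B, rank=1, alpha=2))  # [[2.0, 1.0], [2.0, 3.0]]
```

After merging, the adapter matrices are no longer needed at inference time, which is why the merged model loads without a separate adapter.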

Citation

If you use this model, please cite:

@misc{dhivehi-ocr-paligemma,
  title={Dhivehi OCR with PaliGemma},
  author={Serialtechlab},
  year={2024},
  howpublished={\url{https://huggingface.co/Serialtechlab/paligemma2-dhivehi-ocr-full}}
}

License

This model is released under the Apache 2.0 license, following the base model's licensing terms.

Safetensors

Model size: 3B params
Tensor type: BF16
