---
license: apache-2.0
language:
- dv
tags:
- ocr
- dhivehi
- thaana
- paligemma
- vision-language
- text-recognition
base_model: google/paligemma2-3b-pt-224
datasets:
- alakxender/dhivehi-vrd-images
metrics:
- accuracy
library_name: transformers
---

# paligemma2-dhivehi-ocr-full

## Model Description

This is a fine-tuned PaliGemma 2 model for Optical Character Recognition (OCR) of Dhivehi text in Thaana script. The LoRA adapter has been merged into the base model weights, so the model loads as a standalone checkpoint with no separate adapter.

**Original adapter:** alakxender/paligemma2-qlora-dhivehi-ocr-224-sl-md-16k

**Base model:** google/paligemma2-3b-pt-224

**Merged on:** 2025-06-29 09:02:20
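
The merge procedure itself is not documented here. The following is a minimal sketch of how such a merge could be reproduced, assuming the adapter is a standard PEFT LoRA adapter and that `peft`, `transformers`, and `torch` are installed. The function name `merge_lora_adapter` and the output directory are hypothetical.

```python
def merge_lora_adapter(base_id: str, adapter_id: str, output_dir: str) -> None:
    """Merge a PEFT LoRA adapter into its base model and save the result.

    Assumption: the adapter is a standard PEFT LoRA adapter trained on base_id.
    """
    # Imported lazily so the function can be defined without the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    base = PaliGemmaForConditionalGeneration.from_pretrained(
        base_id, torch_dtype=torch.bfloat16
    )
    # Attach the adapter, then fold its low-rank updates into the base weights.
    merged = PeftModel.from_pretrained(base, adapter_id).merge_and_unload()
    merged.save_pretrained(output_dir, safe_serialization=True)
    # Save the processor alongside the weights so the directory is self-contained.
    AutoProcessor.from_pretrained(base_id).save_pretrained(output_dir)


# Example call (downloads several GB of weights, so it is left commented out):
# merge_lora_adapter(
#     "google/paligemma2-3b-pt-224",
#     "alakxender/paligemma2-qlora-dhivehi-ocr-224-sl-md-16k",
#     "paligemma2-dhivehi-ocr-merged",  # hypothetical output directory
# )
```

After a merge like this, the saved directory can be loaded with `from_pretrained` alone, which is what makes the standalone checkpoint convenient to deploy.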

## Capabilities

- Extract Dhivehi/Thaana text from images
- Handle both single-line and multi-line text
- Optimized for printed Dhivehi text recognition
- Works with various image formats and qualities

## Usage

```python
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

# Load the merged model (no separate base model or adapter required)
model_id = "Serialtechlab/paligemma2-dhivehi-ocr-full"
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# Load your image (convert to RGB so grayscale or RGBA files also work)
image = Image.open("your_image.png").convert("RGB")

# Prepare inputs
prompt = "<image>What text is written in this image?"
inputs = processor(text=prompt, images=image, return_tensors="pt")

# Move inputs to the model's device; only pixel_values is cast to bfloat16
for k, v in inputs.items():
    if k == "pixel_values":
        inputs[k] = v.to(torch.bfloat16).to(model.device)
    else:
        inputs[k] = v.to(model.device)

# Generate with greedy decoding
input_len = inputs["input_ids"].shape[-1]
with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=500,
        do_sample=False,
    )

# Decode only the newly generated tokens, skipping the echoed prompt
dhivehi_text = processor.decode(outputs[0][input_len:], skip_special_tokens=True).strip()
print(f"Extracted text: {dhivehi_text}")
```

## Model Details

- **Architecture:** PaliGemma 2 (vision-language model)
- **Fine-tuning:** LoRA (Low-Rank Adaptation), merged into the base weights
- **Training data:** Dhivehi text images (alakxender/dhivehi-vrd-images)
- **Language:** Dhivehi (Thaana script)
- **Model size:** ~5.9 GB (merged weights)
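
As a rough cross-check of the size figure, weight size is approximately parameter count times bytes per parameter. The sketch below assumes the checkpoint is stored in bfloat16 (2 bytes per parameter) and ignores file-format overhead. The helper names are illustrative and not part of any library.

```python
# Bytes used to store one parameter in common dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weights_size_gb(num_params: float, dtype: str = "bfloat16") -> float:
    """Approximate size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def params_from_size(size_gb: float, dtype: str = "bfloat16") -> float:
    """Approximate parameter count implied by a checkpoint size."""
    return size_gb * 1e9 / BYTES_PER_PARAM[dtype]


# ~5.9 GB in bfloat16 implies roughly 3 billion parameters.
print(f"{params_from_size(5.9) / 1e9:.2f}B parameters")        # 2.95B parameters
# The same 3B parameters held in float32 would need about twice the space.
print(f"{weights_size_gb(3e9, 'float32'):.1f} GB in float32")  # 12.0 GB in float32
```

This is consistent with a 3B-parameter base model. Memory use at inference time will be higher than the weight size because of activations and the generation cache.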

## Performance

No quantitative benchmark results are reported for this model. It is described as performing well on:

- Printed text
- Various font sizes
- Different image qualities
- Single and multi-line text layouts
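
To measure accuracy on your own data, one common OCR metric is the character error rate (CER): the edit distance between the reference text and the model output, divided by the reference length. The sketch below is a plain-Python illustration and not the evaluation used for this model. It works on Unicode code points, so each Thaana consonant and each vowel sign (fili) counts as a separate character.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the reference length (0.0 is a perfect match)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


print(character_error_rate("abcd", "abcf"))  # 0.25
```

Averaging this value over a held-out set of image and transcription pairs gives a single number that can be compared across models or preprocessing choices.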

## Limitations

- Optimized for printed text (handwritten text may have lower accuracy)
- Performance depends on image quality and text clarity
- Best results with high-contrast, clear images
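
Because accuracy depends on image quality, a cheap sanity check on the output can help flag results that need review. The sketch below measures the share of characters that fall in the Unicode Thaana block (U+0780 to U+07BF). The function name and the 0.5 threshold in the usage line are hypothetical, and text with many digits or much punctuation will legitimately score lower.

```python
def thaana_ratio(text: str) -> float:
    """Fraction of non-whitespace characters in the Thaana Unicode block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    thaana = sum(1 for c in chars if "\u0780" <= c <= "\u07bf")
    return thaana / len(chars)


# Hypothetical usage: flag outputs that are mostly non-Thaana for manual review.
sample = "\u078b\u07a8\u0788\u07ac\u0780\u07a8"  # the word "Dhivehi" in Thaana
needs_review = thaana_ratio(sample) < 0.5
print(needs_review)  # False
```

A low ratio does not prove the transcription is wrong, and a high ratio does not prove it is right. It only catches gross failures such as empty output or output in the wrong script.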

## Training Details

- **Base model:** google/paligemma2-3b-pt-224
- **Fine-tuning method:** LoRA (Low-Rank Adaptation)
- **Target modules:** Vision and language model layers
- **Rank:** 16
- **Alpha:** 32
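
To make the rank and alpha values concrete: in the standard LoRA formulation, each adapted weight matrix `W` receives an update `(alpha / rank) * B @ A`, so rank 16 with alpha 32 gives a scaling factor of 2. The sketch below illustrates that arithmetic and the merge step on a toy matrix in plain Python. It assumes the standard scaling (not a variant such as rsLoRA), and the layer dimensions in the parameter count are made up for illustration.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(weight, lora_a, lora_b, rank, alpha):
    """Return W + (alpha / rank) * B @ A.

    weight: d_out x d_in, lora_b: d_out x rank, lora_a: rank x d_in.
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by LoRA to one d_out x d_in layer."""
    return rank * (d_in + d_out)


print(32 / 16)  # scaling factor for rank 16, alpha 32 -> 2.0

# Toy rank-1 example with the same scaling factor of 2.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]
A = [[0.5, 0.0]]
merged = merge_lora(W, A, B, rank=1, alpha=2)
print(merged)  # [[2.0, 0.0], [2.0, 1.0]]

# A hypothetical 1024 x 1024 layer gains 32768 trainable parameters at rank 16.
print(lora_param_count(1024, 1024, 16))  # 32768
```

Once the update is folded into `W` like this, the adapter matrices are no longer needed at inference time, which is why the merged model has the same size and speed as the base model.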

## Citation

If you use this model, please cite:

```bibtex
@misc{dhivehi-ocr-paligemma,
  title={Dhivehi OCR with PaliGemma},
  author={Serialtechlab},
  year={2024},
  howpublished={\url{https://huggingface.co/Serialtechlab/paligemma2-dhivehi-ocr-full}}
}
```
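
## License

This model is released under the Apache 2.0 license. The base model, google/paligemma2-3b-pt-224, is distributed under the Gemma Terms of Use, so check those terms as well before redistributing or deploying the merged weights.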