Tags: image-text-to-text, PEFT, Safetensors, Oriya (Odia), OCR, Qwen3, QLoRA, RFT, rejection sampling, conversational
# Odia OCR — Pritosh/odia-ocr-rft-v1

RFT-v1 is a rejection-sampling fine-tune of the V5 checkpoint, trained on 154 model outputs that passed a character error rate (CER) filter of CER < 0.60.

Training pipeline: V5 SFT → RFT-v1 → V6 SFT → GRPO-v2 → RFT-v2 → V7 SFT
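The rejection-sampling filter above can be sketched in plain Python. This is an illustrative reconstruction, not the actual training code (which is not published); the helper names are hypothetical. CER is edit distance between prediction and reference, divided by reference length:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming (two-row variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]

def cer(pred: str, ref: str) -> float:
    """Character error rate of a prediction against a reference."""
    return edit_distance(pred, ref) / max(len(ref), 1)

def filter_samples(samples, threshold=0.60):
    """Keep only (prediction, reference) pairs with CER below the threshold."""
    return [(p, r) for p, r in samples if cer(p, r) < threshold]

# An exact match (CER 0.0) survives; a fully wrong prediction (CER 1.0) is rejected.
samples = [("ଓଡ଼ିଆ", "ଓଡ଼ିଆ"), ("abc", "xyz")]
print(filter_samples(samples))
```

Filtering sampled outputs this way and fine-tuning only on the survivors is the standard rejection-sampling recipe the pipeline refers to.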
## Usage
```python
from transformers import AutoProcessor, AutoModelForImageTextToText
from peft import PeftModel
import torch
from PIL import Image

# Load the processor and base model, then attach the LoRA adapter.
processor = AutoProcessor.from_pretrained("Qwen/Qwen3.5-4B", trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    "Qwen/Qwen3.5-4B", torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)
model = PeftModel.from_pretrained(model, "Pritosh/odia-ocr-rft-v1")

image = Image.open("odia_page.jpg")
messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are an OCR engine specialized in Odia (ଓଡ଼ିଆ) script. Output the exact Odia text visible in the image. Do not add any explanation or translation."}]},
    {"role": "user", "content": [{"type": "image", "image": image}, {"type": "text", "text": "Extract all Odia text from this image."}]},
]

# Render the chat template, run generation, and decode only the newly
# generated tokens (everything past the prompt length).
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
result = processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(result)
```
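Odia letters with a nukta (e.g. ଡ଼) can be encoded either precomposed or as base letter plus combining mark, so two visually identical outputs may not compare equal. A post-processing step such as the sketch below (a suggestion, not part of the model card) makes string comparison and CER evaluation stable, using only the standard library:

```python
import unicodedata

def normalize_odia(text: str) -> str:
    """Apply Unicode canonical normalization (NFC) and collapse
    runs of whitespace, so visually identical strings compare equal."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.split())

# Both encodings of ଡ଼ (U+0B5C vs U+0B21 + U+0B3C) normalize to the same string.
print(normalize_odia("ଓଡ଼ିଆ  ଲିପି\n"))
```

Running `normalize_odia` on both the model output and the reference before scoring avoids penalizing the model for encoding differences rather than recognition errors.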