Tags: image-text-to-text, PEFT, Safetensors, Oriya (Odia), OCR, Qwen3, QLoRA, RFT, rejection sampling, conversational
# Odia OCR — Pritosh/odia-ocr-rft-v1

RFT-v1 is a rejection-sampling fine-tune of the V5 checkpoint, trained on 154 model outputs that passed a character error rate (CER) filter of CER < 0.60.

Training pipeline: V5 SFT → RFT-v1 → V6 SFT → GRPO-v2 → RFT-v2 → V7 SFT
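The rejection-sampling filter above can be sketched in plain Python. This is an illustrative reconstruction, not the actual training code (which is not published); the helper names are hypothetical. CER is edit distance between prediction and reference, divided by reference length:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming (two-row variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]

def cer(pred: str, ref: str) -> float:
    """Character error rate of a prediction against a reference."""
    return edit_distance(pred, ref) / max(len(ref), 1)

def filter_samples(samples, threshold=0.60):
    """Keep only (prediction, reference) pairs with CER below the threshold."""
    return [(p, r) for p, r in samples if cer(p, r) < threshold]

# An exact match (CER 0.0) survives; a fully wrong prediction (CER 1.0) is rejected.
samples = [("ଓଡ଼ିଆ", "ଓଡ଼ିଆ"), ("abc", "xyz")]
print(filter_samples(samples))
```

Filtering sampled outputs this way and fine-tuning only on the survivors is the standard rejection-sampling recipe the pipeline refers to.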
## Usage
```python
from transformers import AutoProcessor, AutoModelForImageTextToText
from peft import PeftModel
import torch
from PIL import Image

# Load the processor and base model, then attach the LoRA adapter.
processor = AutoProcessor.from_pretrained("Qwen/Qwen3.5-4B", trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    "Qwen/Qwen3.5-4B", torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)
model = PeftModel.from_pretrained(model, "Pritosh/odia-ocr-rft-v1")

image = Image.open("odia_page.jpg")
messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are an OCR engine specialized in Odia (ଓଡ଼ିଆ) script. Output the exact Odia text visible in the image. Do not add any explanation or translation."}]},
    {"role": "user", "content": [{"type": "image", "image": image}, {"type": "text", "text": "Extract all Odia text from this image."}]},
]

# Render the chat template, run generation, and decode only the newly
# generated tokens (everything past the prompt length).
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
result = processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(result)
```
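Odia letters with a nukta (e.g. ଡ଼) can be encoded either precomposed or as base letter plus combining mark, so two visually identical outputs may not compare equal. A post-processing step such as the sketch below (a suggestion, not part of the model card) makes string comparison and CER evaluation stable, using only the standard library:

```python
import unicodedata

def normalize_odia(text: str) -> str:
    """Apply Unicode canonical normalization (NFC) and collapse
    runs of whitespace, so visually identical strings compare equal."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.split())

# Both encodings of ଡ଼ (U+0B5C vs U+0B21 + U+0B3C) normalize to the same string.
print(normalize_odia("ଓଡ଼ିଆ  ଲିପି\n"))
```

Running `normalize_odia` on both the model output and the reference before scoring avoids penalizing the model for encoding differences rather than recognition errors.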