Ri-Gemma-Vision-v1: Khasi OCR Vision Model

A fine-tuned vision-language model for Optical Character Recognition (OCR) of Khasi-language documents, built on Gemma-4-E4B-it.

Model Summary

| Property | Details |
| --- | --- |
| Base Model | `unsloth/gemma-4-E4B-it` |
| Task | OCR → Markdown transcription |
| Languages | Khasi (`kha`), English (`en`) |
| Fine-tuning Method | QLoRA (4-bit) via Unsloth |
| Training Samples | 22,985 |
| Validation Samples | 1,300 |

Dataset

Trained on toiar/Khasi-Gemma-OCR-24K, a dataset of roughly 24K samples consisting of:

Real scanned Khasi books and articles
Synthetic Khasi text images
Real scanned English books

Each sample contains a scanned page image paired with its ground truth Markdown transcription.
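
To make the image/transcription pairing concrete, here is a minimal sketch of how one sample could be turned into a chat-style training example. The `OcrSample` class, its field names, and the `to_chat_messages` helper are hypothetical and not taken from the dataset or training code; the user turn simply mirrors the prompt layout used in the Inference section.

```python
from dataclasses import dataclass


@dataclass
class OcrSample:
    """Hypothetical record: one scanned page and its transcription."""
    image_path: str  # path to the scanned page image (assumed field name)
    markdown: str    # ground-truth Markdown transcription (assumed field name)


def to_chat_messages(sample: OcrSample, prompt: str = "Convert to Markdown.") -> list:
    """Build a user/assistant message pair from one sample.

    The image itself is passed to the processor separately; the
    {"type": "image"} entry only marks where it appears in the prompt.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image"},
            ],
        },
        {
            "role": "assistant",
            "content": [{"type": "text", "text": sample.markdown}],
        },
    ]


# Placeholder values for illustration only
sample = OcrSample(image_path="page_001.jpg", markdown="# Title\n\nBody text.")
messages = to_chat_messages(sample)
```

The actual column names and chat layout used during fine-tuning may differ; check the dataset card before relying on this structure.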

Inference

from unsloth import FastVisionModel, get_chat_template
from PIL import Image
from transformers import TextIteratorStreamer
from threading import Thread
import torch

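# Load the fine-tuned weights in bf16 (no 4-bit quantization at inference time)
model, processor = FastVisionModel.from_pretrained(
    "toiar/Ri-Gemma-Vision-v1",
    load_in_4bit = False,
    torch_dtype = torch.bfloat16,
    device_map = "auto",
)

# Switch the model to inference mode and apply the Gemma 4 chat template
FastVisionModel.for_inference(model)
processor = get_chat_template(processor, "gemma-4")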

# Load image
image = Image.open("your_image.jpg").convert("RGB")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text",  "text": "Convert to Markdown."},
            {"type": "image"},
        ],
    }
]

# Render the chat prompt as text, then tokenize it together with the image
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, input_text, add_special_tokens=False, return_tensors="pt").to("cuda")

# Streaming inference
streamer = TextIteratorStreamer(
    processor.tokenizer,
    skip_prompt=True, 
    skip_special_tokens=True
)

thread = Thread(target=model.generate, kwargs=dict(
    **inputs, 
    streamer=streamer, 
    max_new_tokens=4096,
    use_cache=True, 
    do_sample=False,
))
thread.start()

for token in streamer:
    print(token, end="", flush=True)
thread.join()
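
The streaming loop above only prints the transcription. If the Markdown should also be kept, the streamed chunks can be collected and written to disk. The following is a minimal sketch; `collect_stream` and `save_markdown` are hypothetical helper names, not part of Unsloth or Transformers, and they work with any iterable of strings.

```python
from pathlib import Path
from typing import Iterable


def collect_stream(chunks: Iterable[str], echo: bool = True) -> str:
    """Consume streamed text chunks, optionally printing them, and return the full text."""
    parts = []
    for chunk in chunks:
        if echo:
            print(chunk, end="", flush=True)
        parts.append(chunk)
    return "".join(parts)


def save_markdown(text: str, image_path: str) -> Path:
    """Write the transcription next to the image, swapping the extension for .md."""
    out_path = Path(image_path).with_suffix(".md")
    out_path.write_text(text.strip() + "\n", encoding="utf-8")
    return out_path
```

With these helpers, the `for token in streamer` loop could be replaced by `markdown = collect_stream(streamer)`, followed by `thread.join()` and `save_markdown(markdown, "your_image.jpg")`.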

Citation

@misc{ri-gemma-vision-v1,
  author    = {Toiarbor Mawlieh},
  title     = {Ri-Gemma-Vision-v1: Khasi OCR Vision Model},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/toiar/Ri-Gemma-Vision-v1}
}

Model size: 5B params (Safetensors, BF16)

Model tree: google/gemma-4-E4B (base) → google/gemma-4-E4B-it → unsloth/gemma-4-E4B-it → toiar/Ri-Gemma-Vision-v1 (this model)