How to use toiar/Ri-Gemma-Vision-v1 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for toiar/Ri-Gemma-Vision-v1 to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for toiar/Ri-Gemma-Vision-v1 to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for toiar/Ri-Gemma-Vision-v1 to start chatting
Load model with FastModel
pip install unsloth

from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="toiar/Ri-Gemma-Vision-v1",
    max_seq_length=2048,
)
Ri-Gemma-Vision-v1: Khasi OCR Vision Model
A fine-tuned vision-language model for Optical Character Recognition (OCR) of Khasi language documents, built on top of Gemma-4-E2B-it.
Model Summary
| Property | Details |
|---|---|
| Base Model | unsloth/gemma-4-E2B-it |
| Task | OCR → Markdown transcription |
| Languages | Khasi (kha), English (en) |
| Fine-tuning Method | QLoRA (4-bit) via Unsloth |
| Training Samples | 22,985 |
| Validation Samples | 1,300 |
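As a quick check on the table, the two split sizes can be added up and compared with the "24K" figure used for the dataset. The sketch below uses only the counts from the table; the variable names are illustrative.

```python
# Sample counts taken from the Model Summary table.
TRAIN_SAMPLES = 22_985
VAL_SAMPLES = 1_300

# Total size of the two splits and the share held out for validation.
total = TRAIN_SAMPLES + VAL_SAMPLES
val_fraction = VAL_SAMPLES / total

print(f"total samples: {total}")
print(f"validation share: {val_fraction:.2%}")
```

The total comes to 24,285 samples, with a little over 5% held out for validation.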
Dataset
Trained on toiar/Khasi-Gemma-OCR-24K, a dataset of roughly 24K samples consisting of:
- Real scanned Khasi books and articles
- Synthetic Khasi text images
- Real scanned English books
Each sample pairs a scanned page image with its ground-truth Markdown transcription.
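One common way to feed such image/transcription pairs to a vision-language model is to wrap each pair in a two-turn chat record. The sketch below illustrates that idea and is not necessarily how this model was trained. The column names `image` and `markdown` and the helper `to_conversation` are hypothetical, so check the dataset card for the real schema. The prompt text is the one used in the Inference section.

```python
from typing import Any

# Same instruction as in the inference example of this model card.
PROMPT = "Convert to Markdown."


def to_conversation(sample: dict[str, Any]) -> dict[str, list]:
    """Turn one (image, transcription) pair into a two-turn chat record.

    The keys "image" and "markdown" are hypothetical column names;
    the actual dataset may use different ones.
    """
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": PROMPT},
                    {"type": "image", "image": sample["image"]},
                ],
            },
            {
                "role": "assistant",
                "content": [{"type": "text", "text": sample["markdown"]}],
            },
        ]
    }


# Stand-in sample; a real sample would carry a PIL image, not a file name.
record = to_conversation({"image": "page_001.png", "markdown": "# Title\n\nBody text."})
print(record["messages"][1]["content"][0]["text"])
```

The user turn carries the instruction and the image, and the assistant turn carries the target transcription, which mirrors the message layout used at inference time.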
Inference
from unsloth import FastVisionModel, get_chat_template
from PIL import Image
from transformers import TextIteratorStreamer
from threading import Thread
import torch

# Load the fine-tuned model and processor in bfloat16 (no 4-bit quantization)
model, processor = FastVisionModel.from_pretrained(
    "toiar/Ri-Gemma-Vision-v1",
    load_in_4bit = False,
    torch_dtype = torch.bfloat16,
    device_map = "auto",
)
FastVisionModel.for_inference(model)
processor = get_chat_template(processor, "gemma-4")

# Load image
image = Image.open("your_image.jpg").convert("RGB")

# Single user turn: the instruction plus an image placeholder
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Convert to Markdown."},
            {"type": "image"},
        ],
    }
]
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, input_text, add_special_tokens=False, return_tensors="pt").to("cuda")

# Streaming inference
streamer = TextIteratorStreamer(
    processor.tokenizer,
    skip_prompt=True,
    skip_special_tokens=True
)

# Run generation in a background thread so the main thread can consume the stream
thread = Thread(target=model.generate, kwargs=dict(
    **inputs,
    streamer=streamer,
    max_new_tokens=4096,
    use_cache=True,
    do_sample=False,  # greedy decoding
))
thread.start()

# Print text chunks as they arrive
for token in streamer:
    print(token, end="", flush=True)
thread.join()
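The loop above only prints the streamed text. To keep the transcription, the chunks can also be collected into a single string. The helper below is a minimal sketch: `collect_stream` is a hypothetical name, and it accepts any iterable of strings, so a plain list stands in for the `TextIteratorStreamer` here.

```python
from typing import Iterable


def collect_stream(chunks: Iterable[str], echo: bool = True) -> str:
    """Consume text chunks, optionally echoing them, and return the full text."""
    parts = []
    for chunk in chunks:
        if echo:
            print(chunk, end="", flush=True)
        parts.append(chunk)
    return "".join(parts)


# Stand-in for the streamer: any iterable of text chunks works.
fake_stream = ["# Page", " 1\n", "\nSome text."]
markdown = collect_stream(fake_stream, echo=False)
print(markdown)
```

With the real streamer, `markdown = collect_stream(streamer)` would take the place of the `for` loop, between `thread.start()` and `thread.join()`, and the result could then be written to a `.md` file.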
Citation
@misc{ri-gemma-vision-v1,
  author = {Toiarbor Mawlieh},
  title = {Ri-Gemma-Vision-v1: Khasi OCR Vision Model},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/toiar/Ri-Gemma-Vision-v1}
}