Dak-OCR is a fine-tuned version of DeepSeek-OCR-2 designed for accurate OCR, Document Understanding, and Handwriting Recognition in the Khasi Language.
It was trained on the custom Khasi-OCR-36K dataset to reduce hallucination and repetition issues often seen in base multimodal models when working with low-resource languages. The model is designed to preserve document structure and produce clean Markdown output.
- Base model: `unsloth/DeepSeek-OCR-2`
- Precision: bfloat16, for optimal quality and memory efficiency

The model was evaluated on a mixed set of 40 highly dense Khasi samples containing complex Markdown and degraded or noisy scans.

| Metric | Score |
|---|---|
| WER | 1.71% |
| CER | 0.91% |
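
The card does not say which tool produced these scores, so the following is only a minimal sketch of the standard definitions: WER is the word-level edit distance divided by the number of reference words, and CER is the same ratio at character level. The helper names `edit_distance`, `wer`, and `cer` are hypothetical, the strings are placeholders rather than evaluation samples, and the card's own normalization (casing, whitespace, Markdown handling) may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Placeholder strings, not taken from the evaluation set.
reference = "one two three four"
hypothesis = "one two thre four"
print(f"WER: {wer(reference, hypothesis):.2%}")
print(f"CER: {cer(reference, hypothesis):.2%}")
```

Scores over a whole test set are usually obtained by summing edits and reference lengths across samples before dividing, rather than averaging per-sample rates; which of the two the reported numbers use is not stated here.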
```python
from unsloth import FastVisionModel
from transformers import AutoModel
import torch

# Load the model and tokenizer in bfloat16 (no 4-bit quantization)
model, tokenizer = FastVisionModel.from_pretrained(
    "toiar/Dak-OCR",
    load_in_4bit = False,
    auto_model = AutoModel,
    trust_remote_code = True,
    torch_dtype = torch.bfloat16,
)
FastVisionModel.for_inference(model)

# Use greedy decoding; clear sampling parameters that do not apply
model.generation_config.do_sample = False
model.generation_config.temperature = None
model.generation_config.top_p = None

# Inference
prompt = "<image>\nFree OCR."
image_path = "path/to/your/khasi_document.png"

with torch.no_grad():
    output = model.infer(
        tokenizer,
        prompt=prompt,
        image_file=image_path,
        base_size=1024,
        image_size=768,
        crop_mode=True
    )

print(output)
```