Khasi-OCR-v1

Khasi-OCR-v1 is a fine-tuned version of DeepSeek-OCR-2, optimized for high-accuracy Optical Character Recognition (OCR) of Khasi-language text.

It was trained on a custom dataset of Khasi news articles, official documents, and literature to overcome the hallucination and repetition failures (such as infinite output loops) that base multimodal models commonly show on low-resource languages.
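
The repetition failure mentioned above can be checked for mechanically in any OCR output. The following is a minimal sketch, not part of this model's code: `find_repetition_loop` and its thresholds (`min_unit`, `min_repeats`) are hypothetical names chosen for illustration, and the heuristic only catches text that ends in one exactly repeated unit.

```python
from typing import Optional, Tuple


def find_repetition_loop(
    text: str, min_unit: int = 4, min_repeats: int = 4
) -> Optional[Tuple[str, int]]:
    """Return (unit, count) if `text` ends in a unit of at least `min_unit`
    characters repeated back to back at least `min_repeats` times, else None."""
    n = len(text)
    # A unit longer than n // min_repeats cannot fit min_repeats times.
    for unit_len in range(min_unit, n // min_repeats + 1):
        unit = text[n - unit_len:]
        count = 1
        pos = n - 2 * unit_len
        # Walk backwards while the preceding chunk equals the trailing unit.
        while pos >= 0 and text[pos:pos + unit_len] == unit:
            count += 1
            pos -= unit_len
        if count >= min_repeats:
            return unit, count
    return None


if __name__ == "__main__":
    looping = "Header line. " + "Shillong " * 5
    print(find_repetition_loop(looping))
    print(find_repetition_loop("Free OCR of a clean page."))
```

A check like this is best treated as a flag for manual review rather than an automatic filter, since legitimate documents (tables, refrains) can also repeat.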

Model Highlights

  • Language Support: Native Khasi (using Latin script with special characters like ï and ñ).
  • Task: Specialized for "Free OCR" (literal document transcription).
  • Base Model: DeepSeek-OCR-2.
  • Accuracy: Significantly lower Character Error Rate (CER) and Word Error Rate (WER) on Khasi text than the base model.
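
For readers who want to measure CER and WER on their own pages, here is a minimal sketch of the standard definitions (edit distance divided by reference length). It is an assumption about how such metrics are usually computed, not the evaluation script used for this model. The helper names are illustrative, and the NFC normalization step is an added choice so that composed and decomposed forms of characters like ï and ñ compare as equal.

```python
import unicodedata
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (characters or words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def _nfc(text: str) -> str:
    # Compose base letter + combining mark into a single code point.
    return unicodedata.normalize("NFC", text)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character edits / reference length."""
    ref, hyp = _nfc(reference), _nfc(hypothesis)
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word edits / number of reference words."""
    ref, hyp = _nfc(reference).split(), _nfc(hypothesis).split()
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Placeholder strings, not real evaluation data.
    print(cer("ïa", "ia"))
    print(wer("a b c d", "a x c d"))
```

Both metrics can exceed 1.0 when the hypothesis is much longer than the reference, which is exactly what happens when a model falls into a repetition loop.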

Usage

To use this model on Kaggle or a local GPU, make sure torch, transformers, accelerate, and bitsandbytes are installed.
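
Before loading the model, it can help to confirm that these packages are importable in the current environment. This is a small sketch using only the standard library. `missing_packages` is a hypothetical helper, and it assumes each package's import name matches the name listed above.

```python
from importlib.util import find_spec

# Import names assumed to match the package names listed above.
REQUIRED = ("torch", "transformers", "accelerate", "bitsandbytes")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

The check only confirms that the packages can be located. It does not verify versions or GPU support.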

from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig
import torch

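# Repo ID as given in the original snippet. This page is titled Khasi-OCR-v1,
# so if the ID below fails to resolve, "toiar/Khasi-OCR-v1" may be the current name (unverified).
model_id = "toiar/deepseek-ocr-2-khasi-v1"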

# Load the model in 4-bit to save memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Set your image path and prompt
image_path = "your_khasi_image.jpg"
prompt = "<image>\nFree OCR."

# Run the transcription
res = model.infer(
    tokenizer,
    prompt=prompt,
    image_file=image_path,
    output_path="./outputs",
    save_results=False
)

# View results
if isinstance(res, dict) and "text" in res:
    print(res["text"])
else:
    print(res)

Model size: 3B params (Safetensors, tensor type BF16)