# Khasi-OCR-v1
This is a fine-tuned version of DeepSeek-OCR-2 specifically optimized for high-accuracy Optical Character Recognition (OCR) of the Khasi language.
It was trained on a custom dataset of Khasi news articles, official documents, and literature to overcome the common hallucination and repetition issues (such as infinite loops) found in base multimodal models when handling low-resource languages.
## Model Highlights
- Language Support: Native Khasi (using Latin script with special characters like ï and ñ).
- Task: Specialized for "Free OCR" (literal document transcription).
- Base Model: DeepSeek-OCR-2.
- Accuracy: Significantly lower Character Error Rate (CER) and Word Error Rate (WER) than the base model on Khasi text.
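For context on the CER/WER metrics mentioned above, a small sketch of how they are typically computed (standard Levenshtein edit distance; this helper is not part of the model repo):

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or word lists), via classic DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            # deletion, insertion, or substitution (free if characters match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    """Character Error Rate: edits per reference character."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)

def wer(reference, hypothesis):
    """Word Error Rate: edits per reference word."""
    ref_words = reference.split()
    return levenshtein(ref_words, hypothesis.split()) / max(len(ref_words), 1)
```

Comparing the model's transcription against a ground-truth Khasi string with `cer()`/`wer()` gives the error rates; libraries such as `jiwer` offer the same metrics off the shelf.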
## Usage
To use this model on Kaggle or a local GPU, ensure you have `transformers`, `accelerate`, and `bitsandbytes` installed.
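A minimal install command for the packages listed above (versions unpinned; pin them if you need reproducibility):

```shell
pip install -U transformers accelerate bitsandbytes
```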
```python
from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "toiar/deepseek-ocr-2-khasi-v1"

# Load the model in 4-bit to save memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Set your image path and prompt
image_path = "your_khasi_image.jpg"
prompt = "<image>\nFree OCR."

# Run the transcription
res = model.infer(
    tokenizer,
    prompt=prompt,
    image_file=image_path,
    output_path="./outputs",
    save_results=False,
)

# View results
if isinstance(res, dict) and "text" in res:
    print(res["text"])
else:
    print(res)
```
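To transcribe many scans at once, the single-image call above can be wrapped in a loop. This is a hypothetical helper (the `transcribe_folder` name and the `.jpg`-only glob are assumptions, and the `.infer` signature is taken from the example above; adjust if the repo's API differs):

```python
from pathlib import Path

def transcribe_folder(model, tokenizer, folder, out_dir="./outputs"):
    """Run Free OCR on every .jpg in `folder`, returning {filename: text}."""
    results = {}
    for img in sorted(Path(folder).glob("*.jpg")):
        res = model.infer(
            tokenizer,
            prompt="<image>\nFree OCR.",
            image_file=str(img),
            output_path=out_dir,
            save_results=False,
        )
        # Mirror the result handling from the single-image example
        results[img.name] = res["text"] if isinstance(res, dict) and "text" in res else res
    return results
```

Reusing one loaded model/tokenizer pair across the whole folder avoids paying the 4-bit loading cost per image.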