license: cc-by-4.0
language:
  - lus
  - grt
  - kha
  - trp
  - nag
  - njz
  - en
tags:
  - ocr
  - image-text-to-text
  - northeast-india
  - low-resource
  - vision-language
  - mizo
  - garo
  - khasi
  - kokborok
  - nagamese
  - nyishi
library_name: transformers
pipeline_tag: image-text-to-text

Kren Vision

Kren Vision is a fine-tuned vision-language model for optical character recognition (OCR) of Northeast Indian languages. It is part of the Kren AI Stack by MWire Labs, focused on building foundational language technology for Northeast India's indigenous languages.

The model is built on an open-source vision-language base model and fine-tuned with LoRA on 618k deduplicated synthetic OCR samples covering six Latin-script Northeast Indian languages.

Supported Languages

Language	Script
Mizo	Latin
Garo	Latin
Khasi	Latin
Kokborok	Latin
Nagamese	Latin
Nyishi	Latin

Performance

Evaluated on 500 held-out test samples:

Metric	Score
Exact Match	92.60%
Character Error Rate (CER)	0.85%
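
For readers who want to compute comparable numbers on their own data, here is a minimal sketch of the two metrics in the table above. It assumes exact match is a strict string comparison and that CER is computed at corpus level (total character edit distance divided by total reference characters). The model card does not state whether text was normalized or whether CER was averaged per sample, so treat both choices as assumptions. The function names and sample strings are illustrative only.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute, or match at no cost
        prev = curr
    return prev[-1]


def exact_match(predictions, references):
    """Fraction of predictions identical to their reference string."""
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(references)


def cer(predictions, references):
    """Corpus-level CER (assumed): total edit distance / total reference characters."""
    errors = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    chars = sum(len(r) for r in references)
    return errors / chars


# Made-up strings, not taken from the evaluation set.
preds = ["ka", "nga"]
refs = ["ka", "ngi"]
print(exact_match(preds, refs))  # 0.5
print(cer(preds, refs))          # 0.2
```

Because one wrong character makes a whole sample count as a miss for exact match, exact match is always the stricter of the two metrics.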

Usage

from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

# Load the processor (tokenizer and image preprocessing) and the model weights.
processor = AutoProcessor.from_pretrained("MWirelabs/kren-vision")
model = AutoModelForImageTextToText.from_pretrained(
    "MWirelabs/kren-vision",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# One user turn containing the image to read and the OCR instruction.
messages = [
    {"role": "user", "content": [
        {"type": "image", "image": "your_image.jpg"},
        {"type": "text", "text": "OCR the text in this image."}
    ]}
]

# Apply the chat template, tokenize, and move the tensors to the model's device.
inputs = processor.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True,
    return_dict=True, return_tensors="pt"
).to(model.device)

# Generate, then drop the prompt tokens so only the newly generated text is decoded.
generated_ids = model.generate(**inputs, max_new_tokens=128)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
output = processor.batch_decode(trimmed, skip_special_tokens=True)
print(output[0])
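
As a small illustration of the trimming step in the snippet above: for decoder-only models, `generate` returns each prompt followed by the newly generated tokens, so slicing off the first `len(inp)` IDs leaves only the OCR output. The sketch below uses plain Python lists with made-up token IDs in place of real tensors.

```python
# Toy token IDs standing in for real tensors; the values are made up.
prompt_ids = [[101, 7, 8, 9], [101, 5, 6, 4]]        # plays the role of inputs.input_ids
generated_ids = [[101, 7, 8, 9, 42, 43],             # prompt followed by new tokens
                 [101, 5, 6, 4, 77, 2]]

# Same slicing as in the usage snippet: skip as many IDs as the prompt contains.
trimmed = [out[len(inp):] for inp, out in zip(prompt_ids, generated_ids)]
print(trimmed)  # [[42, 43], [77, 2]]
```

Without this step, the decoded string would also contain the chat-template text and the instruction prompt.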

Training

Data: 618k deduplicated synthetic OCR samples across 6 languages
Fine-tuning: LoRA (r=16, alpha=32) on vision and language projection layers
Hardware: NVIDIA RTX 6000 Ada (48GB)
Epochs: 2
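
To give a sense of what the LoRA settings above imply, the sketch below works through the arithmetic for a single linear layer. It assumes the standard LoRA formulation, in which a frozen weight of shape `d_out x d_in` gets a low-rank update `B @ A` scaled by `alpha / r`. The layer size of 4096 is a hypothetical placeholder, not the actual dimension of the base model, and the helper names are illustrative.

```python
def lora_extra_params(d_in: int, d_out: int, r: int = 16) -> int:
    """Trainable parameters LoRA adds to one linear layer.

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * (d_in + d_out)


def lora_scaling(r: int = 16, alpha: int = 32) -> float:
    """Scaling applied to the low-rank update under standard LoRA (alpha / r)."""
    return alpha / r


d_in = d_out = 4096                      # hypothetical layer size, for illustration only
full = d_in * d_out                      # parameters in the frozen weight matrix
extra = lora_extra_params(d_in, d_out)   # parameters in the LoRA adapter

print(lora_scaling())          # 2.0
print(extra)                   # 131072
print(f"{extra / full:.2%}")   # 0.78%
```

With r=16 and alpha=32, the update is scaled by 2, and for a layer of this assumed size the adapter trains under 1% as many parameters as the full weight matrix.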

Citation

@misc{kren-vision-2026,
  title={Kren Vision: OCR for Northeast Indian Languages},
  author={MWire Labs},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/MWirelabs/kren-vision}
}

License

CC-BY-4.0 — MWire Labs, 2026