NE-OCR

High-Accuracy OCR for Northeast Indian Scripts

Technical Report · License · Benchmark

Purpose-built OCR for Northeast India with 94.99% average character accuracy across 12 language–script pairs.
Outperforms EasyOCR, Tesseract 5, and TrOCR-large on 9 of 12 language–script pairs.
Fast inference and strong performance where general OCR systems fail.

Developed by MWire Labs, Shillong, Meghalaya.

NE-OCR Architecture Diagram

NE-OCR is built on a ViTSTR-Base encoder with CTC decoding. The model processes 32×128 RGB word/line crops across Latin, Bengali, Devanagari, and Meitei Mayek scripts, outputting text from a 1,056-character multilingual vocabulary.
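The CTC decoding step mentioned above can be illustrated in isolation. The sketch below shows greedy CTC decoding (collapse consecutive repeats, then drop blanks); the vocabulary and blank index here are hypothetical toy values, not NE-OCR's actual 1,056-entry vocabulary.

```python
BLANK = 0  # assumed blank index for this toy example

def ctc_greedy_decode(indices, id_to_char):
    """Collapse consecutive duplicate frames, then remove blank tokens."""
    out = []
    prev = None
    for i in indices:
        if i != prev and i != BLANK:
            out.append(id_to_char[i])
        prev = i
    return ''.join(out)

id_to_char = {1: 'k', 2: 'h', 3: 'a', 4: 's', 5: 'i'}
# Hypothetical per-frame argmax indices from the encoder:
frames = [1, 1, 0, 2, 3, 3, 0, 4, 0, 0, 5]
print(ctc_greedy_decode(frames, id_to_char))  # khasi
```

Blanks let CTC represent genuine repeated characters: a blank between two identical indices prevents them from being collapsed into one.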

Model Details

  • Architecture: DocTR ViTSTR-Base (86M parameters)
  • Vocab size: 1056 characters (Latin, Bengali, Devanagari, Meitei Mayek)
  • Input: 32×128 RGB image crops (word/line level, ≤32 chars)
  • Training data: ~988k deduplicated samples across 12 languages
  • Trained by: MWire Labs

Inference Speed

Measured on NVIDIA A40 (batch size = 1):

NE-OCR Latency Comparison

  • NE-OCR: 17.2 ms/image
  • EasyOCR: 37.2 ms/image
  • TrOCR-large: 92.1 ms/image
  • Tesseract 5: 166.1 ms/image
  • Chandra (VLM): 313 ms/image

NE-OCR is:

  • 2× faster than EasyOCR
  • 9× faster than Tesseract
  • 18× faster than VLM-based OCR systems
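The headline multiples can be recomputed directly from the per-image latencies reported above:

```python
# Latencies in ms/image, as reported on an NVIDIA A40 at batch size 1.
NE_OCR_MS = 17.2
baselines = {
    'EasyOCR': 37.2,
    'TrOCR-large': 92.1,
    'Tesseract 5': 166.1,
    'Chandra (VLM)': 313.0,
}

for name, ms in baselines.items():
    print(f'{name}: {ms / NE_OCR_MS:.1f}x the latency of NE-OCR')
```

The 2×, 9×, and 18× figures in the list are the ratios above rounded down to whole multiples.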

Benchmark Comparison — Character Accuracy (ChA%)

Evaluated on a fixed 26,000-sample benchmark (2,000 per language–script pair).
Higher is better.

| Language | Script | NE-OCR | EasyOCR | Tesseract 5 | TrOCR-large | Chandra |
|---|---|---|---|---|---|---|
| Assamese | Bengali | 97.46% | 32.25% | 8.79% | 0.80% | 57.83% |
| Bodo | Devanagari | 83.38% | 82.65% | 64.85% | 1.85% | 74.76% |
| English | Latin | 90.35% | 68.91% | 50.77% | 88.87% | 91.30% |
| Garo | Latin | 93.52% | 69.43% | 69.90% | 87.83% | 94.15% |
| Hindi | Devanagari | 97.69% | 49.54% | 41.48% | 1.27% | 85.78% |
| Khasi | Latin | 98.85% | 77.78% | 80.72% | 93.22% | 94.15% |
| Kokborok | Latin | 97.59% | 83.00% | 78.76% | 94.58% | 96.19% |
| Meitei (Bengali) | Bengali | 97.09% | 33.64% | 7.30% | 0.55% | 48.34% |
| Meitei (Mayek) | Meitei Mayek | 95.56% | 2.50% | 2.24% | 2.45% | 2.57% |
| Mizo | Latin | 95.96% | 67.62% | 68.44% | 84.58% | 92.96% |
| Nagamese | Latin | 97.91% | 81.60% | 78.05% | 93.46% | 97.60% |
| Nyishi | Latin | 94.50% | 69.56% | 69.92% | 87.23% | 91.85% |
| Average | | 94.99% | 59.87% | 51.77% | 53.06% | 77.29% |
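The model card does not spell out the exact ChA formula, but character accuracy is commonly computed as 1 − CER, where CER is the edit distance between prediction and reference divided by the reference length. A minimal sketch under that assumption:

```python
# Hypothetical ChA implementation: assumes ChA = 1 - CER,
# with CER = edit_distance(pred, ref) / len(ref), clamped to [0, 1].

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def char_accuracy(pred: str, ref: str) -> float:
    if not ref:
        return float(pred == ref)
    return max(0.0, 1.0 - levenshtein(pred, ref) / len(ref))

print(f"{char_accuracy('Khssi', 'Khasi'):.2%}")  # 80.00%
```

The per-pair percentages in the table would then be this score averaged over each pair's 2,000 test samples.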

Benchmark Test Set

A public benchmark test set is available in the benchmark/ folder of this repository for reproducing evaluation results and comparing against other OCR models.

  • Combined: benchmark/ne_ocr_benchmark.parquet — 26,000 samples across all 12 languages
  • Per-language: benchmark/{lang}_test.parquet — 2,000 samples each
  • Format: Parquet with columns: image_path, text, lang
  • Filter: All samples ≤32 characters (word/line-level crops)

Results reported in this model card are computed on this exact test set.
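Working with the benchmark schema can be sketched as follows. The rows below are made-up stand-ins (the `lang` values and paths are illustrative, not the benchmark's actual contents); in practice you would load the real file with `pd.read_parquet('benchmark/ne_ocr_benchmark.parquet')`.

```python
import pandas as pd

# Stand-in rows with the documented columns: image_path, text, lang.
df = pd.DataFrame({
    'image_path': ['crops/0001.jpg', 'crops/0002.jpg', 'crops/0003.jpg'],
    'text': ['Khublei', 'ꯃꯤꯇꯩ', 'x' * 40],
    'lang': ['khasi', 'meitei_mayek', 'khasi'],
})

# The published set keeps only word/line crops of at most 32 characters:
df = df[df['text'].str.len() <= 32]
print(df.groupby('lang').size())
```

The same filter is what the "≤32 characters" note in the list above refers to, so an external model evaluated on this file sees exactly the crops NE-OCR was scored on.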

Usage

```python
import json

import numpy as np
import torch
from PIL import Image
from huggingface_hub import hf_hub_download
from doctr.models import vitstr_base

# Download the checkpoint and vocabulary
model_path = hf_hub_download(repo_id='MWirelabs/ne-ocr', filename='ne_ocr_best.pt')
vocab_path = hf_hub_download(repo_id='MWirelabs/ne-ocr', filename='ne_ocr_vocab.json')

# Load the vocabulary (the first entry is reserved and is skipped)
with open(vocab_path, encoding='utf-8') as f:
    vocab_data = json.load(f)
vocab_str = ''.join(vocab_data['vocab'][1:])

# Load the model
model = vitstr_base(pretrained=False, vocab=vocab_str)
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()

# Inference on a word/line crop (max 32 characters)
img = Image.open('your_crop.jpg').convert('RGB').resize((128, 32))
img_tensor = torch.tensor(np.array(img, dtype=np.float32) / 255.0).permute(2, 0, 1).unsqueeze(0)
with torch.no_grad():
    out = model(img_tensor)
print(out['preds'][0][0])
```

Notes

  • Model is designed for word/line-level crops (≤32 characters), not full pages
  • For full page OCR, use a text detection model first (e.g. DBNet) to extract crops
  • Bodo accuracy is lower due to limited training data; improvements are planned for V2
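The crop-then-recognize flow described in the notes can be sketched as below. The box format (absolute `(x0, y0, x1, y1)` pixel coordinates) is an assumption about the detector's output, and `recognize` is a placeholder for NE-OCR batch inference, not the model's real API:

```python
from PIL import Image

def crop_boxes(page: Image.Image, boxes):
    """Cut each detected word/line region out of the page and resize to 128x32."""
    return [page.crop(box).resize((128, 32)) for box in boxes]

def recognize(crops):
    # Placeholder: in practice, stack the crops into a batch tensor
    # and run the NE-OCR model as shown in the Usage section.
    return ['<text>'] * len(crops)

page = Image.new('RGB', (640, 480), 'white')        # stand-in for a scanned page
boxes = [(10, 10, 200, 40), (10, 60, 300, 95)]      # hypothetical detector output
crops = crop_boxes(page, boxes)
print(len(crops), crops[0].size)  # 2 (128, 32)
```

Any detector that emits word/line boxes (DBNet, as the note suggests, or another) slots into this flow; NE-OCR only ever sees the fixed-size crops.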

License

CC-BY-4.0 — MWire Labs
