---
license: cc-by-4.0
language:
  - asm
  - mni
  - kha
  - lus
  - grt
  - trp
  - njz
  - brx
  - nag
  - eng
  - hin
tags:
  - ocr
  - northeast-india
  - doctr
  - vitstr
  - mizo
  - garo
  - khasi
  - nyishi
  - kokborok
  - nagamese
  - bodo
  - meitei
---

*MWire Labs Logo*

# NE-OCR

**High-Accuracy OCR for Northeast Indian Scripts**

*Technical Report · License · Benchmark*

Purpose-built OCR for Northeast India with 94.99% average character accuracy across 12 language–script pairs.
Outperforms EasyOCR, Tesseract 5, and TrOCR-large on 9 of 12 language–script pairs.
Fast inference and strong performance where general OCR systems fail.

Developed by MWire Labs, Shillong, Meghalaya.

*NE-OCR Architecture Diagram*

NE-OCR is built on a ViTSTR-Base encoder with CTC decoding. The model processes 32×128 RGB word/line crops across Latin, Bengali, Devanagari, and Meitei Mayek scripts, outputting text from a 1,056-character multilingual vocabulary.

## Model Details

- **Architecture:** DocTR ViTSTR-Base (86M parameters)
- **Vocab size:** 1,056 characters (Latin, Bengali, Devanagari, Meitei Mayek)
- **Input:** 32×128 RGB image crops (word/line level, ≤32 chars)
- **Training data:** ~988k deduplicated samples across 12 languages
- **Trained by:** MWire Labs

## Inference Speed

Measured on an NVIDIA A40 (batch size = 1):

*NE-OCR Latency Comparison*

- NE-OCR: 17.2 ms/image
- EasyOCR: 37.2 ms/image
- TrOCR-large: 92.1 ms/image
- Tesseract 5: 166.1 ms/image
- Chandra (VLM): 313 ms/image

NE-OCR is:

- 2× faster than EasyOCR
- 9× faster than Tesseract 5
- 18× faster than VLM-based OCR systems
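Per-image latency figures like these can be reproduced with a simple timing harness. Below is a minimal sketch; the dummy module, warm-up count, and iteration count are illustrative assumptions, not the methodology used for the numbers above:

```python
import time
import torch
import torch.nn as nn

def latency_ms(model: nn.Module, inp: torch.Tensor, warmup: int = 5, iters: int = 20) -> float:
    """Median wall-clock latency (ms) of a single forward pass.

    On GPU, a torch.cuda.synchronize() before each clock read would be
    needed; this CPU sketch omits it.
    """
    model.eval()
    times = []
    with torch.no_grad():
        for _ in range(warmup):   # warm-up runs are excluded from timing
            model(inp)
        for _ in range(iters):
            t0 = time.perf_counter()
            model(inp)
            times.append((time.perf_counter() - t0) * 1000.0)
    times.sort()
    return times[len(times) // 2]  # median is robust to stragglers

# Illustrative stand-in model for a 32x128 RGB crop at batch size 1
dummy = nn.Sequential(nn.Conv2d(3, 8, 3), nn.Flatten(), nn.Linear(8 * 30 * 126, 10))
ms = latency_ms(dummy, torch.rand(1, 3, 32, 128))
print(f"{ms:.2f} ms/image")
```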

## Benchmark Comparison — Character Accuracy (ChA%)

Evaluated on a fixed 26,000-sample benchmark (2,000 per language–script pair).
Higher is better.

| Language | Script | NE-OCR | EasyOCR | Tesseract 5 | TrOCR-large | Chandra |
|---|---|---|---|---|---|---|
| Assamese | Bengali | 97.46% | 32.25% | 8.79% | 0.80% | 57.83% |
| Bodo | Devanagari | 83.38% | 82.65% | 64.85% | 1.85% | 74.76% |
| English | Latin | 90.35% | 68.91% | 50.77% | 88.87% | 91.30% |
| Garo | Latin | 93.52% | 69.43% | 69.90% | 87.83% | 94.15% |
| Hindi | Devanagari | 97.69% | 49.54% | 41.48% | 1.27% | 85.78% |
| Khasi | Latin | 98.85% | 77.78% | 80.72% | 93.22% | 94.15% |
| Kokborok | Latin | 97.59% | 83.00% | 78.76% | 94.58% | 96.19% |
| Meitei (Bengali) | Bengali | 97.09% | 33.64% | 7.30% | 0.55% | 48.34% |
| Meitei (Mayek) | Meitei Mayek | 95.56% | 2.50% | 2.24% | 2.45% | 2.57% |
| Mizo | Latin | 95.96% | 67.62% | 68.44% | 84.58% | 92.96% |
| Nagamese | Latin | 97.91% | 81.60% | 78.05% | 93.46% | 97.60% |
| Nyishi | Latin | 94.50% | 69.56% | 69.92% | 87.23% | 91.85% |
| **Average** | | 94.99% | 59.87% | 51.77% | 53.06% | 77.29% |
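Character accuracy can be read as 1 minus the character error rate. A minimal sketch of that metric, assuming a standard Levenshtein edit distance normalized by reference length (an illustration, not the card's exact scoring script):

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]

def char_accuracy(pred: str, ref: str) -> float:
    """1 - CER, clipped at 0, with the reference length as denominator."""
    if not ref:
        return float(pred == ref)
    return max(0.0, 1.0 - edit_distance(pred, ref) / len(ref))

print(char_accuracy("khasi", "khasi"))  # 1.0
print(char_accuracy("khazi", "khasi"))  # 0.8 (one substitution over 5 chars)
```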

## Benchmark Test Set

A public benchmark test set is available in the `benchmark/` folder of this repository for reproducing evaluation results and comparing against other OCR models.

- **Combined:** `benchmark/ne_ocr_benchmark.parquet` — 26,000 samples across all 12 languages
- **Per-language:** `benchmark/{lang}_test.parquet` — 2,000 samples each
- **Format:** Parquet with columns `image_path`, `text`, `lang`
- **Filter:** all samples ≤32 characters (word/line-level crops)

Results reported in this model card are computed on this exact test set.
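The schema above can be exercised with a tiny synthetic frame before touching the real parquet files. A sketch assuming pandas; the file names and sample texts here are invented, and the real data ships in `benchmark/`:

```python
import pandas as pd

# Synthetic rows with the benchmark schema (image_path, text, lang)
df = pd.DataFrame(
    {
        "image_path": ["crops/0001.jpg", "crops/0002.jpg", "crops/0003.jpg"],
        "text": ["ka jingim", "x" * 40, "zofate"],
        "lang": ["kha", "kha", "lus"],
    }
)

# The published set keeps only word/line crops of at most 32 characters
df = df[df["text"].str.len() <= 32].reset_index(drop=True)

# Per-language counts mirror the 2,000-samples-per-language layout
counts = df.groupby("lang").size().to_dict()
print(counts)  # {'kha': 1, 'lus': 1}
```

The same filter and groupby apply unchanged after `pd.read_parquet("benchmark/ne_ocr_benchmark.parquet")`.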

## Usage

```python
import torch, json
import numpy as np
from PIL import Image
from huggingface_hub import hf_hub_download
from doctr.models import vitstr_base

# Download files
model_path = hf_hub_download(repo_id='MWirelabs/ne-ocr', filename='ne_ocr_best.pt')
vocab_path = hf_hub_download(repo_id='MWirelabs/ne-ocr', filename='ne_ocr_vocab.json')

# Load vocab
with open(vocab_path, encoding='utf-8') as f:
    vocab_data = json.load(f)
vocab_str = ''.join(vocab_data['vocab'][1:])

# Load model
model = vitstr_base(pretrained=False, vocab=vocab_str)
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()

# Inference (word/line crop, max 32 chars)
img = Image.open('your_crop.jpg').convert('RGB').resize((128, 32))
img_tensor = torch.tensor(np.array(img, dtype=np.float32) / 255.0).permute(2, 0, 1).unsqueeze(0)
with torch.no_grad():
    out = model(img_tensor)
print(out['preds'][0][0])  # first crop's predicted text
```

## Notes

- The model is designed for word/line-level crops (≤32 characters), not full pages.
- For full-page OCR, run a text detection model first (e.g. DBNet) to extract crops.
- Bodo accuracy is lower due to limited training data; an improvement is planned for V2.
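As glue between a detector and the recognizer, detected regions can be cropped out of the page and resized to the model's input shape. A minimal sketch, assuming the detector yields (xmin, ymin, xmax, ymax) boxes in relative [0, 1] coordinates (the helper name and box format are illustrative, not part of this repository):

```python
from PIL import Image

def extract_crops(page: Image.Image, rel_boxes):
    """Cut word/line crops out of a full page for recognition.

    rel_boxes: iterable of (xmin, ymin, xmax, ymax) in relative [0, 1]
    coordinates, as text detectors such as DBNet typically emit.
    """
    w, h = page.size
    crops = []
    for xmin, ymin, xmax, ymax in rel_boxes:
        box = (round(xmin * w), round(ymin * h), round(xmax * w), round(ymax * h))
        # Resize to the 32x128 input the recognition model expects
        crops.append(page.crop(box).convert('RGB').resize((128, 32)))
    return crops

# Example: two detected regions on a synthetic 640x480 page
page = Image.new('RGB', (640, 480), 'white')
crops = extract_crops(page, [(0.1, 0.1, 0.5, 0.2), (0.1, 0.3, 0.9, 0.4)])
print([c.size for c in crops])  # [(128, 32), (128, 32)]
```

Each returned crop can be normalized and stacked into a batch exactly as in the Usage snippet above.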

## License

CC-BY-4.0 — MWire Labs