# NE-OCR

**High-Accuracy OCR for Northeast Indian Scripts**

Purpose-built OCR for Northeast India with 94.99% average character accuracy across 12 language–script pairs. Outperforms EasyOCR, Tesseract 5, and TrOCR-large on 9 of 12 language–script pairs, with fast inference and strong performance where general OCR systems fail.

Developed by MWire Labs, Shillong, Meghalaya.
NE-OCR is built on a ViTSTR-Base encoder with CTC decoding. The model processes 32×128 RGB word/line crops across Latin, Bengali, Devanagari, and Meitei Mayek scripts, outputting text from a 1,056-character multilingual vocabulary.
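The CTC head emits one prediction per input frame; decoding then collapses repeated indices and drops the blank token. A minimal greedy-decoding sketch for intuition only (the blank index, vocab, and frame indices below are illustrative assumptions, not the model's actual configuration):

```python
def ctc_greedy_decode(indices, vocab, blank=0):
    """Collapse repeated frame predictions, then drop the CTC blank token.

    `indices` are per-frame argmax indices; `vocab` holds the non-blank
    characters, so index i maps to vocab[i - 1]. Blank index 0 is an
    assumption for illustration.
    """
    decoded, prev = [], None
    for idx in indices:
        if idx != prev and idx != blank:
            decoded.append(vocab[idx - 1])
        prev = idx
    return ''.join(decoded)

# An 'aa_bb_b'-style frame sequence collapses to "abb"
print(ctc_greedy_decode([1, 1, 0, 2, 2, 0, 2], 'ab'))
```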
## Model Details
- Architecture: DocTR ViTSTR-Base (86M parameters)
- Vocab size: 1,056 characters (Latin, Bengali, Devanagari, Meitei Mayek)
- Input: 32×128 RGB image crops (word/line level, ≤32 chars)
- Training data: ~988k deduplicated samples across 12 languages
- Trained by: MWire Labs
## Inference Speed
Measured on an NVIDIA A40 (batch size = 1), per image:

- NE-OCR: 17.2 ms
- EasyOCR: 37.2 ms
- TrOCR-large: 92.1 ms
- Tesseract 5: 166.1 ms
- Chandra (VLM): 313 ms
NE-OCR is:
- 2× faster than EasyOCR
- 9× faster than Tesseract
- 18× faster than VLM-based OCR systems
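The figures above are wall-clock means at batch size 1. A generic timing sketch for reproducing such measurements (the callable being timed is up to you; this helper is not part of the NE-OCR codebase):

```python
import time

def mean_latency_ms(fn, n_warmup=10, n_runs=100):
    """Mean wall-clock latency of fn() in milliseconds, after warm-up."""
    for _ in range(n_warmup):   # warm-up runs are excluded from timing
        fn()
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    return (time.perf_counter() - start) / n_runs * 1000.0
```

For CUDA models, call `torch.cuda.synchronize()` inside `fn` so queued kernels are included in the measurement.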
## Benchmark Comparison — Character Accuracy (ChA%)
Evaluated on a fixed 24,000-sample benchmark (2,000 per language–script pair). Higher is better.
| Language | Script | NE-OCR | EasyOCR | Tesseract 5 | TrOCR-large | Chandra |
|---|---|---|---|---|---|---|
| Assamese | Bengali | 97.46% | 32.25% | 8.79% | 0.80% | 57.83% |
| Bodo | Devanagari | 83.38% | 82.65% | 64.85% | 1.85% | 74.76% |
| English | Latin | 90.35% | 68.91% | 50.77% | 88.87% | 91.30% |
| Garo | Latin | 93.52% | 69.43% | 69.90% | 87.83% | 94.15% |
| Hindi | Devanagari | 97.69% | 49.54% | 41.48% | 1.27% | 85.78% |
| Khasi | Latin | 98.85% | 77.78% | 80.72% | 93.22% | 94.15% |
| Kokborok | Latin | 97.59% | 83.00% | 78.76% | 94.58% | 96.19% |
| Meitei (Bengali) | Bengali | 97.09% | 33.64% | 7.30% | 0.55% | 48.34% |
| Meitei (Mayek) | Meitei Mayek | 95.56% | 2.50% | 2.24% | 2.45% | 2.57% |
| Mizo | Latin | 95.96% | 67.62% | 68.44% | 84.58% | 92.96% |
| Nagamese | Latin | 97.91% | 81.60% | 78.05% | 93.46% | 97.60% |
| Nyishi | Latin | 94.50% | 69.56% | 69.92% | 87.23% | 91.85% |
| Average | — | 94.99% | 59.87% | 51.77% | 53.06% | 77.29% |
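The card does not spell out the ChA formula; a common definition is 100 × (1 − edit distance / reference length), clipped at zero. A scoring sketch under that assumption:

```python
def char_accuracy(pred: str, ref: str) -> float:
    """Character accuracy as 100 * (1 - levenshtein(pred, ref) / len(ref))."""
    if not ref:
        return 100.0 if not pred else 0.0
    # single-row dynamic-programming Levenshtein distance
    row = list(range(len(ref) + 1))
    for i, p in enumerate(pred, 1):
        prev, row[0] = row[0], i
        for j, r in enumerate(ref, 1):
            cur = row[j]
            row[j] = min(row[j] + 1, row[j - 1] + 1, prev + (p != r))
            prev = cur
    return max(0.0, 100.0 * (1 - row[-1] / len(ref)))
```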
## Benchmark Test Set
A public benchmark test set is available in the `benchmark/` folder of this repository for reproducing evaluation results and comparing against other OCR models.

- Combined: `benchmark/ne_ocr_benchmark.parquet` — 24,000 samples across all 12 languages
- Per-language: `benchmark/{lang}_test.parquet` — 2,000 samples each
- Format: Parquet with columns `image_path`, `text`, `lang`
- Filter: all samples ≤32 characters (word/line-level crops)
Results reported in this model card are computed on this exact test set.
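A loading sketch, assuming pandas with a parquet engine (e.g. pyarrow) is installed; the path follows the layout listed above, and `load_split` is a hypothetical helper name:

```python
import pandas as pd

def load_split(path):
    """Load one benchmark split and sanity-check the 32-character crop limit."""
    df = pd.read_parquet(path)  # columns: image_path, text, lang
    assert (df['text'].str.len() <= 32).all()
    return df
```

For example, `df = load_split('benchmark/ne_ocr_benchmark.parquet')`, then group by `lang` to score each language separately.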
## Usage
```python
import torch, json
import numpy as np
from PIL import Image
from huggingface_hub import hf_hub_download
from doctr.models import vitstr_base

# Download model weights and vocabulary
model_path = hf_hub_download(repo_id='MWirelabs/ne-ocr', filename='ne_ocr_best.pt')
vocab_path = hf_hub_download(repo_id='MWirelabs/ne-ocr', filename='ne_ocr_vocab.json')

# Load vocab (the first entry is reserved and skipped)
with open(vocab_path, encoding='utf-8') as f:
    vocab_data = json.load(f)
vocab_str = ''.join(vocab_data['vocab'][1:])

# Load model
model = vitstr_base(pretrained=False, vocab=vocab_str)
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()

# Inference on a word/line crop (max 32 chars)
img = Image.open('your_crop.jpg').convert('RGB').resize((128, 32))
img_tensor = torch.tensor(np.array(img, dtype=np.float32) / 255.0).permute(2, 0, 1).unsqueeze(0)
with torch.no_grad():
    out = model(img_tensor)
print(out['preds'][0][0])  # predicted text (index 1 holds the confidence score)
```
## Notes
- Model is designed for word/line-level crops (≤32 characters), not full pages
- For full page OCR, use a text detection model first (e.g. DBNet) to extract crops
- Bodo accuracy is lower due to limited training data; improvements are planned for V2
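For the detection-first pipeline described above, the glue step is cropping detector output and resizing it to the model's 128×32 input. A sketch, assuming boxes arrive as (x0, y0, x1, y1) pixel tuples (the detector itself is not shown, and `boxes_to_crops` is a hypothetical helper name):

```python
from PIL import Image

def boxes_to_crops(page, boxes, size=(128, 32)):
    """Crop each detected word/line box from a full page and resize for NE-OCR."""
    return [page.crop(box).resize(size) for box in boxes]
```

Each returned crop then goes through the normalization shown in the Usage section before being passed to the model.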
## License
CC-BY-4.0 — MWire Labs