---
license: cc-by-4.0
language:
- asm
- mni
- kha
- lus
- grt
- trp
- njz
- brx
- nag
- eng
- hin
tags:
- ocr
- northeast-india
- doctr
- vitstr
- mizo
- garo
- khasi
- nyishi
- kokborok
- nagamese
- bodo
- meitei
---

<p align="center">
  <img src="https://huggingface.co/MWirelabs/ne-ocr/resolve/main/assets/mwire.png" width="180" alt="MWire Labs Logo">
</p>

# NE-OCR
### High-Accuracy OCR for Northeast Indian Scripts

[Technical Report](https://mwirelabs.com/wp-content/uploads/2026/03/NE_OCR_Technical_Report.pdf)
[License: CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
[Benchmark Test Set](#benchmark-test-set)

**Purpose-built OCR for Northeast India, with 94.99% average character accuracy across 12 language–script pairs.**
NE-OCR outperforms EasyOCR, Tesseract 5, and TrOCR-large on 9 of the 12 pairs, combining fast inference with strong performance where general-purpose OCR systems fail.

Developed by **MWire Labs, Shillong, Meghalaya**.

<p align="center">
  <img src="https://huggingface.co/MWirelabs/ne-ocr/resolve/main/assets/neocrarchitecture.jpg" width="850" alt="NE-OCR Architecture Diagram">
</p>

NE-OCR is built on a ViTSTR-Base encoder with CTC decoding. The model processes 32×128 RGB word/line crops across the Latin, Bengali, Devanagari, and Meitei Mayek scripts, decoding text over a 1,056-character multilingual vocabulary.

## Model Details
- **Architecture:** DocTR ViTSTR-Base (86M parameters)
- **Vocab size:** 1,056 characters (Latin, Bengali, Devanagari, Meitei Mayek)
- **Input:** 32×128 RGB image crops (word/line level, ≤32 characters)
- **Training data:** ~988k deduplicated samples across 12 languages
- **Trained by:** MWire Labs
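
Because NE-OCR decodes with CTC, raw per-timestep predictions are turned into text by collapsing repeated symbols and dropping the blank token. A minimal greedy (best-path) decoding sketch — the toy index-to-character mapping and blank index below are illustrative assumptions, not the model's actual vocabulary:

```python
def ctc_greedy_decode(indices, vocab, blank=0):
    """Collapse repeats, then drop blanks (standard CTC best-path decoding).

    indices: per-timestep argmax indices from the model's logits.
    vocab:   index -> character mapping; index `blank` is the CTC blank.
    """
    out = []
    prev = None
    for idx in indices:
        if idx != prev and idx != blank:  # skip repeated frames and blanks
            out.append(vocab[idx])
        prev = idx
    return ''.join(out)

# Toy example: blank=0, three characters
toy_vocab = {1: 'k', 2: 'a', 3: 'i'}
print(ctc_greedy_decode([1, 1, 0, 2, 2, 0, 3], toy_vocab))  # -> "kai"
```

The blank between the two runs of `2` is what lets CTC distinguish a genuine double letter from one letter held over several frames.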

## Inference Speed

Measured on an NVIDIA A40 (batch size = 1):

<p align="center">
  <img src="https://huggingface.co/MWirelabs/ne-ocr/resolve/main/assets/inferenceneocr.png" width="700" alt="NE-OCR Latency Comparison">
</p>

- **NE-OCR:** 17.2 ms/image
- EasyOCR: 37.2 ms
- TrOCR-large: 92.1 ms
- Tesseract 5: 166.1 ms
- Chandra (VLM): 313 ms

NE-OCR is:
- ~2× faster than EasyOCR
- ~9× faster than Tesseract 5
- ~18× faster than VLM-based OCR systems

## Benchmark Comparison — Character Accuracy (ChA%)

Evaluated on a fixed 26,000-sample benchmark (2,000 per language–script pair).
Higher is better.

| Language | Script | **NE-OCR** | EasyOCR | Tesseract 5 | TrOCR-large | Chandra |
|----------|--------|------------|---------|-------------|-------------|---------|
| Assamese | Bengali | **97.46%** | 32.25% | 8.79% | 0.80% | 57.83% |
| Bodo | Devanagari | **83.38%** | 82.65% | 64.85% | 1.85% | 74.76% |
| English | Latin | 90.35% | 68.91% | 50.77% | 88.87% | **91.30%** |
| Garo | Latin | 93.52% | 69.43% | 69.90% | 87.83% | **94.15%** |
| Hindi | Devanagari | **97.69%** | 49.54% | 41.48% | 1.27% | 85.78% |
| Khasi | Latin | **98.85%** | 77.78% | 80.72% | 93.22% | 94.15% |
| Kokborok | Latin | **97.59%** | 83.00% | 78.76% | 94.58% | 96.19% |
| Meitei (Bengali) | Bengali | **97.09%** | 33.64% | 7.30% | 0.55% | 48.34% |
| Meitei (Mayek) | Meitei Mayek | **95.56%** | 2.50% | 2.24% | 2.45% | 2.57% |
| Mizo | Latin | **95.96%** | 67.62% | 68.44% | 84.58% | 92.96% |
| Nagamese | Latin | **97.91%** | 81.60% | 78.05% | 93.46% | 97.60% |
| Nyishi | Latin | **94.50%** | 69.56% | 69.92% | 87.23% | 91.85% |
| **Average** | — | **94.99%** | 59.87% | 51.77% | 53.06% | 77.29% |
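
Character accuracy is typically computed as 1 − (character edit distance / reference length). A minimal sketch of that metric — this is our reading of ChA; the exact normalization used in the technical report may differ:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming (two-row version)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def char_accuracy(reference: str, hypothesis: str) -> float:
    """ChA = 1 - CER, clipped at zero for very bad hypotheses."""
    if not reference:
        return float(hypothesis == '')
    cer = edit_distance(reference, hypothesis) / len(reference)
    return max(0.0, 1.0 - cer)

print(round(char_accuracy('khasi', 'khosi'), 2))  # one substitution in 5 chars -> 0.8
```

Averaging this per-sample score over each 2,000-sample language split reproduces the per-language numbers in the table above, up to the report's exact aggregation choices.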

## Benchmark Test Set

A public benchmark test set is available in the `benchmark/` folder of this repository for reproducing the evaluation results and comparing against other OCR models.

- **Combined:** `benchmark/ne_ocr_benchmark.parquet` — 26,000 samples across all 12 languages
- **Per-language:** `benchmark/{lang}_test.parquet` — 2,000 samples each
- **Format:** Parquet with columns `image_path`, `text`, `lang`
- **Filter:** all samples ≤32 characters (word/line-level crops)

Results reported in this model card are computed on this exact test set.
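
With the schema above, per-language slices are easy to pull with pandas. A sketch — the rows here are synthetic stand-ins for the real parquet contents (the file paths and texts are invented); after downloading the repo, substitute `pd.read_parquet('benchmark/ne_ocr_benchmark.parquet')`:

```python
import pandas as pd

# Synthetic stand-in rows with the benchmark's schema
# (real data: df = pd.read_parquet('benchmark/ne_ocr_benchmark.parquet')).
df = pd.DataFrame({
    'image_path': ['crops/a.png', 'crops/b.png', 'crops/c.png'],
    'text': ['Khublei', 'chibai', 'shillong'],
    'lang': ['kha', 'lus', 'kha'],
})

# Benchmark invariant: every sample is a word/line crop of at most 32 characters.
assert (df['text'].str.len() <= 32).all()

# Per-language sample counts and one language's slice:
counts = df['lang'].value_counts()
khasi = df[df['lang'] == 'kha']
print(counts.to_dict())  # {'kha': 2, 'lus': 1}
print(len(khasi))        # 2
```

On the real combined file, `value_counts()` should show the per-language split reported above.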

## Usage
````python
import json

import numpy as np
import torch
from PIL import Image
from huggingface_hub import hf_hub_download
from doctr.models import vitstr_base

# Download the checkpoint and vocabulary from the Hub
model_path = hf_hub_download(repo_id='MWirelabs/ne-ocr', filename='ne_ocr_best.pt')
vocab_path = hf_hub_download(repo_id='MWirelabs/ne-ocr', filename='ne_ocr_vocab.json')

# Load the vocabulary (the first entry is excluded when building the model vocab)
with open(vocab_path, encoding='utf-8') as f:
    vocab_data = json.load(f)
vocab_str = ''.join(vocab_data['vocab'][1:])

# Load the model weights
model = vitstr_base(pretrained=False, vocab=vocab_str)
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()

# Inference on a single word/line crop (max 32 characters)
img = Image.open('your_crop.jpg').convert('RGB').resize((128, 32))
img_tensor = torch.tensor(np.array(img, dtype=np.float32) / 255.0).permute(2, 0, 1).unsqueeze(0)
with torch.no_grad():
    out = model(img_tensor)
print(out['preds'][0][0])  # first sample's predicted text
````
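
The plain `resize((128, 32))` above stretches wide lines. A common alternative is to scale while preserving aspect ratio and right-pad to 32×128. The helper below is a sketch of that idea, not how NE-OCR was trained — if the model was trained on stretched crops, the plain resize may actually match it better:

```python
import numpy as np
from PIL import Image

def preprocess_crop(img: Image.Image, height: int = 32, width: int = 128) -> np.ndarray:
    """Resize preserving aspect ratio, right-pad to (height, width), scale to [0, 1].

    Returns a float32 array of shape (1, 3, height, width), ready to wrap in a tensor.
    """
    img = img.convert('RGB')
    scale = height / img.height
    new_w = min(width, max(1, round(img.width * scale)))
    img = img.resize((new_w, height))
    canvas = np.zeros((height, width, 3), dtype=np.float32)
    canvas[:, :new_w] = np.asarray(img, dtype=np.float32) / 255.0
    return canvas.transpose(2, 0, 1)[None]  # HWC -> NCHW with batch dim

batch = preprocess_crop(Image.new('RGB', (300, 50), 'white'))
print(batch.shape)  # (1, 3, 32, 128)
```

Wrap the result with `torch.from_numpy(...)` to feed it to the model as in the usage snippet.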

## Notes
- The model is designed for **word/line-level crops** (≤32 characters), not full pages.
- For full-page OCR, run a text detection model first (e.g. DBNet) to extract crops.
- Bodo accuracy is lower due to limited training data; an improvement is planned for V2.
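
Text detectors such as doctr's DBNet commonly report boxes in relative (xmin, ymin, xmax, ymax) coordinates, so turning a detected page into recognizer-ready crops is mostly coordinate scaling. A sketch of that glue step — the box values below are illustrative, and the relative-coordinate convention is an assumption to check against your detector's output:

```python
from PIL import Image

def extract_crops(page: Image.Image, rel_boxes):
    """Cut word/line crops out of a full page.

    rel_boxes: iterable of (xmin, ymin, xmax, ymax) in relative [0, 1]
    coordinates, as many detection models (e.g. DBNet in doctr) produce.
    """
    w, h = page.size
    crops = []
    for xmin, ymin, xmax, ymax in rel_boxes:
        box = (round(xmin * w), round(ymin * h), round(xmax * w), round(ymax * h))
        crops.append(page.crop(box))
    return crops

page = Image.new('RGB', (1000, 600), 'white')
crops = extract_crops(page, [(0.1, 0.1, 0.5, 0.2), (0.55, 0.1, 0.9, 0.2)])
print([c.size for c in crops])  # [(400, 60), (350, 60)]
```

Each crop can then be preprocessed and passed through NE-OCR exactly as in the usage snippet above.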

## License
CC-BY-4.0 — MWire Labs
|