🔤 Ancient Manuscript OCR - CRNN Model

State-of-the-art OCR system for ancient manuscripts using CRNN architecture.

Model Description

This model performs Optical Character Recognition (OCR) on ancient manuscript images. It uses a Convolutional Recurrent Neural Network (CRNN) architecture trained with Connectionist Temporal Classification (CTC) loss.
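A CTC-trained network emits one label per time step, including a special blank label, so the raw output has to be collapsed into text. The sketch below shows the usual greedy rule (merge consecutive repeats, then drop blanks). The blank index of 0 and the four-entry alphabet are assumptions for illustration; the card does not state the model's actual vocabulary or decoding strategy.

```python
from itertools import groupby

BLANK = 0  # assumed blank index; the released model may use a different one


def ctc_greedy_decode(frame_labels, alphabet, blank=BLANK):
    """Collapse consecutive repeated labels, then remove blanks."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return "".join(alphabet[label] for label in collapsed if label != blank)


# Hypothetical alphabet: index 0 is reserved for the blank symbol.
alphabet = ["", "a", "b", "c"]

# Per-frame argmax labels; the blank between the two runs of 1
# is what lets the decoder keep a genuine double letter.
frames = [1, 1, 0, 1, 2, 2, 0, 0, 3]
print(ctc_greedy_decode(frames, alphabet))  # aabc
```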

Key Achievements

  • 🎯 98.49% Character Recognition Accuracy (validation set)
  • 📊 0.61% Character Error Rate (CER) (validation set)
  • 📈 1.51% Word Error Rate (WER) (validation set)
  • ⚡ 6.44ms Average Inference Time
  • 🔢 10.8M Parameters

Model Architecture

Input Image → CNN (7 layers) → BiLSTM (2 layers) → CTC Decoder → Text Output

Components:

  • CNN Backbone: 7 convolutional layers with 64, 128, 256, 256, 512, 512, and 512 output channels
  • RNN: 2-layer Bidirectional LSTM with 256 hidden units
  • Decoder: CTC (Connectionist Temporal Classification)
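The components above can be sketched in PyTorch as follows. Only the channel widths, the 2-layer bidirectional LSTM, and the 256 hidden units come from this card. The kernel sizes, pooling layout, batch normalization, grayscale input, 32px input height, and the class name `CRNNSketch` are assumptions borrowed from common CRNN implementations, so this is not necessarily the released model and its parameter count may differ from 10.8M.

```python
import torch
import torch.nn as nn


class CRNNSketch(nn.Module):
    """Illustrative CRNN: 7 conv layers -> 2-layer BiLSTM -> per-frame logits."""

    def __init__(self, num_classes, in_channels=1, hidden=256, dropout=0.2):
        super().__init__()
        channels = [64, 128, 256, 256, 512, 512, 512]  # from the model card
        layers = []
        prev = in_channels
        for i, ch in enumerate(channels):
            last = i == len(channels) - 1
            # Assumed layout: 3x3 convs, with a final 2x2 conv and no padding.
            layers.append(nn.Conv2d(prev, ch,
                                    kernel_size=2 if last else 3,
                                    padding=0 if last else 1))
            layers.append(nn.BatchNorm2d(ch))
            layers.append(nn.ReLU(inplace=True))
            if i in (0, 1):
                layers.append(nn.MaxPool2d(2, 2))            # halve H and W
            elif i in (3, 5):
                layers.append(nn.MaxPool2d((2, 1), (2, 1)))  # halve H only
            prev = ch
        self.cnn = nn.Sequential(*layers)
        # Where dropout is applied is an assumption (between LSTM layers here).
        self.rnn = nn.LSTM(512, hidden, num_layers=2, bidirectional=True,
                           dropout=dropout, batch_first=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):
        feats = self.cnn(x)                        # (B, 512, 1, T) for H=32
        feats = feats.squeeze(2).permute(0, 2, 1)  # (B, T, 512)
        out, _ = self.rnn(feats)
        return self.fc(out)                        # (B, T, num_classes)


model = CRNNSketch(num_classes=100)  # 100 classes is a placeholder
model.eval()
with torch.no_grad():
    logits = model(torch.zeros(2, 1, 32, 128))
print(logits.shape)  # torch.Size([2, 31, 100])
```

Each of the 31 output frames carries one score per class; a CTC decoder turns that sequence into text.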

Training Data

  • Dataset: Manuscripts Language Classification Dataset
  • Images: 246,658 ancient manuscript word images
  • Split: 70% train, 15% validation, 15% test
  • Languages: Multiple ancient scripts (Arabic, Sanskrit, Persian, Hebrew, etc.)
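For a sense of scale, the 70/15/15 split of 246,658 images can be reproduced as index lists. This is a minimal sketch: the card does not say whether the split was random, stratified by script, or fixed by the dataset authors, and the `split_indices` helper and seed are hypothetical.

```python
import random


def split_indices(n, train_pct=70, val_pct=15, seed=0):
    """Shuffle 0..n-1 and cut into train/validation/test index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


train_idx, val_idx, test_idx = split_indices(246_658)
print(len(train_idx), len(val_idx), len(test_idx))  # 172660 36998 37000
```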

Usage

Installation

pip install torch torchvision pillow

Quick Start

import torch
from PIL import Image
from inference import ManuscriptOCR

# Load model
model = ManuscriptOCR(model_path='best_model.pth')

# Predict on image
text = model.predict('path/to/manuscript.jpg')
print(f"Recognized Text: {text}")

Batch Inference

# Process multiple images one at a time (each call to predict handles a single image)
images = ['manuscript1.jpg', 'manuscript2.jpg', 'manuscript3.jpg']
results = [model.predict(img) for img in images]

for img, text in zip(images, results):
    print(f"{img}: {text}")
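When processing many files, one unreadable image would abort the whole list comprehension. The sketch below wraps any `predict`-style callable so failures are collected instead. It assumes that `model.predict` raises an exception on bad input, which this card does not confirm; `transcribe_all` and `fake_predict` are hypothetical names used only for illustration.

```python
def transcribe_all(predict, image_paths):
    """Run predict on each path; return (results, failures) dictionaries."""
    results, failures = {}, {}
    for path in image_paths:
        try:
            results[path] = predict(path)
        except Exception as exc:  # broad on purpose: keep the batch going
            failures[path] = str(exc)
    return results, failures


# Stand-in predictor so the example runs without the model weights.
def fake_predict(path):
    if not path.endswith(".jpg"):
        raise ValueError("unsupported file type")
    return f"text from {path}"


results, failures = transcribe_all(fake_predict, ["page1.jpg", "notes.txt"])
print(results)
print(failures)
```

With the real model, the call would be `transcribe_all(model.predict, images)`.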

Performance Metrics

| Metric | Train | Validation | Test |
|---|---|---|---|
| Loss | 0.0234 | 0.0187 | 0.0165 |
| CER (%) | 0.58 | 0.61 | 0.61 |
| WER (%) | 1.42 | 1.51 | 1.49 |
| Accuracy (%) | 98.51 | 98.49 | 98.52 |
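CER and WER are conventionally defined as edit distance divided by reference length, at character and word level respectively. The card does not include its evaluation script, so the following is a sketch of that common definition rather than the exact code behind the numbers above. The sample strings are made up.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs, hyps):
    """Character error rate in percent over a corpus."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * errors / sum(len(r) for r in refs)


def wer(refs, hyps):
    """Word error rate in percent over a corpus."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    return 100.0 * errors / sum(len(r.split()) for r in refs)


# Made-up references and predictions for illustration.
refs = ["kitab", "dharma"]
hyps = ["kitab", "dharna"]
print(f"CER: {cer(refs, hyps):.2f}%  WER: {wer(refs, hyps):.2f}%")
```

On single-word images, WER under this definition is simply the share of words with at least one character error, which is why it is higher than CER.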

Inference Performance:

  • Average inference time: 6.44ms
  • Throughput: ~155 images/second
  • GPU Memory: ~2.1GB
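The throughput figure follows directly from the average latency, assuming images are processed one at a time with no batching or I/O overhead:

```python
latency_ms = 6.44  # average inference time from the card

# One image every 6.44 ms means 1000 / 6.44 images per second.
throughput = 1000.0 / latency_ms
print(f"{throughput:.1f} images/second")  # 155.3 images/second
```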

Training Details

Hyperparameters

  • Optimizer: Adam (lr=0.001)
  • Scheduler: ReduceLROnPlateau
  • Batch Size: 64
  • Dropout: 0.2
  • Loss Function: CTC Loss
  • Hardware: NVIDIA Tesla T4 GPU
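The optimizer, scheduler, and loss listed above map onto standard PyTorch classes. The sketch below wires them together for a single step on random data. The linear stand-in model, the 10-class vocabulary, the blank index of 0, `zero_infinity=True`, and the scheduler's default factor and patience are all assumptions; the card gives only the names and the learning rate.

```python
import torch
import torch.nn as nn

num_classes = 10  # hypothetical vocabulary size, blank assumed at index 0
model = nn.Linear(16, num_classes)  # stand-in for the real CRNN

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min")
criterion = nn.CTCLoss(blank=0, zero_infinity=True)

# Dummy batch: T time steps, B samples (the card uses 64), S target length.
T, B, S = 20, 4, 5
features = torch.randn(T, B, 16)
targets = torch.randint(1, num_classes, (B, S))  # labels exclude the blank
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), S, dtype=torch.long)

# CTCLoss expects log-probabilities shaped (T, B, C).
log_probs = model(features).log_softmax(dim=2)
loss = criterion(log_probs, targets, input_lengths, target_lengths)

optimizer.zero_grad()
loss.backward()
optimizer.step()

# In real training, pass the validation loss once per epoch.
scheduler.step(loss.item())
print(f"loss: {loss.item():.4f}")
```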

Data Augmentation

  • Random rotation (±10°)
  • Random brightness (±20%)
  • Random contrast (±20%)
  • Horizontal padding for variable widths

Limitations

  • Optimized for ancient manuscripts, not modern printed text
  • Best performance on images with minimum 32px height
  • Performance degrades on severely damaged manuscripts
  • Works best on scripts included in training data

Citation

@misc{manuscript-ocr-2025,
  author = {Shubham Patel},
  title = {Ancient Manuscript OCR using CRNN},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/cosmicshubham/ancient-manuscript-ocr}
}

License

MIT License

Contact


Model ID: cosmicshubham/ancient-manuscript-ocr
Framework: PyTorch 2.0+
Created: January 2025
