Ancient Manuscript OCR - CRNN Model
An OCR system for ancient manuscripts built on a CRNN architecture.
Model Description
This model performs Optical Character Recognition (OCR) on ancient manuscript images using a Convolutional Recurrent Neural Network (CRNN) architecture with CTC Loss.
Key Achievements
- 98.49% Character Recognition Accuracy (validation set)
- 0.61% Character Error Rate (CER, validation set)
- 1.51% Word Error Rate (WER, validation set)
- 6.44 ms Average Inference Time
- 10.8M Parameters
Model Architecture
Input Image → CNN (7 layers) → BiLSTM (2 layers) → CTC Decoder → Text Output
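The last stage of this pipeline turns per-timestep class scores into text. The sketch below shows greedy (best-path) CTC decoding in plain Python. It assumes class index 0 is the CTC blank and uses a toy `ALPHABET`; both are illustrative assumptions, and the released model may decode differently (for example with beam search).

```python
from itertools import groupby

BLANK = 0          # assumption: class index 0 is the CTC blank symbol
ALPHABET = "abc"   # hypothetical alphabet: class i (i >= 1) maps to ALPHABET[i - 1]


def ctc_greedy_decode(frame_scores):
    """Decode a T x C list of per-timestep class scores into a string."""
    # 1. Pick the highest-scoring class at every timestep.
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]
    # 2. Collapse consecutive repeats of the same class.
    collapsed = [cls for cls, _ in groupby(best_path)]
    # 3. Drop blanks and map the remaining classes to characters.
    return "".join(ALPHABET[cls - 1] for cls in collapsed if cls != BLANK)


scores = [
    [0.10, 0.80, 0.05, 0.05],  # 'a'
    [0.10, 0.70, 0.10, 0.10],  # 'a' again (repeat, collapsed)
    [0.90, 0.05, 0.03, 0.02],  # blank separates the two 'a's
    [0.10, 0.80, 0.05, 0.05],  # 'a' (kept, because a blank came before it)
    [0.10, 0.10, 0.70, 0.10],  # 'b'
]
print(ctc_greedy_decode(scores))  # aab
```

The blank is what lets CTC output doubled letters: without it, the two runs of `a` would collapse into one.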
Components:
- CNN Backbone: 7 convolutional layers with 64, 128, 256, 256, 512, 512, and 512 output channels
- RNN: 2-layer Bidirectional LSTM with 256 hidden units
- Decoder: CTC (Connectionist Temporal Classification)
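As a rough illustration of how these components fit together, here is a minimal PyTorch sketch. Only the channel widths, the 2-layer bidirectional LSTM with 256 hidden units, and the 0.2 dropout come from this card. Kernel sizes, pooling layout, batch-norm placement, the grayscale 32 px input, and `num_classes=80` are assumptions borrowed from the common CRNN recipe, so this sketch does not necessarily reproduce the released weights or the reported 10.8M parameter count.

```python
import torch
import torch.nn as nn


class CRNN(nn.Module):
    """CNN (7 conv layers) -> 2-layer BiLSTM -> per-timestep class scores."""

    def __init__(self, num_classes, in_channels=1, hidden=256, dropout=0.2):
        super().__init__()
        channels = [64, 128, 256, 256, 512, 512, 512]  # from the model card
        kernels = [3, 3, 3, 3, 3, 3, 2]                # assumption
        paddings = [1, 1, 1, 1, 1, 1, 0]               # assumption
        batch_norm = {2, 4, 6}                         # assumption
        pools = {                                      # assumption
            0: nn.MaxPool2d(2, 2),
            1: nn.MaxPool2d(2, 2),
            3: nn.MaxPool2d((2, 2), (2, 1), (0, 1)),   # halve height, keep width
            5: nn.MaxPool2d((2, 2), (2, 1), (0, 1)),
        }
        layers, prev = [], in_channels
        for i, out in enumerate(channels):
            layers.append(nn.Conv2d(prev, out, kernels[i], 1, paddings[i]))
            if i in batch_norm:
                layers.append(nn.BatchNorm2d(out))
            layers.append(nn.ReLU(inplace=True))
            if i in pools:
                layers.append(pools[i])
            prev = out
        self.cnn = nn.Sequential(*layers)
        self.rnn = nn.LSTM(prev, hidden, num_layers=2,
                           bidirectional=True, dropout=dropout)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):
        feats = self.cnn(x)                        # (B, 512, 1, T) for 32 px input
        if feats.shape[2] != 1:
            raise ValueError("this sketch expects input height 32")
        feats = feats.squeeze(2).permute(2, 0, 1)  # (T, B, 512)
        out, _ = self.rnn(feats)                   # (T, B, 2 * hidden)
        return self.fc(out)                        # (T, B, num_classes)


model = CRNN(num_classes=80).eval()  # 80 classes is a placeholder
with torch.no_grad():
    logits = model(torch.randn(2, 1, 32, 128))
print(logits.shape)  # torch.Size([33, 2, 80])
```

With this layout the image width determines the sequence length (here 128 px gives 33 timesteps), which is what lets CTC handle words of different lengths.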
Training Data
- Dataset: Manuscripts Language Classification Dataset
- Images: 246,658 ancient manuscript word images
- Split: 70% train, 15% validation, 15% test
- Languages: Multiple ancient scripts (Arabic, Sanskrit, Persian, Hebrew, etc.)
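To make the split concrete, the sketch below shows one way a 70/15/15 split of 246,658 images could be produced. The shuffling seed and the rounding rule (floor for train and validation, remainder to test) are assumptions; the card does not say how the actual split was drawn.

```python
import random

TOTAL_IMAGES = 246_658
SPLIT_PERCENT = {"train": 70, "validation": 15, "test": 15}


def split_indices(n, seed=0):
    """Shuffle image indices and cut them into train/validation/test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary assumption
    n_train = n * SPLIT_PERCENT["train"] // 100
    n_val = n * SPLIT_PERCENT["validation"] // 100
    return {
        "train": indices[:n_train],
        "validation": indices[n_train:n_train + n_val],
        "test": indices[n_train + n_val:],  # remainder goes to test
    }


parts = split_indices(TOTAL_IMAGES)
for name, idx in parts.items():
    print(f"{name}: {len(idx):,} images")
# train: 172,660 images
# validation: 36,998 images
# test: 37,000 images
```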
Usage
Installation
pip install torch torchvision pillow
Quick Start
import torch
from PIL import Image
from inference import ManuscriptOCR
# Load model
model = ManuscriptOCR(model_path='best_model.pth')
# Predict on image
text = model.predict('path/to/manuscript.jpg')
print(f"Recognized Text: {text}")
Batch Inference
# Process multiple images
images = ['manuscript1.jpg', 'manuscript2.jpg', 'manuscript3.jpg']
results = [model.predict(img) for img in images]
for img, text in zip(images, results):
    print(f"{img}: {text}")
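For larger batches it can help to keep going when a single file fails. The helper below is a hypothetical sketch, not part of the `inference` module: it takes any `predict` callable (such as `model.predict`) and collects failures instead of aborting. Which exceptions the real `predict` raises is an assumption.

```python
def predict_many(predict, paths):
    """Run predict on each path; return (results, failures) as two dicts."""
    results, failures = {}, {}
    for path in paths:
        try:
            results[path] = predict(path)
        except (OSError, ValueError) as exc:  # assumed failure modes
            failures[path] = str(exc)
    return results, failures


# Stand-in for model.predict so the sketch runs without the model weights.
def fake_predict(path):
    if not path.endswith(".jpg"):
        raise ValueError("not an image")
    return f"text from {path}"


results, failures = predict_many(fake_predict, ["manuscript1.jpg", "notes.txt"])
print(results)   # {'manuscript1.jpg': 'text from manuscript1.jpg'}
print(failures)  # {'notes.txt': 'not an image'}
```

With the real model this would be called as `predict_many(model.predict, images)`.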
Performance Metrics
| Metric | Train | Validation | Test |
|---|---|---|---|
| Loss | 0.0234 | 0.0187 | 0.0165 |
| CER (%) | 0.58 | 0.61 | 0.61 |
| WER (%) | 1.42 | 1.51 | 1.49 |
| Accuracy (%) | 98.51 | 98.49 | 98.52 |
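For reference, CER and WER are usually computed as edit distance divided by reference length. The sketch below shows that standard definition in plain Python; the exact evaluation script behind the table is not part of this card, so treat this as an illustration rather than the code that produced these numbers.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="char"):
    """Return CER (unit='char') or WER (unit='word') as a percentage."""
    errors = total = 0
    for ref, hyp in zip(refs, hyps):
        r, h = (ref.split(), hyp.split()) if unit == "word" else (ref, hyp)
        errors += edit_distance(r, h)
        total += len(r)
    return 100.0 * errors / total


refs = ["manuscript", "ancient text"]
hyps = ["manuscrlpt", "ancient test"]
print(f"CER: {error_rate(refs, hyps, 'char'):.2f}%")  # CER: 9.09%
print(f"WER: {error_rate(refs, hyps, 'word'):.2f}%")  # WER: 66.67%
```

A single wrong character makes the whole word wrong, which is why WER is always at least as high as CER on the same predictions.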
Inference Performance:
- Average inference time: 6.44 ms per image
- Throughput: ~155 images/second
- GPU memory: ~2.1 GB
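The throughput figure follows directly from the average latency if images are processed one at a time, which is an assumption here since the card does not state the benchmark batch size:

```python
AVG_LATENCY_MS = 6.44  # average inference time from the model card

# Sequential processing: one image every AVG_LATENCY_MS milliseconds.
throughput = 1000.0 / AVG_LATENCY_MS
print(f"{throughput:.1f} images/second")  # 155.3 images/second


def seconds_for(num_images, latency_ms=AVG_LATENCY_MS):
    """Estimated wall-clock seconds to process num_images sequentially."""
    return num_images * latency_ms / 1000.0


print(f"{seconds_for(10_000):.1f} s for 10,000 images")  # 64.4 s for 10,000 images
```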
Training Details
Hyperparameters
- Optimizer: Adam (lr=0.001)
- Scheduler: ReduceLROnPlateau
- Batch Size: 64
- Dropout: 0.2
- Loss Function: CTC Loss
- Hardware: NVIDIA Tesla T4 GPU
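The sketch below wires these settings together for a single training step. The optimizer, learning rate, scheduler class, batch size, dropout, and loss come from the list above. The stand-in linear model, `NUM_CLASSES`, sequence lengths, and the scheduler's `factor` and `patience` are assumptions chosen only to keep the example small and runnable.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

NUM_CLASSES = 40  # hypothetical alphabet size; index 0 is reserved for the CTC blank
FEATURES = 16     # hypothetical feature size of the stand-in model

# Stand-in for the CRNN: any module that outputs (T, B, NUM_CLASSES) scores works.
model = nn.Sequential(nn.Dropout(p=0.2), nn.Linear(FEATURES, NUM_CLASSES))

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=3  # factor/patience are assumptions
)
criterion = nn.CTCLoss(blank=0, zero_infinity=True)

T, B, S = 20, 64, 5  # timesteps, batch size (64 as listed), target length
inputs = torch.randn(T, B, FEATURES)
targets = torch.randint(1, NUM_CLASSES, (B, S))  # labels never use the blank index
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), S, dtype=torch.long)

model.train()
log_probs = model(inputs).log_softmax(dim=2)  # CTCLoss expects log-probabilities
loss = criterion(log_probs, targets, input_lengths, target_lengths)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# In a real loop this would receive the validation loss once per epoch.
scheduler.step(loss.item())
print(f"CTC loss: {loss.item():.4f}")
```

`ReduceLROnPlateau` only lowers the learning rate after the monitored value stops improving for `patience` steps, so the rate is still 0.001 after this single step.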
Data Augmentation
- Random rotation (±10°)
- Random brightness (±20%)
- Random contrast (±20%)
- Horizontal padding for variable widths
Limitations
- Optimized for ancient manuscripts, not modern printed text
- Performs best on images at least 32 px tall
- Performance degrades on severely damaged manuscripts
- Works best on scripts included in training data
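If input images come in other heights, one common approach is to rescale them to a fixed height while keeping the aspect ratio. The helper below is a hypothetical sketch that only computes the target size; whether the model's own preprocessing rescales this way is not stated in this card, and `min_width` is an assumed safeguard. Note that upscaling a small image does not restore lost detail.

```python
def target_size(width, height, target_height=32, min_width=8):
    """Return (new_width, new_height) that preserves the aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = target_height / height
    return max(min_width, round(width * scale)), target_height


print(target_size(300, 64))  # (150, 32)
print(target_size(90, 20))   # (144, 32)
```

The returned tuple is in `(width, height)` order, which is what Pillow's `Image.resize` expects.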
Citation
@misc{manuscript-ocr-2025,
  author = {Shubham Patel},
  title = {Ancient Manuscript OCR using CRNN},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/cosmicshubham/ancient-manuscript-ocr}
}
License
MIT License
Contact
- Author: Shubham Patel
- GitHub: @CosmicShubham1
- Repository: ancient-manuscript-ocr
Model ID: cosmicshubham/ancient-manuscript-ocr
Framework: PyTorch 2.0+
Created: January 2025