TrOCR Sanskrit OCR

Fine-tuned TrOCR model for Sanskrit manuscript OCR (Optical Character Recognition).

Model Description

This model extracts Sanskrit text from manuscript images. It outputs text in IAST (International Alphabet of Sanskrit Transliteration) format.

  • Base Model: microsoft/trocr-base-printed
  • Fine-tuned on: yzk/veda-ocr-ms (~11.7k Sanskrit manuscript images)
  • Training: 2 epochs on Apple M1 Pro (MPS)

Usage

from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Load model
processor = TrOCRProcessor.from_pretrained("Piyush3142/trocr-sanskrit-ocr")
model = VisionEncoderDecoderModel.from_pretrained("Piyush3142/trocr-sanskrit-ocr")

# OCR inference
image = Image.open("sanskrit_manuscript.jpg").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values
outputs = model.generate(pixel_values, max_length=256)
text = processor.batch_decode(outputs, skip_special_tokens=True)[0]

print(text)  # IAST output
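
To run the same steps over several images, the calls above can be wrapped in a small batching helper. This is a minimal sketch, not part of the released model: `ocr_images` and its `batch_size` parameter are hypothetical names introduced here, and the helper assumes the `processor` and `model` objects loaded as shown above and a list of RGB `PIL.Image` objects.

```python
from typing import List, Sequence


def ocr_images(images: Sequence, processor, model,
               batch_size: int = 4, max_length: int = 256) -> List[str]:
    """Run OCR over `images` in chunks of `batch_size` (hypothetical helper).

    `processor` and `model` are expected to behave like the TrOCRProcessor
    and VisionEncoderDecoderModel loaded in the usage snippet.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")

    texts: List[str] = []
    for start in range(0, len(images), batch_size):
        batch = list(images[start:start + batch_size])
        # Same three steps as the single-image example, applied to a batch.
        pixel_values = processor(images=batch, return_tensors="pt").pixel_values
        outputs = model.generate(pixel_values, max_length=max_length)
        texts.extend(processor.batch_decode(outputs, skip_special_tokens=True))
    return texts
```

With the model loaded, `ocr_images([Image.open(p).convert("RGB") for p in paths], processor, model)` would return one IAST string per image, in input order. A small `batch_size` keeps memory use modest on laptops.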

Convert IAST to Devanagari

from indic_transliteration.sanscript import transliterate, IAST, DEVANAGARI

devanagari = transliterate(text, IAST, DEVANAGARI)
print(devanagari)

Training Details

Parameter / metric            Value
Dataset                       yzk/veda-ocr-ms
Train samples                 10,560
Test samples                  1,174
Epochs                        2
Batch size                    4
Learning rate                 2e-5
CER (character error rate)    ~68%
WER (word error rate)         ~80%
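
The model card does not say how the CER and WER figures were computed. As a point of reference, the sketch below uses the common definition: the Levenshtein edit distance between reference and prediction, divided by the reference length, counted in characters for CER and in whitespace-separated words for WER. The function names are illustrative, and the NFC normalization step is an assumption added so that IAST letters with diacritics count as single characters.

```python
import unicodedata
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate relative to the reference length."""
    # Assumption: compare NFC-normalized strings (precomposed diacritics).
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens."""
    ref_words = unicodedata.normalize("NFC", reference).split()
    hyp_words = unicodedata.normalize("NFC", hypothesis).split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)
```

Under this definition lower is better, and both rates can exceed 100% when the prediction contains many spurious insertions.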

Limitations

  • Trained on printed Vedic manuscripts
  • Performs best on texts similar in style to the training data
  • May struggle with handwritten or degraded images

License

MIT

