
metadata
title: Hand Written Text Recognition
emoji: 📝
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 6.12.0
app_file: app.py
pinned: false

✍️ Handwritten Paragraph to Typed Text

This project is a Handwritten Text Recognition (HTR) pipeline that converts full paragraphs of handwriting into editable digital text. It combines classical computer vision (line segmentation with OpenCV) with a Transformer-based recognition model (TrOCR).

🚀 Live Demo

You can try the live application on Hugging Face Spaces: https://prabhatgupta786-hand-written-text-recognition.hf.space


πŸ› οΈ Technical Architecture

The system follows a three-stage pipeline designed to handle multi-line text:

1. Image Pre-processing (OpenCV)

TrOCR, like most line-level OCR models, expects an image of a single line of text. This project therefore uses a custom pre-processing step to split a full paragraph into individual lines:

  • Thresholding: Converts images to binary (B&W) to isolate ink from paper.
  • Morphological Dilation: Uses a wide, short kernel of shape (5, 100), i.e. 5 pixels tall by 100 pixels wide, to "smear" the letters of each line into a single line-level blob.
  • Contour Detection: Identifies these blobs as individual lines and segments them.

2. Deep Learning Inference (TrOCR)

The segmented lines are processed using TrOCR (Transformer-based Optical Character Recognition):

  • Encoder: A Vision Transformer (ViT) that processes the image patches.
  • Decoder: A Transformer text decoder, initialized from RoBERTa weights, that generates text token by token from the visual features and the linguistic context of what it has already produced.
  • Framework: Powered by Hugging Face Transformers and PyTorch.
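
As a rough sketch of this stage, the snippet below runs one line image through TrOCR using the Hugging Face Transformers API and joins the per-line results. The checkpoint name `microsoft/trocr-large-handwritten` is an assumption (the README only says TrOCR-Large), and the helper names `load_trocr`, `recognize_line`, and `recognize_paragraph` are hypothetical.

```python
# Assumed checkpoint; the project may use a different TrOCR-Large variant.
MODEL_NAME = "microsoft/trocr-large-handwritten"


def load_trocr(model_name=MODEL_NAME):
    """Load the processor and encoder-decoder model once at startup."""
    # Imported lazily so the lightweight helpers below work without
    # downloading model weights.
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(model_name)
    model = VisionEncoderDecoderModel.from_pretrained(model_name)
    model.eval()
    return processor, model


def recognize_line(pil_image, processor, model):
    """Transcribe a single-line PIL image to a string."""
    import torch

    # The processor resizes and normalizes the image for the encoder.
    pixel_values = processor(
        images=pil_image.convert("RGB"), return_tensors="pt"
    ).pixel_values

    # The decoder generates token ids autoregressively.
    with torch.no_grad():
        generated_ids = model.generate(pixel_values)

    return processor.batch_decode(
        generated_ids, skip_special_tokens=True
    )[0]


def recognize_paragraph(line_images, recognize):
    """Apply `recognize` to each line image and join with newlines."""
    return "\n".join(recognize(img).strip() for img in line_images)
```

Passing the recognizer in as a callable keeps `recognize_paragraph` independent of the model, for example `recognize_paragraph(lines, lambda img: recognize_line(img, processor, model))`.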

3. User Interface (Gradio)

The logic is wrapped in a Gradio web interface, so users can upload an image of handwriting in the browser and get the recognized text back.
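
A minimal sketch of how such a wrapper might look is shown below. The names `make_transcriber` and `build_demo`, the injected `segment` and `recognize` callables, and the labels are illustrative assumptions and may differ from what app.py actually does.

```python
def make_transcriber(segment, recognize):
    """Combine a line segmenter and a line recognizer into one function.

    `segment` maps an uploaded image to a list of line images and
    `recognize` maps one line image to text; both are hypothetical
    stand-ins for the project's own functions.
    """
    def transcribe(image):
        # Gradio passes None when the user submits without an image.
        if image is None:
            return ""
        return "\n".join(recognize(line) for line in segment(image))

    return transcribe


def build_demo(transcribe):
    """Wrap `transcribe` in a simple image-in, text-out Gradio app."""
    import gradio as gr  # imported lazily; only needed to build the UI

    return gr.Interface(
        fn=transcribe,
        inputs=gr.Image(type="pil", label="Handwritten paragraph"),
        outputs=gr.Textbox(label="Recognized text"),
        title="Handwritten Paragraph to Typed Text",
    )


# Typical use in an app file (not executed here):
#     demo = build_demo(make_transcriber(segment, recognize))
#     demo.launch()
```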


🧰 Tech Stack

  • Language: Python
  • Computer Vision: OpenCV, NumPy, Pillow
  • Deep Learning: PyTorch, Transformers (TrOCR-Large)
  • Deployment: Gradio, Hugging Face Spaces
  • Version Control: Git LFS (Large File Storage for model binaries)

📂 Project Structure

  • app.py: The main entry point containing the OpenCV segmentation logic and Gradio UI.
  • requirements.txt: List of dependencies (torch, opencv-python, transformers, etc.).
  • .gitattributes: Configuration for Git LFS to track large model files.
  • README.md: Documentation and project metadata.