title: Hand Written Text Recognition
emoji: π
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 6.12.0
app_file: app.py
pinned: false
Handwritten Paragraph to Typed Text
This project is a Handwritten Text Recognition (HTR) pipeline that converts full paragraphs of handwriting into editable digital text. It combines classical computer vision (OpenCV) for line segmentation with a Transformer-based recognition model (TrOCR).
Live Demo
Try the live application on Hugging Face Spaces: https://prabhatgupta786-hand-written-text-recognition.hf.space
Technical Architecture
The system uses a three-stage pipeline so that multi-line paragraphs can be read by a model that recognizes one line at a time:
1. Image Pre-processing (OpenCV)
Line-level recognition models such as TrOCR work best on a single line of text, so a full paragraph must first be split into lines. A custom OpenCV pre-processing step does this:
- Thresholding: Converts images to binary (B&W) to isolate ink from paper.
- Morphological Dilation: Uses a wide horizontal kernel of size (5, 100) to "smear" neighboring letters and words into line-level blobs.
- Contour Detection: Identifies these blobs as individual lines and crops each one out.
2. Deep Learning Inference (TrOCR)
The segmented lines are processed using TrOCR (Transformer-based Optical Character Recognition):
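- Encoder: A Vision Transformer (ViT) that splits each line image into fixed-size patches and encodes them as visual features.
- Decoder: A Transformer text decoder (initialized from RoBERTa weights) that generates the text token by token, conditioned on the visual features and on the tokens generated so far.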
- Framework: Powered by Hugging Face Transformers and PyTorch.
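Once the lines are cropped, recognition comes down to a processor call, a `generate()` call, and a decode. The following is a minimal sketch using Hugging Face Transformers. The checkpoint name `microsoft/trocr-large-handwritten` is an assumption (the README only says TrOCR-Large), and `load_trocr` / `recognize_lines` are hypothetical helpers rather than the app's actual code. The processor and model are passed in rather than created inside the function, so they are loaded once at startup instead of on every request.

```python
from typing import List, Sequence, Union

import numpy as np
from PIL import Image

# Assumption: the README only says "TrOCR-Large"; this checkpoint name is a guess.
MODEL_ID = "microsoft/trocr-large-handwritten"


def load_trocr(model_id: str = MODEL_ID):
    """Download (or load from the local cache) the processor and encoder-decoder model."""
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    model.eval()  # inference only: disables dropout
    return processor, model


def recognize_lines(
    lines: Sequence[Union[np.ndarray, Image.Image]],
    processor,
    model,
    max_new_tokens: int = 64,
) -> List[str]:
    """Transcribe single-line crops; returns one string per crop, in the same order."""
    if not lines:
        return []
    # TrOCR's image processor expects 3-channel images; crops may arrive as NumPy arrays.
    images = [
        (Image.fromarray(im) if isinstance(im, np.ndarray) else im).convert("RGB")
        for im in lines
    ]
    # The processor resizes and normalizes every crop to the encoder's input size,
    # so crops of different widths can share one batch.
    pixel_values = processor(images=images, return_tensors="pt").pixel_values
    # generate() runs the decoder autoregressively and does not track gradients.
    generated_ids = model.generate(pixel_values, max_new_tokens=max_new_tokens)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)
```

For example, `processor, model = load_trocr()` followed by `recognize_lines(crops, processor, model)` returns one string per line crop; joining them with newlines rebuilds the paragraph.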
3. User Interface (Gradio)
The pipeline is wrapped in a Gradio web interface: users upload an image and receive the recognized text directly in the browser.
Tech Stack
- Language: Python
- Computer Vision: OpenCV, NumPy, Pillow
- Deep Learning: PyTorch, Transformers (TrOCR-Large)
- Deployment: Gradio, Hugging Face Spaces
- Version Control: Git LFS (Large File Storage for model binaries)
Project Structure
- app.py: The main entry point, containing the OpenCV segmentation logic and the Gradio UI.
- requirements.txt: List of dependencies (torch, opencv-python, transformers, etc.).
- .gitattributes: Git LFS configuration for tracking large model files.
- README.md: Documentation and project metadata.