PrabhatGupta786 / Hand_written_text_recognition

metadata

title: Hand Written Text Recognition
emoji: 📝
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 6.12.0
app_file: app.py
pinned: false

✍️ Handwritten Paragraph to Typed Text

This project is a Handwritten Text Recognition (HTR) pipeline that converts full paragraphs of handwriting into editable digital text. It combines classical computer vision (line segmentation with OpenCV) with a Transformer-based recognition model (TrOCR).

🚀 Live Demo

You can try the live application on Hugging Face Spaces: https://prabhatgupta786-hand-written-text-recognition.hf.space

🛠️ Technical Architecture

The system follows a three-stage pipeline designed to handle multi-line text:

1. Image Pre-processing (OpenCV)

TrOCR, like many OCR models, expects an image of a single text line. To handle full paragraphs, this project first splits the page into lines with a custom pre-processing step:

Thresholding: Converts the image to binary (black and white) to separate ink from paper.
Morphological Dilation: Uses a wide horizontal kernel, given as (5, 100), to "smear" the letters of each line into a single line-level blob.
Contour Detection: Finds the outline of each blob, treats it as one text line, and segments that region out of the image.

2. Deep Learning Inference (TrOCR)

The segmented lines are processed using TrOCR (Transformer-based Optical Character Recognition):

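Encoder: A Vision Transformer (ViT) style image encoder that splits each line image into patches and encodes them.
Decoder: A Transformer text decoder, initialized from RoBERTa weights, that generates the text token by token from the visual features and the linguistic context.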
Framework: Powered by Hugging Face Transformers and PyTorch.
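As a rough sketch of this stage, the snippet below runs TrOCR over a list of line images with Hugging Face Transformers. The checkpoint name `microsoft/trocr-large-handwritten` is an assumption based on the "TrOCR-Large" entry in the tech stack, and the helper names `load_trocr` and `recognize_lines` are hypothetical; the actual app.py may differ.

```python
def load_trocr(checkpoint="microsoft/trocr-large-handwritten"):
    """Load the processor and model (assumed checkpoint; a large download)."""
    # Imported lazily so the rest of the module works without Transformers.
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
    model.eval()
    return processor, model


def recognize_lines(line_images, processor, model):
    """Transcribe PIL line images and join the results with newlines."""
    if not line_images:
        return ""

    # TrOCR expects 3-channel input, so grayscale crops are converted first.
    rgb_images = [img.convert("RGB") for img in line_images]

    # The processor resizes and normalizes the images into one batch.
    pixel_values = processor(images=rgb_images, return_tensors="pt").pixel_values

    # The decoder generates token ids autoregressively for every line.
    generated_ids = model.generate(pixel_values)

    texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
    return "\n".join(text.strip() for text in texts)
```

Typical usage would be `processor, model = load_trocr()` once at startup, followed by `recognize_lines(lines, processor, model)` per request, so the model is not reloaded for every image.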

3. User Interface (Gradio)

The pipeline is wrapped in a Gradio web interface where users upload an image and receive the recognized text.
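A minimal sketch of how such an interface might be wired up is shown below. The helpers `make_transcriber` and `build_demo`, the `segment` and `recognize` callables they take, and the user-facing messages are hypothetical stand-ins for whatever app.py actually defines.

```python
def make_transcriber(segment, recognize):
    """Combine a line-segmentation step and a recognition step into one callback.

    `segment` maps an image to a list of line images; `recognize` maps that
    list to a string. Both are assumed, caller-supplied functions.
    """
    def transcribe(image):
        if image is None:
            return "Please upload an image."
        lines = segment(image)
        if not lines:
            return "No text lines detected."
        return recognize(lines)

    return transcribe


def build_demo(transcribe):
    """Wrap the callback in a simple image-in, text-out Gradio interface."""
    # Imported lazily so the callback can be exercised without Gradio installed.
    import gradio as gr

    return gr.Interface(
        fn=transcribe,
        inputs=gr.Image(type="pil", label="Handwritten paragraph"),
        outputs=gr.Textbox(label="Recognized text", lines=10),
        title="Hand Written Text Recognition",
    )
```

On a Space, the app file would then end with something like `build_demo(make_transcriber(segment, recognize)).launch()`, where `segment` and `recognize` are the project's own functions.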

🧰 Tech Stack

Language: Python
Computer Vision: OpenCV, NumPy, Pillow
Deep Learning: PyTorch, Transformers (TrOCR-Large)
Deployment: Gradio, Hugging Face Spaces
Version Control: Git LFS (Large File Storage for model binaries)

📂 Project Structure

app.py: The main entry point containing the OpenCV segmentation logic and Gradio UI.
requirements.txt: List of dependencies (torch, opencv-python, transformers, etc.).
.gitattributes: Configuration for Git LFS to track large model files.
README.md: Documentation and project metadata.