---
title: Hand Written Text Recognition
emoji: 📝
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 6.12.0
app_file: app.py
pinned: false
---
# ✍️ Handwritten Paragraph to Typed Text
This project is a **Handwritten Text Recognition (HTR)** pipeline that converts full paragraphs of handwriting into digital, editable text. It combines classical computer vision (**OpenCV**) for line segmentation with a **Transformer-based model** for recognition.
## 🚀 Live Demo
You can try the live application on Hugging Face Spaces: <https://prabhatgupta786-hand-written-text-recognition.hf.space>
---
## 🛠️ Technical Architecture
The system follows a three-stage pipeline designed to improve accuracy on multi-line text:
### 1. Image Pre-processing (OpenCV)
Standard OCR models perform best on single lines. This project uses a custom pre-processing engine to handle full paragraphs:
* **Thresholding:** Converts images to binary (B&W) to isolate ink from paper.
* **Morphological Dilation:** Uses a horizontal kernel `(5, 100)` to "smear" letters into line-level blobs.
* **Contour Detection:** Identifies these blobs as individual lines and segments them.
### 2. Deep Learning Inference (TrOCR)
The segmented lines are processed using **TrOCR (Transformer-based Optical Character Recognition)**:
* **Encoder:** A **Vision Transformer (ViT)** that processes the image patches.
* **Decoder:** A Transformer text decoder initialized from **RoBERTa** that generates the transcription token by token, conditioned on the encoder's visual features and the text produced so far.
* **Framework:** Powered by **Hugging Face Transformers** and **PyTorch**.
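A minimal sketch of the inference step, assuming the `microsoft/trocr-large-handwritten` checkpoint (the README only says "TrOCR-Large", so the exact checkpoint is a guess). The helper names `load_trocr` and `recognize_lines` and the `max_new_tokens` default are hypothetical; `TrOCRProcessor` and `VisionEncoderDecoderModel` are the standard Transformers classes for TrOCR.

```python
def load_trocr(checkpoint="microsoft/trocr-large-handwritten"):
    """Load processor and model. The checkpoint name is an assumption."""
    # Imported lazily so the module can be imported without the heavy dependency.
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
    model.eval()  # inference only: disable dropout
    return processor, model


def recognize_lines(line_images, processor, model, max_new_tokens=64):
    """Transcribe a list of RGB line images and join the results with newlines."""
    if not line_images:
        return ""
    # The processor resizes and normalizes each image for the ViT encoder.
    pixel_values = processor(images=line_images, return_tensors="pt").pixel_values
    # The decoder generates token ids autoregressively for every line in the batch.
    generated_ids = model.generate(pixel_values, max_new_tokens=max_new_tokens)
    texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
    return "\n".join(text.strip() for text in texts)
```

Line crops that come from OpenCV are in BGR order, so they would need converting to RGB (for example with `cv2.cvtColor(crop, cv2.COLOR_BGR2RGB)`) before being passed to `recognize_lines`.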
### 3. User Interface (Gradio)
The logic is wrapped in a **Gradio** web interface, so users can upload an image and get the recognized text back directly in the browser.
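One way such a wrapper might look is sketched below. It is not the actual `app.py`: `make_transcriber`, `build_demo`, `segment_fn`, and `recognize_fn` are hypothetical names, with the two callables standing in for the segmentation and recognition stages described above.

```python
def make_transcriber(segment_fn, recognize_fn):
    """Compose the two pipeline stages into a single image -> text function."""
    def transcribe(image):
        # Gradio passes None when the user submits without uploading an image.
        if image is None:
            return ""
        lines = segment_fn(image)   # paragraph image -> list of line crops
        return recognize_fn(lines)  # list of line crops -> transcribed text
    return transcribe


def build_demo(transcribe):
    """Wrap the function in a simple upload-and-read Gradio interface."""
    import gradio as gr  # imported lazily; only needed when serving the UI

    return gr.Interface(
        fn=transcribe,
        inputs=gr.Image(type="numpy", label="Handwritten paragraph"),
        outputs=gr.Textbox(label="Recognized text", lines=10),
        title="Handwritten Paragraph to Typed Text",
    )
```

Calling `build_demo(make_transcriber(segment_fn, recognize_fn)).launch()` would start the app. Note that `gr.Image(type="numpy")` delivers an RGB array, so any OpenCV color conversion in the segmentation stage should account for that.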
---
## 🧰 Tech Stack
* **Language:** Python
* **Computer Vision:** OpenCV, NumPy, Pillow
* **Deep Learning:** PyTorch, Transformers (TrOCR-Large)
* **Deployment:** Gradio, Hugging Face Spaces
* **Version Control:** Git LFS (Large File Storage for model binaries)
---
## 📂 Project Structure
* `app.py`: The main entry point containing the OpenCV segmentation logic and Gradio UI.
* `requirements.txt`: List of dependencies (torch, opencv-python, transformers, etc.).
* `.gitattributes`: Configuration for Git LFS to track large model files.
* `README.md`: Documentation and project metadata.
---