| --- |
| title: Hand Written Text Recognition |
| emoji: π |
| colorFrom: indigo |
| colorTo: green |
| sdk: gradio |
| sdk_version: 6.12.0 |
| app_file: app.py |
| pinned: false |
| --- |
| |
| # ✍️ Handwritten Paragraph to Typed Text |
|
|
| This project is a **Handwritten Text Recognition (HTR)** pipeline that converts full paragraphs of handwriting into editable digital text. It combines classical computer vision (**OpenCV**) for line segmentation with a **Transformer-based model** (TrOCR) for recognition. |
|
|
| ## Live Demo |
| You can try the live application on Hugging Face Spaces: <https://prabhatgupta786-hand-written-text-recognition.hf.space> |
|
|
| --- |
|
|
| ## 🛠️ Technical Architecture |
|
|
| The system runs a three-stage pipeline so that multi-line text reaches the recognizer one line at a time: |
|
|
| ### 1. Image Pre-processing (OpenCV) |
| Line-level recognizers such as TrOCR expect an image of a single text line, so the project uses a custom pre-processing step to split full paragraphs into lines: |
| * **Thresholding:** Converts the image to binary (black and white) to isolate ink from paper. |
| * **Morphological Dilation:** Uses a wide, short kernel `(5, 100)` (5 rows by 100 columns) to "smear" the letters of each line into a single line-level blob. |
| * **Contour Detection:** Finds each blob, treats it as one text line, and segments it from the page. |
|
|
| ### 2. Deep Learning Inference (TrOCR) |
| The segmented lines are processed using **TrOCR (Transformer-based Optical Character Recognition)**: |
| * **Encoder:** A **Vision Transformer (ViT)** that processes the image patches. |
| * **Decoder:** A Transformer text decoder, initialized from **RoBERTa** weights, that generates text token by token from the visual features and the linguistic context. |
| * **Framework:** Powered by **Hugging Face Transformers** and **PyTorch**. |
|
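A rough sketch of line-level inference with the Hugging Face TrOCR classes is shown below. The checkpoint name `microsoft/trocr-large-handwritten` is an assumption (the project only states "TrOCR-Large"), and `load_trocr` and `recognize_lines` are hypothetical helper names, not functions from `app.py`.

```python
MODEL_NAME = "microsoft/trocr-large-handwritten"  # assumed checkpoint


def load_trocr(model_name=MODEL_NAME):
    """Load the processor and model once, e.g. at application start-up."""
    # Imported lazily so this module can be imported without loading PyTorch.
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(model_name)
    model = VisionEncoderDecoderModel.from_pretrained(model_name)
    model.eval()
    return processor, model


def recognize_lines(line_images, processor, model):
    """Transcribe a list of RGB line images; returns one string per line."""
    if not line_images:
        return []
    # The processor resizes and normalizes the images into a tensor batch.
    pixel_values = processor(images=line_images, return_tensors="pt").pixel_values
    # The decoder generates token ids autoregressively for each image.
    generated_ids = model.generate(pixel_values)
    texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
    return [t.strip() for t in texts]
```

Typical usage would be `processor, model = load_trocr()` followed by `recognize_lines(crops, processor, model)`, where `crops` holds one image per segmented line.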
|
| ### 3. User Interface (Gradio) |
| The pipeline is wrapped in a **Gradio** web interface: users upload an image and get the recognized text back in the browser. |
|
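One way to wire the stages into a Gradio app is sketched here. The names `transcribe_paragraph` and `build_demo` are hypothetical, and the segmentation and recognition steps are passed in as plain callables so the glue code stays independent of OpenCV and PyTorch. The actual layout of `app.py` may differ.

```python
def transcribe_paragraph(image, segment_lines, recognize_line):
    """Segment a page image, transcribe each line, and join the results.

    segment_lines(image) -> iterable of (x, y, w, h) boxes, top to bottom.
    recognize_line(crop) -> recognized text for one cropped line.
    """
    if image is None:  # nothing uploaded yet
        return ""
    texts = []
    for x, y, w, h in segment_lines(image):
        crop = image[y:y + h, x:x + w]  # NumPy slicing: rows first, then columns
        texts.append(recognize_line(crop))
    # One output line per detected handwriting line; skip empty results.
    return "\n".join(t for t in texts if t)


def build_demo(transcribe):
    """Wrap a single-argument `transcribe(image) -> str` function in a UI."""
    import gradio as gr  # imported here so the helper above has no UI dependency

    return gr.Interface(
        fn=transcribe,
        inputs=gr.Image(type="numpy", label="Handwritten paragraph"),
        outputs=gr.Textbox(label="Recognized text"),
        title="Handwritten Paragraph to Typed Text",
    )
```

Calling `build_demo(my_transcribe).launch()` starts the app, where `my_transcribe` is a function that binds concrete segmentation and recognition callables to `transcribe_paragraph`.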
|
| --- |
|
|
| ## 🧰 Tech Stack |
|
|
| * **Language:** Python |
| * **Computer Vision:** OpenCV, NumPy, Pillow |
| * **Deep Learning:** PyTorch, Transformers (TrOCR-Large) |
| * **Deployment:** Gradio, Hugging Face Spaces |
| * **Version Control:** Git with Git LFS (Large File Storage) for model binaries |
|
|
| --- |
|
|
| ## Project Structure |
|
|
| * `app.py`: The main entry point containing the OpenCV segmentation logic and Gradio UI. |
| * `requirements.txt`: List of dependencies (torch, opencv-python, transformers, etc.). |
| * `.gitattributes`: Configuration for Git LFS to track large model files. |
| * `README.md`: Documentation and project metadata. |
|
|
| --- |
|
|