---
title: Hand Written Text Recognition
emoji: 📝
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 6.12.0
app_file: app.py
pinned: false
---

# ✍️ Handwritten Paragraph to Typed Text

This project is a Handwritten Text Recognition (HTR) pipeline that converts full paragraphs of handwriting into editable digital text. It combines classical computer vision (OpenCV), which splits a page into text lines, with a Transformer-based model (TrOCR), which recognizes each line.

## 🚀 Live Demo
You can try the live application on Hugging Face Spaces: <https://prabhatgupta786-hand-written-text-recognition.hf.space>

---

## 🛠️ Technical Architecture

The system runs each image through a three-stage pipeline. Splitting the paragraph into lines first is what lets a single-line recognizer handle multi-line text:

### 1. Image Pre-processing (OpenCV)
Line-level OCR models such as TrOCR are trained on images of single text lines, so a paragraph has to be split into lines before recognition. A custom OpenCV pre-processing step handles this:
* Thresholding: Converts the image to binary (black and white) to separate ink from paper.
* Morphological Dilation: Applies a wide horizontal kernel `(5, 100)` that "smears" neighboring letters and words into one blob per line.
* Contour Detection: Treats each resulting blob as one text line and crops it out as a separate image.

### 2. Deep Learning Inference (TrOCR)
Each segmented line is transcribed with TrOCR (Transformer-based Optical Character Recognition), an encoder-decoder model:
* Encoder: A Vision Transformer (ViT) that splits the line image into patches and encodes them into visual features.
* Decoder: A RoBERTa-initialized Transformer decoder that generates the text token by token, conditioned on the visual features and on the tokens it has already produced.
* Framework: Hugging Face Transformers on a PyTorch backend.
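A minimal sketch of this stage, assuming the `microsoft/trocr-large-handwritten` checkpoint (this README says TrOCR-Large but does not name the exact checkpoint) and an illustrative `max_new_tokens=64` cap. `load_trocr` and `recognize_lines` are hypothetical names; the processor and model are passed in rather than created globally, so the batching logic can be reused and tested without downloading weights.

```python
def load_trocr(model_id="microsoft/trocr-large-handwritten"):
    """Load a TrOCR processor and model. The checkpoint name is an assumption."""
    # Imported here so the module loads even where Transformers is not installed.
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    model.eval()  # inference mode: disables dropout
    return processor, model


def recognize_lines(line_images, processor, model, batch_size=8, max_new_tokens=64):
    """Transcribe a list of PIL line images in batches, preserving their order."""
    texts = []
    for start in range(0, len(line_images), batch_size):
        # TrOCR expects 3-channel input, so grayscale crops are converted to RGB.
        batch = [img.convert("RGB") for img in line_images[start:start + batch_size]]
        # The processor resizes and normalizes the images into a pixel_values tensor.
        pixel_values = processor(images=batch, return_tensors="pt").pixel_values
        # generate() runs the decoder autoregressively (already without gradients).
        generated_ids = model.generate(pixel_values, max_new_tokens=max_new_tokens)
        texts.extend(processor.batch_decode(generated_ids, skip_special_tokens=True))
    return texts
```

Line crops coming out of OpenCV are NumPy arrays; wrapping each with `PIL.Image.fromarray(crop)` before calling `recognize_lines(images, *load_trocr())` yields one string per line. Batching keeps memory bounded on long pages while still amortizing the encoder cost.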

### 3. User Interface (Gradio)
The pipeline is wrapped in a Gradio web interface: users upload an image in the browser and receive the recognized text in a text box.
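One way to wire the OpenCV and TrOCR stages into the interface, sketched under assumptions: `make_transcriber` and `build_demo` are hypothetical names, the two stages are passed in as callables, and recognized lines are joined with newlines. `gr.Interface`, `gr.Image(type="numpy")` and `gr.Textbox` are standard Gradio components; Gradio is imported inside `build_demo` so the callback itself can be tested without it.

```python
import numpy as np


def make_transcriber(segment, recognize):
    """Build the Gradio callback: uploaded image -> newline-separated text.

    `segment(gray) -> list of crops` and `recognize(crops) -> list of str` are
    hypothetical stand-ins for the OpenCV and TrOCR stages.
    """
    def transcribe(image):
        if image is None:  # Gradio passes None when nothing was uploaded.
            return ""
        # gr.Image(type="numpy") yields an H x W x 3 RGB array; averaging the
        # channels is a simple stand-in for a proper luminance conversion.
        gray = image if image.ndim == 2 else image[..., :3].mean(axis=2).astype(np.uint8)
        return "\n".join(recognize(segment(gray)))

    return transcribe


def build_demo(transcribe):
    """Wrap the callback in a simple upload-image, get-text interface."""
    import gradio as gr  # lazy import keeps make_transcriber testable without Gradio

    return gr.Interface(
        fn=transcribe,
        inputs=gr.Image(type="numpy", label="Handwritten page"),
        outputs=gr.Textbox(label="Recognized text"),
        title="Handwritten Paragraph to Typed Text",
    )
```

Calling `build_demo(make_transcriber(segment, recognize)).launch()` starts the app locally; on Spaces, the `app_file` named in the front matter is run automatically.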

---

## 🧰 Tech Stack

* Language: Python
* Computer Vision: OpenCV, NumPy, Pillow
* Deep Learning: PyTorch, Transformers (TrOCR-Large)
* Deployment: Gradio, Hugging Face Spaces
* Version Control: Git LFS (Large File Storage) for model binaries

---

## 📂 Project Structure

* `app.py`: The main entry point, containing the OpenCV segmentation logic and the Gradio UI.
* `requirements.txt`: List of dependencies (torch, opencv-python, transformers, etc.).
* `.gitattributes`: Configuration for Git LFS to track large model files.
* `README.md`: Documentation and project metadata.

---