---
title: Hand2Voice
emoji: 🤟
colorFrom: gray
colorTo: indigo
sdk: streamlit
pinned: false
short_description: 'Converting Hand Gestures into Speech using Computer Vision'
---
# 🤟 Hand2Voice: AI Sign Language Assistant
[![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://streamlit.io)
[![Python](https://img.shields.io/badge/Python-3.9%2B-blue)](https://www.python.org/)
[![MediaPipe](https://img.shields.io/badge/MediaPipe-Vision-orange)](https://developers.google.com/mediapipe)
**Hand2Voice** is an accessibility tool designed to bridge the communication gap for the speech-impaired. It uses computer vision to translate hand gestures into spoken audio in real time.
Unlike basic classifiers that rely on static screen coordinates, this project utilizes **Euclidean geometry** to calculate relative finger positions, ensuring accurate detection even if the hand is rotated or tilted.
## 🚀 Key Features
* **📷 Dual Input Modes:** Supports real-time camera capture and image uploads.
* **🧠 Robust Recognition Logic:** Uses 3D Euclidean distance calculations relative to the wrist, making detection rotation-invariant.
* **🦴 Skeletal Visualization:** Real-time feedback overlay showing exactly what the computer vision model "sees."
* **🗣️ Text-to-Speech (TTS):** Instantly vocalizes the detected gesture using Google TTS.
* **📂 JSON-Based Rule Engine:** Gestures are defined in an external `gesture_rules.json` file, making it easy to add new signs without changing code.
## ๐Ÿ› ๏ธ Tech Stack
* **Frontend:** [Streamlit](https://streamlit.io/) (Web Interface)
* **Computer Vision:** [MediaPipe Hands](https://developers.google.com/mediapipe/solutions/vision/hand_landmarker) (Google)
* **Image Processing:** OpenCV & NumPy
* **Audio:** gTTS (Google Text-to-Speech)
## 📂 Project Structure
```bash
Hand2Voice/
├── app.py                 # Main Streamlit application (UI & logic)
├── gesture_classifier.py  # Advanced logic using Euclidean distance
├── gesture_rules.json     # Database of supported gestures
├── tts.py                 # Text-to-speech helper function
├── requirements.txt       # Python dependencies
├── NIELIT-LOGO.png        # Institution logo
└── README.md              # Project documentation
```
## 💿 Installation & Setup
1. **Clone the Repository**
```bash
git clone https://github.com/imarshbir/Hand2Voice.git
cd Hand2Voice
```
2. **Create a Virtual Environment (Optional but Recommended)**
```bash
python -m venv venv
# Windows
venv\Scripts\activate
# Mac/Linux
source venv/bin/activate
```
3. **Install Dependencies**
```bash
pip install -r requirements.txt
```
4. **Run the Application**
```bash
streamlit run app.py
```
## 🤟 Supported Gestures
The current version supports the following gestures (defined in `gesture_rules.json`):
| Gesture | Description |
| --- | --- |
| **HELLO** | Open palm (All fingers extended) |
| **YES / POINT** | Index finger raised |
| **NO** | Closed fist |
| **PEACE** | Index & Middle fingers raised (V-sign) |
| **OK** | Thumb & Index touching, others extended |
| **ROCK ON** | Index & Pinky extended |
| **THUMBS UP** | Thumb extended only |
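The README does not reproduce the rule-file schema, but given the binary finger-state matching described below, a `gesture_rules.json` of this kind might plausibly map gesture names to open/closed arrays (thumb to pinky). This is an illustrative sketch, not the project's actual schema:

```json
{
  "HELLO":    [1, 1, 1, 1, 1],
  "NO":       [0, 0, 0, 0, 0],
  "PEACE":    [0, 1, 1, 0, 0],
  "ROCK ON":  [0, 1, 0, 0, 1],
  "THUMBS UP":[1, 0, 0, 0, 0]
}
```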
## 🔬 How It Works (Technical Deep Dive)
Most basic hand gesture tutorials use simple `if y_tip < y_knuckle` logic. This fails if the hand is tilted sideways.
**Hand2Voice** solves this by using **Vector Math**:
1. **Landmark Extraction:** MediaPipe extracts 21 3D landmarks (x, y, z) for the hand.
2. **Distance Calculation:** We calculate the Euclidean distance between each fingertip and the **Wrist (Landmark 0)**.
$$ d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2} $$
3. **State Determination:**
* If the distance $d$ exceeds a threshold, the finger is considered **OPEN**.
* Otherwise, it is **CLOSED**.
4. **Pattern Matching:** The resulting binary array (e.g., `[0, 1, 1, 0, 0]`) is compared against the definitions in `gesture_rules.json`.
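The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual `gesture_classifier.py`: the function names, the threshold value, and the rule format are assumptions. It only relies on MediaPipe's landmark ordering (wrist = 0, fingertips = 4, 8, 12, 16, 20).

```python
import math

# MediaPipe hand-landmark indices for the five fingertips
# (thumb, index, middle, ring, pinky); the wrist is landmark 0.
FINGERTIPS = [4, 8, 12, 16, 20]

def euclidean(a, b):
    """3D Euclidean distance between two (x, y, z) landmarks."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def finger_states(landmarks, threshold=0.25):
    """Binary open/closed array for [thumb..pinky].

    A finger counts as OPEN when its tip is farther from the wrist
    than the threshold. The threshold here is an illustrative guess,
    not the project's tuned value.
    """
    wrist = landmarks[0]
    return [1 if euclidean(landmarks[tip], wrist) > threshold else 0
            for tip in FINGERTIPS]

def match_gesture(states, rules):
    """Compare the binary array against JSON-style rule patterns."""
    for name, pattern in rules.items():
        if states == pattern:
            return name
    return None
```

Because every distance is measured relative to the wrist rather than to fixed screen axes, the same binary pattern is produced whether the hand is upright, tilted, or rotated.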
## 🔮 Future Scope
* **Real-time Video Stream:** Integration with `streamlit-webrtc` for continuous streaming without clicking "capture."
* **Dynamic Gestures:** Support for moving gestures (like waving) using LSTM networks.
* **Multi-Language Support:** Adding Hindi/Punjabi TTS output.
## ๐Ÿ‘จโ€๐Ÿ’ป Author
**Lovnish Verma**
* @lovnishverma
* Expertise: AI, Computer Vision, IoT
* AIML Researcher