---
title: Hand2Voice
emoji: 🤟
colorFrom: gray
colorTo: indigo
sdk: streamlit
pinned: false
short_description: 'Converting Hand Gestures into Speech using Computer Vision'
---
# 🤟 Hand2Voice: AI Sign Language Assistant
[![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://streamlit.io)
[![Python](https://img.shields.io/badge/Python-3.9%2B-blue)](https://www.python.org/)
[![MediaPipe](https://img.shields.io/badge/MediaPipe-Vision-orange)](https://developers.google.com/mediapipe)
**Hand2Voice** is an accessibility tool designed to bridge the communication gap for the speech-impaired. It uses computer vision to translate hand gestures into spoken audio in real time.
Unlike basic classifiers that rely on static screen coordinates, this project utilizes **Euclidean geometry** to calculate relative finger positions, ensuring accurate detection even if the hand is rotated or tilted.
## 🚀 Key Features
* **📷 Dual Input Modes:** Supports real-time camera capture and image uploads.
* **🧠 Robust Recognition Logic:** Uses 3D Euclidean distance calculations relative to the wrist, making detection rotation-invariant.
* **🦴 Skeletal Visualization:** Real-time feedback overlay showing exactly what the computer vision model "sees."
* **🗣️ Text-to-Speech (TTS):** Instantly vocalizes the detected gesture using Google TTS.
* **📂 JSON-Based Rule Engine:** Gestures are defined in an external `gesture_rules.json` file, making it easy to add new signs without changing code.
## ๐Ÿ› ๏ธ Tech Stack
* **Frontend:** [Streamlit](https://streamlit.io/) (Web Interface)
* **Computer Vision:** [MediaPipe Hands](https://developers.google.com/mediapipe/solutions/vision/hand_landmarker) (Google)
* **Image Processing:** OpenCV & NumPy
* **Audio:** gTTS (Google Text-to-Speech)
## 📂 Project Structure
```bash
Hand2Voice/
├── app.py                 # Main Streamlit application (UI & logic)
├── gesture_classifier.py  # Advanced logic using Euclidean distance
├── gesture_rules.json     # Database of supported gestures
├── tts.py                 # Text-to-speech helper function
├── requirements.txt       # Python dependencies
├── NIELIT-LOGO.png        # Institution logo
└── README.md              # Project documentation
```
## 💿 Installation & Setup
1. **Clone the Repository**
```bash
git clone https://github.com/imarshbir/Hand2Voice.git
cd Hand2Voice
```
2. **Create a Virtual Environment (Optional but Recommended)**
```bash
python -m venv venv
# Windows
venv\Scripts\activate
# Mac/Linux
source venv/bin/activate
```
3. **Install Dependencies**
```bash
pip install -r requirements.txt
```
4. **Run the Application**
```bash
streamlit run app.py
```
## 🤟 Supported Gestures
The current version supports the following gestures (defined in `gesture_rules.json`):
| Gesture | Description |
| --- | --- |
| **HELLO** | Open palm (All fingers extended) |
| **YES / POINT** | Index finger raised |
| **NO** | Closed fist |
| **PEACE** | Index & Middle fingers raised (V-sign) |
| **OK** | Thumb & Index touching, others extended |
| **ROCK ON** | Index & Pinky extended |
| **THUMBS UP** | Thumb extended only |
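The README does not reproduce the rule-file schema, but given the binary finger-state matching described below, a `gesture_rules.json` of this kind might plausibly map gesture names to open/closed arrays (thumb to pinky). This is an illustrative sketch, not the project's actual schema:

```json
{
  "HELLO":    [1, 1, 1, 1, 1],
  "NO":       [0, 0, 0, 0, 0],
  "PEACE":    [0, 1, 1, 0, 0],
  "ROCK ON":  [0, 1, 0, 0, 1],
  "THUMBS UP":[1, 0, 0, 0, 0]
}
```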
## 🔬 How It Works (Technical Deep Dive)
Most basic hand gesture tutorials use simple `if y_tip < y_knuckle` logic. This fails if the hand is tilted sideways.
**Hand2Voice** solves this by using **Vector Math**:
1. **Landmark Extraction:** MediaPipe extracts 21 3D landmarks (x, y, z) for the hand.
2. **Distance Calculation:** We calculate the Euclidean distance between each fingertip and the **Wrist (Landmark 0)**.
$$ d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2} $$
3. **State Determination:**
* If the distance $d$ exceeds a threshold, the finger is considered **OPEN**.
* Otherwise, it is **CLOSED**.
4. **Pattern Matching:** The resulting binary array (e.g., `[0, 1, 1, 0, 0]`) is compared against the definitions in `gesture_rules.json`.
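The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual `gesture_classifier.py`: the function names, the threshold value, and the rule format are assumptions. It only relies on MediaPipe's landmark ordering (wrist = 0, fingertips = 4, 8, 12, 16, 20).

```python
import math

# MediaPipe hand-landmark indices for the five fingertips
# (thumb, index, middle, ring, pinky); the wrist is landmark 0.
FINGERTIPS = [4, 8, 12, 16, 20]

def euclidean(a, b):
    """3D Euclidean distance between two (x, y, z) landmarks."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def finger_states(landmarks, threshold=0.25):
    """Binary open/closed array for [thumb..pinky].

    A finger counts as OPEN when its tip is farther from the wrist
    than the threshold. The threshold here is an illustrative guess,
    not the project's tuned value.
    """
    wrist = landmarks[0]
    return [1 if euclidean(landmarks[tip], wrist) > threshold else 0
            for tip in FINGERTIPS]

def match_gesture(states, rules):
    """Compare the binary array against JSON-style rule patterns."""
    for name, pattern in rules.items():
        if states == pattern:
            return name
    return None
```

Because every distance is measured relative to the wrist rather than to fixed screen axes, the same binary pattern is produced whether the hand is upright, tilted, or rotated.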
## 🔮 Future Scope
* **Real-time Video Stream:** Integration with `streamlit-webrtc` for continuous streaming without clicking "capture."
* **Dynamic Gestures:** Support for moving gestures (like waving) using LSTM networks.
* **Multi-Language Support:** Adding Hindi/Punjabi TTS output.
## ๐Ÿ‘จโ€๐Ÿ’ป Author
**Lovnish Verma**
* @lovnishverma
* Expertise: AI, Computer Vision, IoT
* AIML Researcher