---
title: Hand2Voice
emoji: 🤟
colorFrom: gray
colorTo: indigo
sdk: streamlit
pinned: false
short_description: 'Converting Hand Gestures into Speech using Computer Vision'
---

# 🤟 Hand2Voice: AI Sign Language Assistant

[![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://streamlit.io)
[![Python](https://img.shields.io/badge/Python-3.9%2B-blue)](https://www.python.org/)
[![MediaPipe](https://img.shields.io/badge/MediaPipe-Vision-orange)](https://developers.google.com/mediapipe)

**Hand2Voice** is an accessibility tool designed to bridge the communication gap for the speech-impaired. It uses computer vision to translate hand gestures into spoken audio in real time. Unlike basic classifiers that rely on static screen coordinates, this project uses **Euclidean geometry** to calculate relative finger positions, so detection stays accurate even when the hand is rotated or tilted.

## 🚀 Key Features

* **📷 Dual Input Modes:** Supports real-time camera capture and image uploads.
* **🧠 Robust Recognition Logic:** Uses 3D Euclidean distance calculations relative to the wrist, making detection rotation-invariant.
* **🦴 Skeletal Visualization:** Real-time feedback overlay showing exactly what the computer vision model "sees."
* **🗣️ Text-to-Speech (TTS):** Instantly vocalizes the detected gesture using Google TTS.
* **📂 JSON-Based Rule Engine:** Gestures are defined in an external `gesture_rules.json` file, making it easy to add new signs without changing code.
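To make the rule-engine idea concrete, here is a minimal, self-contained sketch of how a rotation-invariant classifier driven by a rule table might work. The rule patterns and field layout below are illustrative assumptions, not the repo's actual `gesture_rules.json` schema:

```python
import math

# Hypothetical rule table mirroring the idea of gesture_rules.json
# (the patterns below are assumptions, not the repo's actual schema).
# Pattern order: [thumb, index, middle, ring, pinky]; 1 = open, 0 = closed.
GESTURE_RULES = {
    "HELLO": [1, 1, 1, 1, 1],
    "YES / POINT": [0, 1, 0, 0, 0],
    "NO": [0, 0, 0, 0, 0],
    "PEACE": [0, 1, 1, 0, 0],
    "ROCK ON": [0, 1, 0, 0, 1],
    "THUMBS UP": [1, 0, 0, 0, 0],
}

def euclidean(p, q):
    """3D Euclidean distance between two (x, y, z) landmarks."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def finger_states(wrist, fingertips, knuckles):
    """A finger counts as open when its tip is farther from the wrist
    than its knuckle is -- a heuristic that survives hand rotation,
    unlike raw screen-coordinate comparisons."""
    return [
        1 if euclidean(tip, wrist) > euclidean(knuckle, wrist) else 0
        for tip, knuckle in zip(fingertips, knuckles)
    ]

def classify(states):
    """Match the binary open/closed pattern against the rule table."""
    for name, pattern in GESTURE_RULES.items():
        if states == pattern:
            return name
    return "UNKNOWN"
```

Because the rules live in a plain data structure, adding a new sign means adding one entry to the table, with no change to the matching logic.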
## 🛠ïļ Tech Stack * **Frontend:** [Streamlit](https://streamlit.io/) (Web Interface) * **Computer Vision:** [MediaPipe Hands](https://developers.google.com/mediapipe/solutions/vision/hand_landmarker) (Google) * **Image Processing:** OpenCV & NumPy * **Audio:** gTTS (Google Text-to-Speech) ## 📂 Project Structure ```bash Hand2Voice/ ├── app.py # Main Streamlit application (UI & Logic) ├── gesture_classifier.py # Advanced logic using Euclidean distance ├── gesture_rules.json # Database of supported gestures ├── tts.py # Text-to-Speech helper function ├── requirements.txt # List of python dependencies ├── NIELIT-LOGO.png # Institution Logo └── README.md # Project Documentation ``` ## ðŸ’ŋ Installation & Setup 1. **Clone the Repository** ```bash git clone [https://github.com/imarshbir/Hand2Voice.git](https://github.com/imarshbir/Hand2Voice.git) cd Hand2Voice ``` 2. **Create a Virtual Environment (Optional but Recommended)** ```bash python -m venv venv # Windows venv\Scripts\activate # Mac/Linux source venv/bin/activate ``` 3. **Install Dependencies** ```bash pip install -r requirements.txt ``` 4. **Run the Application** ```bash streamlit run app.py ``` ## ðŸĪŸ Supported Gestures The current version supports the following gestures (defined in `gesture_rules.json`): | Gesture | Description | | --- | --- | | **HELLO** | Open palm (All fingers extended) | | **YES / POINT** | Index finger raised | | **NO** | Closed fist | | **PEACE** | Index & Middle fingers raised (V-sign) | | **OK** | Thumb & Index touching, others extended | | **ROCK ON** | Index & Pinky extended | | **THUMBS UP** | Thumb extended only | ## 🔎 How It Works (Technical Deep Dive) Most basic hand gesture tutorials use simple `if y_tip < y_knuckle` logic. This fails if the hand is tilted sideways. **Hand2Voice** solves this by using **Vector Math**: 1. **Landmark Extraction:** MediaPipe extracts 21 3D landmarks () for the hand. 2. 
2. **Distance Calculation:** We calculate the Euclidean distance between each fingertip and the **Wrist (Landmark 0)**:

   $$ d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2} $$

3. **State Determination:**
   * If the distance exceeds a threshold, the finger is considered **OPEN**.
   * Otherwise, it is **CLOSED**.
4. **Pattern Matching:** The resulting binary array (e.g., `[0, 1, 1, 0, 0]`) is compared against the definitions in `gesture_rules.json`.

## 🔮 Future Scope

* **Real-time Video Stream:** Integration with `streamlit-webrtc` for continuous streaming without clicking "capture."
* **Dynamic Gestures:** Support for moving gestures (like waving) using LSTM networks.
* **Multi-Language Support:** Adding Hindi/Punjabi TTS output.

## 👨‍💻 Author

**Lovnish Verma**

* @lovnishverma
* Expertise: AI, Computer Vision, IoT
* AIML Researcher