---
title: Hand2Voice
emoji: 🤖
colorFrom: gray
colorTo: indigo
sdk: streamlit
pinned: false
short_description: Converting Hand Gestures into Speech using Computer Vision
---
# 🤖 Hand2Voice: AI Sign Language Assistant
[Streamlit](https://streamlit.io) · [Python](https://www.python.org/) · [MediaPipe](https://developers.google.com/mediapipe)
**Hand2Voice** is an accessibility tool designed to bridge the communication gap for the speech-impaired. It uses computer vision to translate hand gestures into spoken audio in real time.

Unlike basic classifiers that rely on static screen coordinates, this project uses **Euclidean geometry** to calculate relative finger positions, ensuring accurate detection even when the hand is rotated or tilted.
## 🚀 Key Features

* **📷 Dual Input Modes:** Supports real-time camera capture and image uploads.
* **🧠 Robust Recognition Logic:** Uses 3D Euclidean distance calculations relative to the wrist, making detection rotation-invariant.
* **🦴 Skeletal Visualization:** Real-time feedback overlay showing exactly what the computer vision model "sees."
* **🗣️ Text-to-Speech (TTS):** Instantly vocalizes the detected gesture using Google TTS.
* **📜 JSON-Based Rule Engine:** Gestures are defined in an external `gesture_rules.json` file, making it easy to add new signs without changing code.
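A rule file of this kind might look like the sketch below. The schema is hypothetical (the key names and exact shape are illustrative, not taken from the repository); each gesture maps to a binary open/closed pattern for the five fingers, ordered thumb to pinky:

```json
{
  "HELLO": { "fingers": [1, 1, 1, 1, 1], "speech": "Hello" },
  "PEACE": { "fingers": [0, 1, 1, 0, 0], "speech": "Peace" },
  "NO":    { "fingers": [0, 0, 0, 0, 0], "speech": "No" }
}
```

Keeping the patterns in JSON means a new sign is a one-line edit rather than a code change.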
## 🛠️ Tech Stack

* **Frontend:** [Streamlit](https://streamlit.io/) (Web Interface)
* **Computer Vision:** [MediaPipe Hands](https://developers.google.com/mediapipe/solutions/vision/hand_landmarker) (Google)
* **Image Processing:** OpenCV & NumPy
* **Audio:** gTTS (Google Text-to-Speech)
## 📁 Project Structure

```bash
Hand2Voice/
├── app.py                 # Main Streamlit application (UI & logic)
├── gesture_classifier.py  # Advanced logic using Euclidean distance
├── gesture_rules.json     # Database of supported gestures
├── tts.py                 # Text-to-Speech helper function
├── requirements.txt       # List of Python dependencies
├── NIELIT-LOGO.png        # Institution logo
└── README.md              # Project documentation
```
## 💿 Installation & Setup

1. **Clone the Repository**

   ```bash
   git clone https://github.com/imarshbir/Hand2Voice.git
   cd Hand2Voice
   ```

2. **Create a Virtual Environment (Optional but Recommended)**

   ```bash
   python -m venv venv
   # Windows
   venv\Scripts\activate
   # Mac/Linux
   source venv/bin/activate
   ```

3. **Install Dependencies**

   ```bash
   pip install -r requirements.txt
   ```

4. **Run the Application**

   ```bash
   streamlit run app.py
   ```
## 🤟 Supported Gestures

The current version supports the following gestures (defined in `gesture_rules.json`):

| Gesture | Description |
| --- | --- |
| **HELLO** | Open palm (all fingers extended) |
| **YES / POINT** | Index finger raised |
| **NO** | Closed fist |
| **PEACE** | Index & middle fingers raised (V-sign) |
| **OK** | Thumb & index touching, others extended |
| **ROCK ON** | Index & pinky extended |
| **THUMBS UP** | Thumb extended only |
## 🔬 How It Works (Technical Deep Dive)

Most basic hand-gesture tutorials use simple `if y_tip < y_knuckle` logic, which fails as soon as the hand is tilted sideways. **Hand2Voice** solves this with **vector math**:

1. **Landmark Extraction:** MediaPipe extracts 21 3D landmarks (x, y, z coordinates) for the hand.
2. **Distance Calculation:** We calculate the Euclidean distance between each fingertip and the **wrist (Landmark 0)**:

   $$ d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2} $$

3. **State Determination:**
   * If $d$ exceeds a calibrated threshold, the finger is considered **OPEN**.
   * Otherwise, it is **CLOSED**.
4. **Pattern Matching:** The resulting binary array (e.g., `[0, 1, 1, 0, 0]`) is compared against the definitions in `gesture_rules.json`.
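The pipeline above can be sketched in plain Python. This is a minimal illustration, not the repository's actual code: the landmarks are mocked rather than coming from MediaPipe, and `OPEN_THRESHOLD` is a placeholder value that the real project would tune.

```python
import math

# Hypothetical open/closed threshold in normalized landmark coordinates
# (illustrative only; the real project calibrates this value).
OPEN_THRESHOLD = 0.25

def euclidean(p1, p2):
    """3D Euclidean distance between two (x, y, z) landmarks."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))

def finger_states(landmarks, threshold=OPEN_THRESHOLD):
    """Binary list [thumb, index, middle, ring, pinky]:
    1 if the fingertip is far enough from the wrist (landmark 0) to count as open."""
    wrist = landmarks[0]
    tips = [4, 8, 12, 16, 20]  # MediaPipe fingertip landmark indices
    return [1 if euclidean(landmarks[t], wrist) > threshold else 0 for t in tips]

def classify(states, rules):
    """Match the binary pattern against gesture definitions (as in gesture_rules.json)."""
    for name, pattern in rules.items():
        if states == pattern:
            return name
    return "UNKNOWN"

# Mocked landmarks: wrist at the origin, index and middle tips raised, others curled in.
mock = [(0.0, 0.0, 0.0)] * 21
mock[8] = (0.0, 0.5, 0.0)   # index fingertip
mock[12] = (0.0, 0.5, 0.0)  # middle fingertip

rules = {"PEACE": [0, 1, 1, 0, 0], "NO": [0, 0, 0, 0, 0]}
print(classify(finger_states(mock), rules))  # → PEACE
```

Because every distance is measured relative to the wrist rather than to fixed screen axes, rotating the whole hand rotates every landmark together and leaves the distances (and hence the binary pattern) unchanged.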
## 🔮 Future Scope

* **Real-time Video Stream:** Integration with `streamlit-webrtc` for continuous streaming without clicking "capture."
* **Dynamic Gestures:** Support for moving gestures (like waving) using LSTM networks.
* **Multi-Language Support:** Adding Hindi/Punjabi TTS output.
## 👨‍💻 Author

**Lovnish Verma**

* @lovnishverma
* Expertise: AI, Computer Vision, IoT
* AIML Researcher