---
title: Hand2Voice
emoji: 🤟
colorFrom: gray
colorTo: indigo
sdk: streamlit
pinned: false
short_description: 'Converting Hand Gestures into Speech using Computer Vision'
---
# 🤟 Hand2Voice: AI Sign Language Assistant
[Streamlit](https://streamlit.io) · [Python](https://www.python.org/) · [MediaPipe](https://developers.google.com/mediapipe)
**Hand2Voice** is an accessibility tool designed to bridge the communication gap for the speech-impaired. It uses computer vision to translate hand gestures into spoken audio in real time.
Unlike basic classifiers that rely on static screen coordinates, this project utilizes **Euclidean geometry** to calculate relative finger positions, ensuring accurate detection even if the hand is rotated or tilted.
## 🚀 Key Features
* **📷 Dual Input Modes:** Supports real-time camera capture and image uploads (a minimal UI sketch follows this list).
* **🧠 Robust Recognition Logic:** Uses 3D Euclidean distance calculations relative to the wrist, making detection rotation-invariant.
* **🦴 Skeletal Visualization:** Real-time feedback overlay showing exactly what the computer vision model "sees."
* **🗣️ Text-to-Speech (TTS):** Instantly vocalizes the detected gesture using Google TTS.
* **📝 JSON-Based Rule Engine:** Gestures are defined in an external `gesture_rules.json` file, making it easy to add new signs without changing code.
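The dual input modes map directly onto Streamlit's built-in widgets. Below is a minimal sketch of that wiring; `classify_gesture` is a hypothetical placeholder for the project's real classifier, not its actual API:

```python
# sketch_app.py -- illustrative only; the real app.py wires in MediaPipe and TTS.
import streamlit as st
from PIL import Image

st.title("Hand2Voice demo")

mode = st.radio("Input mode", ["Camera", "Upload"])
if mode == "Camera":
    frame = st.camera_input("Capture a gesture")          # UploadedFile or None
else:
    frame = st.file_uploader("Upload a hand image", type=["jpg", "jpeg", "png"])

if frame is not None:
    image = Image.open(frame)
    st.image(image, caption="Input frame")
    # classify_gesture(image) would run the landmark + rule-engine pipeline here.
```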
## 🛠️ Tech Stack
* **Frontend:** [Streamlit](https://streamlit.io/) (Web Interface)
* **Computer Vision:** [MediaPipe Hands](https://developers.google.com/mediapipe/solutions/vision/hand_landmarker) (Google)
* **Image Processing:** OpenCV & NumPy
* **Audio:** gTTS (Google Text-to-Speech)
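On the audio side, gTTS reduces the TTS helper to a few lines. Here is a minimal sketch of what `tts.py` might contain; the `speak` name and temp-file approach are assumptions, not the project's confirmed API:

```python
# tts_sketch.py -- hedged sketch of a gTTS helper; the real tts.py may differ.
import tempfile

from gtts import gTTS

def speak(text: str, lang: str = "en") -> str:
    """Synthesize `text` to an MP3 file and return its path."""
    tts = gTTS(text=text, lang=lang)
    out = tempfile.NamedTemporaryFile(suffix=".mp3", delete=False)
    tts.save(out.name)  # gTTS writes the synthesized speech to disk
    return out.name
```

In a Streamlit app the result can then be played back with `st.audio(speak("HELLO"))`.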
## 📁 Project Structure
```bash
Hand2Voice/
├── app.py                 # Main Streamlit application (UI & Logic)
├── gesture_classifier.py  # Advanced logic using Euclidean distance
├── gesture_rules.json     # Database of supported gestures
├── tts.py                 # Text-to-Speech helper function
├── requirements.txt       # List of Python dependencies
├── NIELIT-LOGO.png        # Institution Logo
└── README.md              # Project Documentation
```
## 💿 Installation & Setup
1. **Clone the Repository**
```bash
git clone https://github.com/imarshbir/Hand2Voice.git
cd Hand2Voice
```
2. **Create a Virtual Environment (Optional but Recommended)**
```bash
python -m venv venv
# Windows
venv\Scripts\activate
# Mac/Linux
source venv/bin/activate
```
3. **Install Dependencies**
```bash
pip install -r requirements.txt
```
4. **Run the Application**
```bash
streamlit run app.py
```
## 🤟 Supported Gestures
The current version supports the following gestures (defined in `gesture_rules.json`; a possible rule layout is sketched after the table):
| Gesture | Description |
| --- | --- |
| **HELLO** | Open palm (All fingers extended) |
| **YES / POINT** | Index finger raised |
| **NO** | Closed fist |
| **PEACE** | Index & Middle fingers raised (V-sign) |
| **OK** | Thumb & Index touching, others extended |
| **ROCK ON** | Index & Pinky extended |
| **THUMBS UP** | Thumb extended only |
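The exact schema of `gesture_rules.json` is not reproduced in this README, so the layout below is only a plausible guess: one binary finger-state pattern per gesture, ordered `[thumb, index, middle, ring, pinky]` as in the pattern-matching step described in the next section. (The OK sign needs an extra tip-to-tip distance check, so it is omitted from this sketch.)

```json
{
  "HELLO":     [1, 1, 1, 1, 1],
  "YES":       [0, 1, 0, 0, 0],
  "NO":        [0, 0, 0, 0, 0],
  "PEACE":     [0, 1, 1, 0, 0],
  "ROCK ON":   [0, 1, 0, 0, 1],
  "THUMBS UP": [1, 0, 0, 0, 0]
}
```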
## 🔬 How It Works (Technical Deep Dive)
Most basic hand-gesture tutorials use simple `if y_tip < y_knuckle` logic, which fails as soon as the hand is tilted sideways.
**Hand2Voice** solves this with **vector math** (a runnable sketch follows the steps below):
1. **Landmark Extraction:** MediaPipe extracts 21 3D landmarks $(x, y, z)$ for the hand.
2. **Distance Calculation:** We calculate the Euclidean distance between each fingertip and the **Wrist (Landmark 0)**.
$$ d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2} $$
3. **State Determination:**
   * If $d$ exceeds a tuned threshold, the finger is considered **OPEN**.
   * Otherwise, it is **CLOSED**.
4. **Pattern Matching:** The resulting binary array (e.g., `[0, 1, 1, 0, 0]`) is compared against the definitions in `gesture_rules.json`.
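Put together, steps 1–4 are only a few lines of NumPy. This sketch uses the legacy MediaPipe `solutions` API and assumes the rule layout guessed above plus a fixed openness threshold; the real `gesture_classifier.py` may tune these differently:

```python
# classifier_sketch.py -- illustrative; threshold and rule format are assumptions.
import json

import cv2
import mediapipe as mp
import numpy as np

OPEN_THRESHOLD = 0.25         # assumed cutoff on normalized tip-to-wrist distance
TIP_IDS = [4, 8, 12, 16, 20]  # MediaPipe fingertip indices, thumb..pinky

def extract_landmarks(bgr_image):
    """Run MediaPipe Hands on a BGR frame; return a (21, 3) array or None."""
    with mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        results = hands.process(cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return None
    lm = results.multi_hand_landmarks[0].landmark
    return np.array([[p.x, p.y, p.z] for p in lm])

def finger_states(landmarks):
    """Binary open/closed state per finger via 3D distance to the wrist."""
    wrist = landmarks[0]  # landmark 0 is the wrist
    return [int(np.linalg.norm(landmarks[t] - wrist) > OPEN_THRESHOLD)
            for t in TIP_IDS]

def classify(landmarks, rules_path="gesture_rules.json"):
    """Match the finger-state pattern against the JSON rule definitions."""
    with open(rules_path) as f:
        rules = json.load(f)  # e.g. {"PEACE": [0, 1, 1, 0, 0], ...}
    pattern = finger_states(landmarks)
    return next((name for name, rule in rules.items() if rule == pattern), "UNKNOWN")
```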
## 🔮 Future Scope
* **Real-time Video Stream:** Integration with `streamlit-webrtc` for continuous streaming without clicking "capture."
* **Dynamic Gestures:** Support for moving gestures (like waving) using LSTM networks.
* **Multi-Language Support:** Adding Hindi/Punjabi TTS output.
## 👨‍💻 Author
**Lovnish Verma**
* @lovnishverma
* Expertise: AI, Computer Vision, IoT
* AIML Researcher