---
title: Hand2Voice
emoji: 🤟
colorFrom: gray
colorTo: indigo
sdk: streamlit
pinned: false
short_description: 'Converting Hand Gestures into Speech using Computer Vision'
---


# 🤟 Hand2Voice: AI Sign Language Assistant

[![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://streamlit.io)
[![Python](https://img.shields.io/badge/Python-3.9%2B-blue)](https://www.python.org/)
[![MediaPipe](https://img.shields.io/badge/MediaPipe-Vision-orange)](https://developers.google.com/mediapipe)

**Hand2Voice** is an accessibility tool designed to bridge the communication gap for the speech-impaired. It uses computer vision to translate hand gestures into spoken audio in real time.

Unlike basic classifiers that rely on static screen coordinates, this project uses **Euclidean geometry** to calculate relative finger positions, so detection stays accurate even when the hand is rotated or tilted.

## 🚀 Key Features

* **📷 Dual Input Modes:** Supports real-time camera capture and image uploads (a minimal sketch follows this list).
* **🧠 Robust Recognition Logic:** Uses 3D Euclidean distance calculations relative to the wrist, making detection rotation-invariant.
* **🦴 Skeletal Visualization:** Real-time feedback overlay showing exactly what the computer vision model "sees."
* **🗣️ Text-to-Speech (TTS):** Instantly vocalizes the detected gesture using Google TTS.
* **📂 JSON-Based Rule Engine:** Gestures are defined in an external `gesture_rules.json` file, making it easy to add new signs without changing code.
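
For orientation, the two input modes can be wired up with plain Streamlit widgets. This is a minimal sketch, not the repo's actual `app.py`; the widget labels and the decode step are illustrative:

```python
# Hypothetical sketch of the dual input modes (not copied from app.py).
import cv2
import numpy as np
import streamlit as st

mode = st.radio("Input mode", ["Camera", "Upload"])
if mode == "Camera":
    file = st.camera_input("Capture a gesture")
else:
    file = st.file_uploader("Upload a hand image", type=["jpg", "jpeg", "png"])

if file is not None:
    # Both widgets return a file-like object; decode it into a BGR array
    # that OpenCV and MediaPipe can work with.
    img = cv2.imdecode(np.frombuffer(file.read(), np.uint8), cv2.IMREAD_COLOR)
    st.image(img, channels="BGR")
```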

## 🛠️ Tech Stack

* **Frontend:** [Streamlit](https://streamlit.io/) (Web Interface)
* **Computer Vision:** [MediaPipe Hands](https://developers.google.com/mediapipe/solutions/vision/hand_landmarker) (Google)
* **Image Processing:** OpenCV & NumPy
* **Audio:** gTTS (Google Text-to-Speech)
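
As a taste of the audio layer: the gTTS call reduces to a few lines. The helper name and output path below are assumptions for illustration, not necessarily what `tts.py` does:

```python
# Minimal gTTS sketch; `speak` and the MP3 filename are hypothetical.
from gtts import gTTS

def speak(text: str, path: str = "gesture.mp3") -> str:
    """Synthesize `text` to an MP3 file via Google Text-to-Speech."""
    gTTS(text=text, lang="en").save(path)
    return path

# In Streamlit, the result can be played back with st.audio(speak("Hello")).
```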

## 📂 Project Structure

```bash
Hand2Voice/
├── app.py                 # Main Streamlit application (UI & Logic)
├── gesture_classifier.py  # Advanced logic using Euclidean distance
├── gesture_rules.json     # Database of supported gestures
├── tts.py                 # Text-to-Speech helper function
├── requirements.txt       # Python dependencies
├── NIELIT-LOGO.png        # Institution Logo
└── README.md              # Project Documentation
```

## 💿 Installation & Setup

1. **Clone the Repository**
```bash
git clone https://github.com/imarshbir/Hand2Voice.git
cd Hand2Voice
```

2. **Create a Virtual Environment (Optional but Recommended)**
```bash
python -m venv venv
# Windows
venv\Scripts\activate
# Mac/Linux
source venv/bin/activate
```

3. **Install Dependencies**
```bash
pip install -r requirements.txt
```

4. **Run the Application**
```bash
streamlit run app.py
```

## 🤟 Supported Gestures

The current version supports the following gestures, defined in `gesture_rules.json` (a hypothetical example appears after the table):

| Gesture | Description |
| --- | --- |
| **HELLO** | Open palm (All fingers extended) |
| **YES / POINT** | Index finger raised |
| **NO** | Closed fist |
| **PEACE** | Index & Middle fingers raised (V-sign) |
| **OK** | Thumb & Index touching, others extended |
| **ROCK ON** | Index & Pinky extended |
| **THUMBS UP** | Thumb extended only |
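
For illustration, the table above could be backed by entries like the following, where each pattern lists [thumb, index, middle, ring, pinky] as 1 = open, 0 = closed. The schema is an assumption; the repo's actual file may differ:

```json
{
  "HELLO":     [1, 1, 1, 1, 1],
  "PEACE":     [0, 1, 1, 0, 0],
  "THUMBS UP": [1, 0, 0, 0, 0],
  "NO":        [0, 0, 0, 0, 0]
}
```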

## 🔬 How It Works (Technical Deep Dive)

Most basic hand gesture tutorials use simple `if y_tip < y_knuckle` logic. This fails if the hand is tilted sideways.

**Hand2Voice** solves this by using **Vector Math**:

1. **Landmark Extraction:** MediaPipe extracts 21 3D landmarks (x, y, z per point) for the hand.
2. **Distance Calculation:** We calculate the Euclidean distance between each fingertip and the **Wrist (Landmark 0)**.
$$ d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2} $$
3. **State Determination:**
* If the fingertip's distance from the wrist exceeds the corresponding lower knuckle's distance, the finger is considered **OPEN**.
* Otherwise, it is **CLOSED**.

4. **Pattern Matching:** The resulting binary array (e.g., `[0, 1, 1, 0, 0]`) is compared against the definitions in `gesture_rules.json`.
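
Put together, steps 1–4 reduce to a few lines of Python. This is a sketch assuming the standard MediaPipe landmark indices (0 = wrist, tips at 4/8/12/16/20) and the hypothetical name-to-pattern schema shown earlier; `gesture_classifier.py` may organize this differently:

```python
# Sketch of the rotation-invariant finger-state logic (assumptions above).
import json
import math

WRIST = 0
TIPS = [4, 8, 12, 16, 20]      # thumb, index, middle, ring, pinky tips
KNUCKLES = [2, 6, 10, 14, 18]  # corresponding lower joints

def dist(a, b):
    """3D Euclidean distance between two MediaPipe landmarks."""
    return math.sqrt((a.x - b.x) ** 2 + (a.y - b.y) ** 2 + (a.z - b.z) ** 2)

def finger_states(landmarks):
    """Return [thumb, index, middle, ring, pinky] as 1 (open) / 0 (closed).

    A finger counts as open when its tip sits farther from the wrist than
    its lower knuckle -- a comparison that survives rotation, unlike a raw
    `y_tip < y_knuckle` check.
    """
    wrist = landmarks[WRIST]
    return [
        1 if dist(landmarks[t], wrist) > dist(landmarks[k], wrist) else 0
        for t, k in zip(TIPS, KNUCKLES)
    ]

def classify(states, rules_path="gesture_rules.json"):
    """Match the binary pattern against the rule file; None if nothing fits."""
    with open(rules_path) as f:
        rules = json.load(f)
    for name, pattern in rules.items():
        if pattern == states:
            return name
    return None
```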

## 🔮 Future Scope

* **Real-time Video Stream:** Integration with `streamlit-webrtc` for continuous streaming without clicking "capture" (one possible shape is sketched after this list).
* **Dynamic Gestures:** Support for moving gestures (like waving) using LSTM networks.
* **Multi-Language Support:** Adding Hindi/Punjabi TTS output.
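
One possible shape for the streaming integration, assuming `streamlit-webrtc`'s `video_frame_callback` API and the hypothetical helpers sketched above:

```python
# Hypothetical continuous-stream sketch using streamlit-webrtc.
import av
from streamlit_webrtc import webrtc_streamer

def video_frame_callback(frame: av.VideoFrame) -> av.VideoFrame:
    img = frame.to_ndarray(format="bgr24")
    # Run MediaPipe on `img` here, then finger_states() / classify(),
    # and draw the detected label onto the frame before returning it.
    return av.VideoFrame.from_ndarray(img, format="bgr24")

webrtc_streamer(key="hand2voice", video_frame_callback=video_frame_callback)
```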

## 👨‍💻 Author

**Lovnish Verma**

* @lovnishverma
* Expertise: AI, Computer Vision, IoT
* AIML Researcher