---
title: Hand2Voice
emoji: 🤟
colorFrom: gray
colorTo: indigo
sdk: streamlit
pinned: false
short_description: 'Converting Hand Gestures into Speech using Computer Vision'
---
# 🤟 Hand2Voice: AI Sign Language Assistant
[Streamlit](https://streamlit.io) · [Python](https://www.python.org/) · [MediaPipe](https://developers.google.com/mediapipe)
**Hand2Voice** is an accessibility tool designed to bridge the communication gap for the speech-impaired. It uses computer vision to translate hand gestures into spoken audio in real time.
Unlike basic classifiers that rely on static screen coordinates, this project utilizes **Euclidean geometry** to calculate relative finger positions, ensuring accurate detection even if the hand is rotated or tilted.
## 🚀 Key Features
* **📷 Dual Input Modes:** Supports real-time camera capture and image uploads (a minimal UI sketch follows this list).
* **🧠 Robust Recognition Logic:** Uses 3D Euclidean distance calculations relative to the wrist, making detection rotation-invariant.
* **🦴 Skeletal Visualization:** Real-time feedback overlay showing exactly what the computer vision model "sees."
* **🗣️ Text-to-Speech (TTS):** Instantly vocalizes the detected gesture using Google TTS.
* **📝 JSON-Based Rule Engine:** Gestures are defined in an external `gesture_rules.json` file, making it easy to add new signs without changing code.
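The dual input modes map directly onto Streamlit's built-in widgets. Below is a minimal sketch of that wiring; `classify_gesture` is a hypothetical placeholder for the project's real classifier, not its actual API:

```python
# sketch_app.py -- illustrative only; the real app.py wires in MediaPipe and TTS.
import streamlit as st
from PIL import Image

st.title("Hand2Voice demo")

mode = st.radio("Input mode", ["Camera", "Upload"])
if mode == "Camera":
    frame = st.camera_input("Capture a gesture")          # UploadedFile or None
else:
    frame = st.file_uploader("Upload a hand image", type=["jpg", "jpeg", "png"])

if frame is not None:
    image = Image.open(frame)
    st.image(image, caption="Input frame")
    # classify_gesture(image) would run the landmark + rule-engine pipeline here.
```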
## 🛠️ Tech Stack
* **Frontend:** [Streamlit](https://streamlit.io/) (Web Interface)
* **Computer Vision:** [MediaPipe Hands](https://developers.google.com/mediapipe/solutions/vision/hand_landmarker) (Google)
* **Image Processing:** OpenCV & NumPy
* **Audio:** gTTS (Google Text-to-Speech)
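On the audio side, gTTS reduces the TTS helper to a few lines. Here is a minimal sketch of what `tts.py` might contain; the `speak` name and temp-file approach are assumptions, not the project's confirmed API:

```python
# tts_sketch.py -- hedged sketch of a gTTS helper; the real tts.py may differ.
import tempfile

from gtts import gTTS

def speak(text: str, lang: str = "en") -> str:
    """Synthesize `text` to an MP3 file and return its path."""
    tts = gTTS(text=text, lang=lang)
    out = tempfile.NamedTemporaryFile(suffix=".mp3", delete=False)
    tts.save(out.name)  # gTTS writes the synthesized speech to disk
    return out.name
```

In a Streamlit app the result can then be played back with `st.audio(speak("HELLO"))`.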
## 📁 Project Structure
```bash
Hand2Voice/
├── app.py                 # Main Streamlit application (UI & Logic)
├── gesture_classifier.py  # Advanced logic using Euclidean distance
├── gesture_rules.json     # Database of supported gestures
├── tts.py                 # Text-to-Speech helper function
├── requirements.txt       # List of Python dependencies
├── NIELIT-LOGO.png        # Institution Logo
└── README.md              # Project Documentation
```
## 💿 Installation & Setup
1. **Clone the Repository**
```bash
git clone https://github.com/imarshbir/Hand2Voice.git
cd Hand2Voice
```
2. **Create a Virtual Environment (Optional but Recommended)**
```bash
python -m venv venv
# Windows
venv\Scripts\activate
# Mac/Linux
source venv/bin/activate
```
3. **Install Dependencies**
```bash
pip install -r requirements.txt
```
4. **Run the Application**
```bash
streamlit run app.py
```
## 🤟 Supported Gestures
The current version supports the following gestures (defined in `gesture_rules.json`; a possible rule layout is sketched after the table):
| Gesture | Description |
| --- | --- |
| **HELLO** | Open palm (All fingers extended) |
| **YES / POINT** | Index finger raised |
| **NO** | Closed fist |
| **PEACE** | Index & Middle fingers raised (V-sign) |
| **OK** | Thumb & Index touching, others extended |
| **ROCK ON** | Index & Pinky extended |
| **THUMBS UP** | Thumb extended only |
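The exact schema of `gesture_rules.json` is not reproduced in this README, so the layout below is only a plausible guess: one binary finger-state pattern per gesture, ordered `[thumb, index, middle, ring, pinky]` as in the pattern-matching step described in the next section. (The OK sign needs an extra tip-to-tip distance check, so it is omitted from this sketch.)

```json
{
  "HELLO":     [1, 1, 1, 1, 1],
  "YES":       [0, 1, 0, 0, 0],
  "NO":        [0, 0, 0, 0, 0],
  "PEACE":     [0, 1, 1, 0, 0],
  "ROCK ON":   [0, 1, 0, 0, 1],
  "THUMBS UP": [1, 0, 0, 0, 0]
}
```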
## 🔬 How It Works (Technical Deep Dive)
Most basic hand-gesture tutorials use simple `if y_tip < y_knuckle` logic, which fails as soon as the hand is tilted sideways.
**Hand2Voice** solves this with **vector math** (a runnable sketch follows the steps below):
1. **Landmark Extraction:** MediaPipe extracts 21 3D landmarks $(x, y, z)$ for the hand.
2. **Distance Calculation:** We calculate the Euclidean distance between each fingertip and the **Wrist (Landmark 0)**.
$$ d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2} $$
3. **State Determination:**
   * If $d$ exceeds a tuned threshold, the finger is considered **OPEN**.
   * Otherwise, it is **CLOSED**.
4. **Pattern Matching:** The resulting binary array (e.g., `[0, 1, 1, 0, 0]`) is compared against the definitions in `gesture_rules.json`.
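Put together, steps 1–4 are only a few lines of NumPy. This sketch uses the legacy MediaPipe `solutions` API and assumes the rule layout guessed above plus a fixed openness threshold; the real `gesture_classifier.py` may tune these differently:

```python
# classifier_sketch.py -- illustrative; threshold and rule format are assumptions.
import json

import cv2
import mediapipe as mp
import numpy as np

OPEN_THRESHOLD = 0.25         # assumed cutoff on normalized tip-to-wrist distance
TIP_IDS = [4, 8, 12, 16, 20]  # MediaPipe fingertip indices, thumb..pinky

def extract_landmarks(bgr_image):
    """Run MediaPipe Hands on a BGR frame; return a (21, 3) array or None."""
    with mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        results = hands.process(cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return None
    lm = results.multi_hand_landmarks[0].landmark
    return np.array([[p.x, p.y, p.z] for p in lm])

def finger_states(landmarks):
    """Binary open/closed state per finger via 3D distance to the wrist."""
    wrist = landmarks[0]  # landmark 0 is the wrist
    return [int(np.linalg.norm(landmarks[t] - wrist) > OPEN_THRESHOLD)
            for t in TIP_IDS]

def classify(landmarks, rules_path="gesture_rules.json"):
    """Match the finger-state pattern against the JSON rule definitions."""
    with open(rules_path) as f:
        rules = json.load(f)  # e.g. {"PEACE": [0, 1, 1, 0, 0], ...}
    pattern = finger_states(landmarks)
    return next((name for name, rule in rules.items() if rule == pattern), "UNKNOWN")
```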
## 🔮 Future Scope
* **Real-time Video Stream:** Integration with `streamlit-webrtc` for continuous streaming without clicking "capture."
* **Dynamic Gestures:** Support for moving gestures (like waving) using LSTM networks.
* **Multi-Language Support:** Adding Hindi/Punjabi TTS output.
## 👨‍💻 Author
**Lovnish Verma**
* @lovnishverma
* Expertise: AI, Computer Vision, IoT
* AIML Researcher