---
title: Hand2Voice
emoji: 🤟
colorFrom: gray
colorTo: indigo
sdk: streamlit
pinned: false
short_description: Converting Hand Gestures into Speech using Computer Vision
---

# 🤟 Hand2Voice: AI Sign Language Assistant


Hand2Voice is an accessibility tool designed to bridge the communication gap for the speech-impaired. It uses computer vision to translate hand gestures into spoken audio in real time.

Unlike basic classifiers that rely on static screen coordinates, this project uses Euclidean geometry to compute relative finger positions, so detection stays accurate even when the hand is rotated or tilted.

## 🚀 Key Features

- 📷 **Dual Input Modes:** Supports real-time camera capture and image uploads.
- 🧠 **Robust Recognition Logic:** Uses 3D Euclidean distance calculations relative to the wrist, making detection rotation-invariant.
- 🦴 **Skeletal Visualization:** Real-time feedback overlay showing exactly what the computer vision model "sees."
- 🗣️ **Text-to-Speech (TTS):** Instantly vocalizes the detected gesture using Google TTS.
- 📂 **JSON-Based Rule Engine:** Gestures are defined in an external `gesture_rules.json` file, making it easy to add new signs without changing code (see the example below).
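
Each gesture is essentially a named finger pattern. For illustration, an entry in `gesture_rules.json` might look like this (a hypothetical schema; the keys in the repository's actual file may differ):

```json
{
  "HELLO": {
    "fingers": [1, 1, 1, 1, 1],
    "description": "Open palm (all fingers extended)"
  },
  "PEACE": {
    "fingers": [0, 1, 1, 0, 0],
    "description": "Index & middle fingers raised (V-sign)"
  }
}
```

Here `fingers` encodes the open/closed state of the thumb, index, middle, ring, and pinky in order; the classifier (see the technical deep dive below) matches the detected pattern against these entries.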

๐Ÿ› ๏ธ Tech Stack

- **Frontend:** Streamlit (web interface)
- **Computer Vision:** MediaPipe Hands (Google)
- **Image Processing:** OpenCV & NumPy
- **Audio:** gTTS (Google Text-to-Speech)
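
The audio step is a thin wrapper around gTTS. A minimal sketch of what the helper in `tts.py` might look like (the function name `speak_text` is illustrative, not the repository's actual API):

```python
from gtts import gTTS

def speak_text(text: str, lang: str = "en") -> str:
    """Convert the detected gesture label to speech and return the MP3 path."""
    tts = gTTS(text=text, lang=lang)  # synthesizes speech via Google's TTS service
    out_path = "gesture_audio.mp3"
    tts.save(out_path)                # Streamlit can then play it with st.audio(out_path)
    return out_path
```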

## 📂 Project Structure

```
Hand2Voice/
├── app.py                 # Main Streamlit application (UI & logic)
├── gesture_classifier.py  # Classification logic using Euclidean distance
├── gesture_rules.json     # Database of supported gestures
├── tts.py                 # Text-to-speech helper function
├── requirements.txt       # List of Python dependencies
├── NIELIT-LOGO.png        # Institution logo
└── README.md              # Project documentation
```

## 💿 Installation & Setup

1. **Clone the Repository**

   ```bash
   git clone https://github.com/imarshbir/Hand2Voice.git
   cd Hand2Voice
   ```

2. **Create a Virtual Environment (Optional but Recommended)**

   ```bash
   python -m venv venv
   # Windows
   venv\Scripts\activate
   # Mac/Linux
   source venv/bin/activate
   ```

3. **Install Dependencies**

   ```bash
   pip install -r requirements.txt
   ```

4. **Run the Application**

   ```bash
   streamlit run app.py
   ```
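
For reference, `requirements.txt` should list at least the packages named in the tech stack above (the repository's file may pin specific versions):

```text
streamlit
mediapipe
opencv-python
numpy
gTTS
```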

## 🤟 Supported Gestures

The current version supports the following gestures (defined in `gesture_rules.json`):

| Gesture | Description |
|---|---|
| HELLO | Open palm (all fingers extended) |
| YES / POINT | Index finger raised |
| NO | Closed fist |
| PEACE | Index & middle fingers raised (V-sign) |
| OK | Thumb & index touching, others extended |
| ROCK ON | Index & pinky extended |
| THUMBS UP | Thumb extended only |

## 🔬 How It Works (Technical Deep Dive)

Most basic hand gesture tutorials use simple `if y_tip < y_knuckle` logic. This fails if the hand is tilted sideways.
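
For reference, that fragile baseline looks like this (a sketch using MediaPipe's landmark indices, where 8 is the index fingertip and 6 its PIP knuckle):

```python
# Naive check: "finger up" if the tip is above the knuckle on screen.
# Image y grows downward, so "above" means a smaller y value.
# This breaks when the hand rotates, because "up" is a screen direction,
# not a property of the hand itself.
index_up = hand_landmarks.landmark[8].y < hand_landmarks.landmark[6].y
```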

Hand2Voice solves this with vector math:

1. **Landmark Extraction:** MediaPipe extracts 21 3D landmarks (x, y, z) for the hand.
2. **Distance Calculation:** We calculate the Euclidean distance between each fingertip and the wrist (landmark 0):

   $$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}$$

3. **State Determination:** If this distance exceeds a threshold, the finger is considered OPEN; otherwise it is CLOSED.
4. **Pattern Matching:** The resulting binary array (e.g., `[0, 1, 1, 0, 0]`) is compared against the definitions in `gesture_rules.json`.
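
Putting the four steps together, a minimal sketch of the classifier might look like the following, reusing the hypothetical rule schema from the features section above (names such as `finger_states` and `OPEN_THRESHOLD` are illustrative, not the exact code in `gesture_classifier.py`):

```python
import json
import numpy as np

WRIST = 0
FINGERTIPS = [4, 8, 12, 16, 20]  # thumb, index, middle, ring, pinky tips
OPEN_THRESHOLD = 0.25            # illustrative cut-off in MediaPipe's normalized units

def finger_states(landmarks) -> list:
    """Return a binary open/closed array for the five fingers."""
    wrist = np.array([landmarks[WRIST].x, landmarks[WRIST].y, landmarks[WRIST].z])
    states = []
    for tip in FINGERTIPS:
        p = np.array([landmarks[tip].x, landmarks[tip].y, landmarks[tip].z])
        d = np.linalg.norm(p - wrist)  # 3D Euclidean distance to the wrist
        states.append(1 if d > OPEN_THRESHOLD else 0)
    return states

def classify(landmarks, rules_path="gesture_rules.json"):
    """Match the detected finger pattern against the JSON rule definitions."""
    pattern = finger_states(landmarks)
    with open(rules_path) as f:
        rules = json.load(f)
    for name, rule in rules.items():
        if rule["fingers"] == pattern:
            return name
    return None  # no rule matched
```

The design point is that distances measured relative to the wrist stay stable under rotation, unlike raw screen coordinates, which is why the naive y-comparison above fails where this approach does not.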

## 🔮 Future Scope

- **Real-Time Video Stream:** Integration with `streamlit-webrtc` for continuous streaming without clicking "Capture."
- **Dynamic Gestures:** Support for moving gestures (like waving) using LSTM networks.
- **Multi-Language Support:** Adding Hindi/Punjabi TTS output.

๐Ÿ‘จโ€๐Ÿ’ป Author

**Arshbir Singh**

- GitHub: [@imarshbir](https://github.com/imarshbir)
- Expertise: AI, Computer Vision, IoT
- B.Tech (CSE) Researcher