---

title: Hand Gesture Recognition
emoji: 🖐️
colorFrom: blue
colorTo: green
library_name: tensorflow
license: mit
tags:
  - computer-vision
  - gesture-recognition
  - lstm
  - mediapipe
  - hand-tracking
  - video-classification
  - tensorflow
  - keras
  - deep-learning
---

# Hand Gesture Recognition using LSTM

A real-time hand gesture recognition system using MediaPipe for hand pose extraction and LSTM neural networks for temporal sequence classification.

## Project Overview

This project implements a complete pipeline for recognizing hand gestures from video sequences using:
- **MediaPipe Hands** for extracting 21 3D hand landmarks
- **LSTM Neural Networks** for learning temporal patterns in hand movements
- **Data Augmentation** to improve model generalization
- **Real-time Recognition** capability via webcam
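
The feature layout behind these components is simple to sketch: each MediaPipe Hands detection yields 21 landmarks with x, y, z coordinates, flattened into a 63-dimensional vector per frame (the array below is a random stand-in for an actual detection, not the project's code):

```python
import numpy as np

# Stand-in for one MediaPipe Hands detection: 21 landmarks, each (x, y, z).
# In real code these values come from results.multi_hand_landmarks[0].landmark.
landmarks = np.random.rand(21, 3)

# Flatten to the 63-feature vector used as one time step of the LSTM input.
frame_features = landmarks.flatten()
print(frame_features.shape)  # (63,)
```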

## Dataset

The project uses the **LeapGestRecog** dataset from Kaggle:
- **Source**: `gti-upm/leapgestrecog`
- **Structure**: 10 subjects × 10 gestures × multiple video sequences
- **Format**: 100 frames per gesture sequence (PNG images)
- The dataset is automatically downloaded to the current directory and cleaned up after training

## Features

1. **Automatic Dataset Management**
   - Downloads dataset to current directory
   - Organizes and preprocesses data
   - Automatic cleanup after training to save space

2. **Hand Pose Extraction**
   - Uses MediaPipe to extract 21 landmarks (63 features: x, y, z coordinates)
   - Processes entire video sequences
   - Visualization support

3. **Data Augmentation**
   - Random noise injection
   - Random occlusion
   - Random scaling and translation
   - Increases dataset size by 3× (original + 2× augmented)

4. **Deep Learning Model**
   - 3-layer LSTM architecture
   - Batch normalization and dropout for regularization
   - Trained on 30-frame sequences
   - Evaluated on a held-out test set (see Performance below)

5. **Real-time Recognition**
   - Webcam-based gesture recognition
   - Live predictions with confidence scores
   - Visual feedback with hand landmark overlay
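
The augmentation steps listed above can be sketched in NumPy. This is a minimal illustration, not the notebook's actual implementation, and the parameter values (noise scale, occlusion length, scaling/shift ranges) are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(seq, noise_std=0.01, occlude_frames=3, scale_range=0.1, shift_range=0.05):
    """Apply noise, occlusion, scaling and translation to a (30, 63) landmark sequence."""
    out = seq.copy()
    # Random noise injection on every coordinate.
    out += rng.normal(0.0, noise_std, size=out.shape)
    # Random occlusion: zero out a few consecutive frames.
    start = rng.integers(0, len(out) - occlude_frames)
    out[start:start + occlude_frames] = 0.0
    # Random scaling and translation applied uniformly to the whole sequence.
    out *= 1.0 + rng.uniform(-scale_range, scale_range)
    out += rng.uniform(-shift_range, shift_range)
    return out

sequence = rng.random((30, 63))
augmented = augment(sequence)
print(augmented.shape)  # (30, 63)
```

Generating two augmented copies per original sequence gives the 3× dataset growth described above.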

## Installation

This project uses **uv** for fast and reliable package management.

### Quick Install

```bash
# Install uv (Windows PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# Navigate to project directory
cd hand_gesture_recognition

# Install all dependencies
uv pip install -e .
```

### Required Packages
- `kagglehub` - Dataset download
- `opencv-python` - Image processing
- `numpy` - Numerical operations
- `pandas` - Data manipulation
- `matplotlib`, `seaborn` - Visualization
- `mediapipe` - Hand pose estimation
- `scikit-learn` - ML utilities
- `tensorflow` - Deep learning framework
- `tqdm` - Progress bars

All dependencies are automatically installed via `uv pip install -e .`.

## Real-time Recognition

### Using Inference Script (Recommended)

The easiest way to test the model is using the `inference.py` script, which downloads the model from Hugging Face and runs webcam inference:

```bash
python inference.py --repo a-01a/hand-gesture-recognition
```

Features:
- ✅ Automatically downloads model from Hugging Face Hub
- ✅ Live gesture prediction with confidence scores
- ✅ Real-time hand landmark detection using MediaPipe
- ✅ FPS counter and hand detection status
- ✅ No manual model download required

Press 'q' to quit the webcam window.

### Using the Notebook

Alternatively, you can run the webcam demo in the notebook after training:

```python
recognizer = RealTimeGestureRecognizer('hand_gesture_lstm_model.h5', gesture_mapping)
recognizer.run_webcam_demo()
```

## Model Architecture

```
Input: (30, 63) - 30 frames × 63 features

LSTM Layer 1: 128 units (return sequences)
    ↓ BatchNormalization + Dropout(0.3)

LSTM Layer 2: 128 units (return sequences)
    ↓ BatchNormalization + Dropout(0.3)

LSTM Layer 3: 64 units
    ↓ BatchNormalization + Dropout(0.3)

Dense Layer 1: 256 units (ReLU)
    ↓ BatchNormalization + Dropout(0.3)

Dense Layer 2: 128 units (ReLU)
    ↓ BatchNormalization + Dropout(0.3)

Output Layer: 10 units (Softmax)
```
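
A sketch of this architecture in Keras (layer sizes follow the diagram above; the optimizer and loss choices are consistent with the listed hyperparameters but may differ in detail from the notebook):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(seq_len=30, n_features=63, n_classes=10, dropout=0.3):
    """LSTM classifier matching the diagrammed architecture."""
    model = models.Sequential([
        layers.Input(shape=(seq_len, n_features)),
        layers.LSTM(128, return_sequences=True),
        layers.BatchNormalization(), layers.Dropout(dropout),
        layers.LSTM(128, return_sequences=True),
        layers.BatchNormalization(), layers.Dropout(dropout),
        layers.LSTM(64),
        layers.BatchNormalization(), layers.Dropout(dropout),
        layers.Dense(256, activation="relu"),
        layers.BatchNormalization(), layers.Dropout(dropout),
        layers.Dense(128, activation="relu"),
        layers.BatchNormalization(), layers.Dropout(dropout),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

model = build_model()
print(model.output_shape)  # (None, 10)
```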

## Performance

The model is evaluated using:
- **Accuracy**: Overall classification accuracy
- **Confusion Matrix**: Per-class performance visualization
- **Classification Report**: Precision, recall, F1-score per gesture
- **Gesture-wise Analysis**: Individual gesture accuracy

## Gestures Recognized

The model recognizes 10 different hand gestures from the LeapGestRecog dataset. Each gesture has unique characteristics captured through the temporal sequence of hand landmarks.

## Hyperparameters

- **Sequence Length**: 30 frames
- **LSTM Units**: 128 β†’ 128 β†’ 64
- **Dropout Rate**: 0.3
- **Batch Size**: 32
- **Learning Rate**: 0.001 (with ReduceLROnPlateau)
- **Epochs**: 100 (with EarlyStopping)
- **Train/Val/Test Split**: 64%/16%/20%
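
The 64%/16%/20% split falls out of two chained `train_test_split` calls (a scikit-learn sketch on dummy data; the `random_state` value is an assumption):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 30, 63)    # dummy gesture sequences
y = np.random.randint(0, 10, 100)  # dummy gesture labels

# Carve out 20% for the test set, then 20% of the remainder for validation:
# 0.8 * 0.8 = 64% train, 0.8 * 0.2 = 16% validation, 20% test.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.2, random_state=42)
print(len(X_train), len(X_val), len(X_test))  # 64 16 20
```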

## Project Structure

```
hand_gesture_recognition/
├── hand_gesture_recognition.ipynb  # Main training notebook
├── inference.py                    # Webcam inference with model download from HF
├── upload_to_huggingface.py        # Upload model to Hugging Face
├── README.md                       # This file
├── TECHNICAL_REPORT.md             # Detailed mathematical concepts
├── LICENSE.md                      # License
├── pyproject.toml                  # Project configuration (uv)
├── hand_gesture_lstm_model.h5      # Saved model (generated)
├── gesture_mapping.json            # Gesture labels (generated)
└── datasets/                       # Dataset (auto-downloaded & auto-deleted)
```

## Cleanup

The notebook automatically deletes the downloaded dataset after training to save disk space. The trained model and gesture mappings are saved locally and can be uploaded to Hugging Face for easy sharing and deployment.

## For More Details

See [TECHNICAL_REPORT.md](TECHNICAL_REPORT.md) for a comprehensive explanation of all mathematical concepts, algorithms, and methodologies used in this project.

## Citation

If you use this model in your research or application, please cite:

```bibtex
@misc{hand_gesture_lstm_2025,
  title={Hand Gesture Recognition using LSTM and MediaPipe},
  author={Abdul Ahad},
  year={2025},
  howpublished={\url{https://huggingface.co/spaces/a-01a/hand-gesture-recognition}},
  note={Real-time hand gesture recognition system using MediaPipe and LSTM networks}
}
```

## 📄 License

MIT License - See [LICENSE.md](LICENSE.md) for details.

---