---
title: Hand Gesture Recognition
emoji: 🖐️
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
tags:
- computer-vision
- gesture-recognition
- lstm
- mediapipe
- hand-tracking
- video-classification
---

# Hand Gesture Recognition using LSTM

A real-time hand gesture recognition system using MediaPipe for hand pose extraction and LSTM neural networks for temporal sequence classification.

## Project Overview

This project implements a complete pipeline for recognizing hand gestures from video sequences using:
- **MediaPipe Hands** for extracting 21 3D hand landmarks
- **LSTM Neural Networks** for learning temporal patterns in hand movements
- **Data Augmentation** to improve model generalization
- **Real-time Recognition** capability via webcam

## Dataset

The project uses the **LeapGestRecog** dataset from Kaggle:
- **Source**: `gti-upm/leapgestrecog`
- **Structure**: 10 subjects × 10 gestures × multiple video sequences
- **Format**: 100 frames per gesture sequence (PNG images)
- The dataset is automatically downloaded to the current directory and cleaned up after training

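A minimal sketch of working with the download: `kagglehub.dataset_download` is the documented kagglehub entry point, and the label helper assumes the dataset's gesture folders follow an `NN_name` convention (e.g. `01_palm`); verify that layout against your own download.

```python
import re

def gesture_label(folder_name: str) -> str:
    """Strip the numeric prefix from a gesture folder name, e.g. '01_palm' -> 'palm'.

    The 'NN_name' folder convention is an assumption about the dataset layout.
    """
    return re.sub(r"^\d+_", "", folder_name)

if __name__ == "__main__":
    # Downloading requires network access and Kaggle credentials:
    # import kagglehub
    # path = kagglehub.dataset_download("gti-upm/leapgestrecog")
    print(gesture_label("01_palm"))
```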
## Features

1. **Automatic Dataset Management**
   - Downloads dataset to current directory
   - Organizes and preprocesses data
   - Automatic cleanup after training to save space

2. **Hand Pose Extraction**
   - Uses MediaPipe to extract 21 landmarks (63 features: x, y, z coordinates)
   - Processes entire video sequences
   - Visualization support

3. **Data Augmentation**
   - Random noise injection
   - Random occlusion
   - Random scaling and translation
   - Increases dataset size by 3× (original + 2× augmented)

4. **Deep Learning Model**
   - 3-layer LSTM architecture
   - Batch normalization and dropout for regularization
   - Trained on 30-frame sequences
   - Achieves high accuracy on test set

5. **Real-time Recognition**
   - Webcam-based gesture recognition
   - Live predictions with confidence scores
   - Visual feedback with hand landmark overlay

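The pose-extraction and augmentation steps above can be sketched in NumPy. The MediaPipe call is shown in a comment (it needs a real camera frame); the noise magnitudes and scale/shift ranges here are illustrative assumptions, not the project's exact values.

```python
import numpy as np

def landmarks_to_vector(landmarks) -> np.ndarray:
    """Flatten 21 MediaPipe landmarks into a 63-dim (x, y, z) feature vector.

    With MediaPipe this would come from:
        results = mp.solutions.hands.Hands().process(rgb_frame)
        landmarks = results.multi_hand_landmarks[0].landmark
    """
    return np.array([c for lm in landmarks for c in (lm.x, lm.y, lm.z)],
                    dtype=np.float32)

def augment(seq: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply the three listed augmentations to a (30, 63) landmark sequence."""
    out = seq + rng.normal(0.0, 0.01, seq.shape)      # random noise injection
    scale = rng.uniform(0.9, 1.1)                     # random scaling
    shift = rng.uniform(-0.05, 0.05, size=3)          # random translation (x, y, z)
    out = (out.reshape(-1, 21, 3) * scale + shift).reshape(seq.shape)
    out[rng.integers(0, seq.shape[0])] = 0.0          # occlude one random frame
    return out.astype(np.float32)
```

Running `augment` twice per original sequence gives the 3× dataset size mentioned above.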
## Installation

This project uses **uv** for fast and reliable package management.

### Quick Install

```bash
# Install uv (Windows PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# Navigate to project directory
cd hand_gesture_recognition

# Install all dependencies
uv pip install -e .
```

### Required Packages

- `kagglehub` - Dataset download
- `opencv-python` - Image processing
- `numpy` - Numerical operations
- `pandas` - Data manipulation
- `matplotlib`, `seaborn` - Visualization
- `mediapipe` - Hand pose estimation
- `scikit-learn` - ML utilities
- `tensorflow` - Deep learning framework
- `tqdm` - Progress bars

All dependencies are automatically installed via `uv pip install -e .`.

## Real-time Recognition

### Using Inference Script (Recommended)

The easiest way to test the model is using the `inference.py` script, which downloads the model from Hugging Face and runs webcam inference:

```bash
python inference.py --repo a-01a/hand-gesture-recognition
```

Features:
- ✅ Automatically downloads model from Hugging Face Hub
- ✅ Live gesture prediction with confidence scores
- ✅ Real-time hand landmark detection using MediaPipe
- ✅ FPS counter and hand detection status
- ✅ No manual model download required

Press 'q' to quit the webcam window.

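Live prediction over 30-frame sequences works as a sliding window: keep the most recent 30 feature vectors in a fixed-length buffer and classify once it is full. This is an illustrative pattern, not the exact code in `inference.py`.

```python
from collections import deque
from typing import Optional

import numpy as np

SEQ_LEN = 30  # frames per classified sequence

class GestureBuffer:
    """Sliding window of per-frame feature vectors for live inference."""

    def __init__(self, seq_len: int = SEQ_LEN):
        self.frames = deque(maxlen=seq_len)

    def push(self, features: np.ndarray) -> Optional[np.ndarray]:
        """Add one 63-dim frame; return a (1, seq_len, 63) batch once full."""
        self.frames.append(np.asarray(features, dtype=np.float32))
        if len(self.frames) < self.frames.maxlen:
            return None  # not enough history to classify yet
        return np.stack(self.frames)[None, ...]  # batch of one sequence
```

Each webcam frame would go through MediaPipe to produce the 63-dim vector; whenever `push` returns a batch, it can be passed to `model.predict`.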
### Using the Notebook

Alternatively, you can run the webcam demo in the notebook after training:

```python
recognizer = RealTimeGestureRecognizer('hand_gesture_lstm_model.h5', gesture_mapping)
recognizer.run_webcam_demo()
```

## Model Architecture

```
Input: (30, 63) - 30 frames × 63 features

LSTM Layer 1: 128 units (return sequences)
  ↓ BatchNormalization + Dropout(0.3)

LSTM Layer 2: 128 units (return sequences)
  ↓ BatchNormalization + Dropout(0.3)

LSTM Layer 3: 64 units
  ↓ BatchNormalization + Dropout(0.3)

Dense Layer 1: 256 units (ReLU)
  ↓ BatchNormalization + Dropout(0.3)

Dense Layer 2: 128 units (ReLU)
  ↓ BatchNormalization + Dropout(0.3)

Output Layer: 10 units (Softmax)
```

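The diagram above translates to roughly the following Keras model. This is a sketch assuming `tf.keras`; the Adam optimizer matches the stated 0.001 learning rate, but the loss and other compile details are assumptions rather than the project's exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(seq_len=30, n_features=63, n_classes=10, dropout=0.3):
    """Build the 3-layer LSTM classifier described in the architecture diagram."""
    model = models.Sequential([
        layers.Input(shape=(seq_len, n_features)),
        layers.LSTM(128, return_sequences=True),
        layers.BatchNormalization(), layers.Dropout(dropout),
        layers.LSTM(128, return_sequences=True),
        layers.BatchNormalization(), layers.Dropout(dropout),
        layers.LSTM(64),
        layers.BatchNormalization(), layers.Dropout(dropout),
        layers.Dense(256, activation="relu"),
        layers.BatchNormalization(), layers.Dropout(dropout),
        layers.Dense(128, activation="relu"),
        layers.BatchNormalization(), layers.Dropout(dropout),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```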
## Performance

The model is evaluated using:
- **Accuracy**: Overall classification accuracy
- **Confusion Matrix**: Per-class performance visualization
- **Classification Report**: Precision, recall, F1-score per gesture
- **Gesture-wise Analysis**: Individual gesture accuracy

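The confusion matrix and gesture-wise accuracy can be computed directly from label arrays; a minimal NumPy sketch (the project itself likely uses scikit-learn's equivalents such as `confusion_matrix` and `classification_report`):

```python
import numpy as np

def confusion_matrix(y_true: np.ndarray, y_pred: np.ndarray, n_classes: int) -> np.ndarray:
    """cm[i, j] counts samples of true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)  # unbuffered scatter-add per (true, pred) pair
    return cm

def per_class_accuracy(cm: np.ndarray) -> np.ndarray:
    """Diagonal over row sums; classes with no samples yield 0."""
    totals = cm.sum(axis=1)
    return np.divide(np.diag(cm), totals,
                     out=np.zeros(len(cm), dtype=float), where=totals > 0)
```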
## Gestures Recognized

The model recognizes 10 different hand gestures from the LeapGestRecog dataset. Each gesture has unique characteristics captured through the temporal sequence of hand landmarks.

## Hyperparameters

- **Sequence Length**: 30 frames
- **LSTM Units**: 128 → 128 → 64
- **Dropout Rate**: 0.3
- **Batch Size**: 32
- **Learning Rate**: 0.001 (with ReduceLROnPlateau)
- **Epochs**: 100 (with EarlyStopping)
- **Train/Val/Test Split**: 64%/16%/20%

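The 64%/16%/20% split falls out of two successive 80/20 splits: hold out 20% for test, then take 20% of the remaining 80% (i.e. 16% overall) for validation. A sketch with NumPy index shuffling (the notebook may use `sklearn.model_selection.train_test_split` instead):

```python
import numpy as np

def split_indices(n: int, seed: int = 42):
    """Return (train, val, test) index arrays in a 64/16/20 ratio."""
    idx = np.random.default_rng(seed).permutation(n)
    n_test = int(n * 0.20)   # 20% held-out test set
    n_val = int(n * 0.16)    # 16% validation = 20% of the remaining 80%
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]  # remaining 64%
    return train, val, test
```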
## Project Structure

```
hand_gesture_recognition/
├── hand_gesture_recognition.ipynb   # Main training notebook
├── inference.py                     # Webcam inference with model download from HF
├── upload_to_huggingface.py         # Upload model to Hugging Face
├── README.md                        # This file
├── TECHNICAL_REPORT.md              # Detailed mathematical concepts
├── LICENSE.md                       # License
├── pyproject.toml                   # Project configuration (uv)
├── hand_gesture_lstm_model.h5       # Saved model (generated)
├── gesture_mapping.json             # Gesture labels (generated)
└── datasets/                        # Dataset (auto-downloaded & auto-deleted)
```

## Cleanup

The notebook automatically deletes the downloaded dataset after training to save disk space. The trained model and gesture mappings are saved locally and can be uploaded to Hugging Face for easy sharing and deployment.

## For More Details

See [TECHNICAL_REPORT.md](TECHNICAL_REPORT.md) for a comprehensive explanation of all mathematical concepts, algorithms, and methodologies used in this project.

## Citation

If you use this model in your research or application, please cite:

```bibtex
@misc{hand_gesture_lstm_2025,
  title={Hand Gesture Recognition using LSTM and MediaPipe},
  author={Abdul Ahad},
  year={2025},
  howpublished={\url{https://huggingface.co/spaces/a-01a/hand-gesture-recognition}},
  note={Real-time hand gesture recognition system using MediaPipe and LSTM networks}
}
```

## 📄 License

MIT License - See [LICENSE.md](LICENSE.md) for details.

---