---
title: Hand Gesture Recognition
emoji: 🖐️
colorFrom: blue
colorTo: green
library_name: tensorflow
license: mit
tags:
- computer-vision
- gesture-recognition
- lstm
- mediapipe
- hand-tracking
- video-classification
- tensorflow
- keras
- deep-learning
---
# Model Card: Hand Gesture Recognition LSTM

## Model Description
This model performs real-time hand gesture recognition by classifying sequences of MediaPipe hand landmarks with an LSTM neural network.

## Model Details
- Developed by: Abdul Ahad
- Model type: LSTM Sequential Neural Network
- Framework: TensorFlow/Keras
- License: MIT
- Model Architecture: 3-layer LSTM with dense output layers

## Intended Use

### Primary Use Cases
- Real-time hand gesture recognition from webcam feeds
- Human-computer interaction applications
- Sign language recognition systems
- Gesture-controlled interfaces

### Out-of-Scope Uses
- Medical diagnosis
- Security/authentication systems (not designed for this purpose)
- Safety-critical applications, since the model cannot guarantee perfect accuracy

## Training Data
- Dataset: LeapGestRecog (gti-upm/leapgestrecog from Kaggle)
- Structure: 10 subjects × 10 gestures × multiple video sequences
- Format: 100 frames per gesture sequence (PNG images)
- Preprocessing: MediaPipe hand landmark extraction (21 landmarks × 3 coordinates = 63 features per frame); see the sketch below
- Augmentation: Random noise, occlusion, scaling, and translation (tripling the dataset size)
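
The preprocessing script itself is not part of this card; the following is a minimal sketch of how the 63 per-frame features can be extracted with MediaPipe Hands (the function name and file path are illustrative):

```python
import cv2
import mediapipe as mp
import numpy as np

mp_hands = mp.solutions.hands

def extract_landmarks(image_bgr, hands):
    """Return 63 features (21 landmarks × 3 coords), or zeros if no hand is detected."""
    results = hands.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        hand = results.multi_hand_landmarks[0]  # single-hand model
        return np.array([[lm.x, lm.y, lm.z] for lm in hand.landmark],
                        dtype=np.float32).flatten()
    return np.zeros(63, dtype=np.float32)

with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
    frame = cv2.imread("frame_000.png")         # one PNG frame of a sequence (illustrative path)
    features = extract_landmarks(frame, hands)  # shape: (63,)
```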

## Model Architecture
```text
Input Shape: (30, 63) - 30 frames × 63 features
Layer 1: LSTM(128, return_sequences=True)
         BatchNormalization + Dropout(0.3)
Layer 2: LSTM(128, return_sequences=True)
         BatchNormalization + Dropout(0.3)
Layer 3: LSTM(64)
         BatchNormalization + Dropout(0.3)
Layer 4: Dense(256, activation='relu')
         BatchNormalization + Dropout(0.3)
Layer 5: Dense(128, activation='relu')
         BatchNormalization + Dropout(0.3)
Output:  Dense(10, activation='softmax')
```
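
In Keras, the layer list above corresponds roughly to the following sketch (reconstructed from the table, not taken from the original training code):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(seq_len=30, n_features=63, n_classes=10):
    """Rebuild the 3-layer LSTM + dense head described in the architecture table."""
    return models.Sequential([
        layers.Input(shape=(seq_len, n_features)),
        layers.LSTM(128, return_sequences=True),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.LSTM(128, return_sequences=True),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.LSTM(64),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.Dense(256, activation='relu'),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.Dense(128, activation='relu'),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.Dense(n_classes, activation='softmax'),
    ])

model = build_model()
```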

## Training Procedure

### Hyperparameters
- Sequence Length: 30 frames
- LSTM Units: 128 → 128 → 64
- Dense Units: 256 → 128
- Dropout Rate: 0.3
- Batch Size: 32
- Initial Learning Rate: 0.001
- Optimizer: Adam, with ReduceLROnPlateau reducing the learning rate when validation loss plateaus
- Loss Function: Categorical Crossentropy
- Epochs: Up to 100, with EarlyStopping (see the sketch below)
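
A sketch of the compile/fit step implied by these hyperparameters (the callback patience and factor values are assumptions; the card does not specify them):

```python
import tensorflow as tf

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)

callbacks = [
    # Patience/factor values below are illustrative assumptions.
    tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5),
    tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10,
                                     restore_best_weights=True),
]

history = model.fit(
    X_train, y_train,                  # sequences of shape (30, 63), one-hot labels
    validation_data=(X_val, y_val),
    batch_size=32,
    epochs=100,
    callbacks=callbacks,
)
```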

### Data Split
- Training: 64%
- Validation: 16%
- Test: 20%
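
These proportions correspond to a two-stage 80/20 split: 20% is held out for test, then 20% of the remainder is used for validation (0.8 × 0.8 = 64% train, 0.8 × 0.2 = 16% validation). A minimal NumPy sketch, assuming `X` and `y` hold the stacked sequences and one-hot labels:

```python
import numpy as np

rng = np.random.default_rng(42)   # seed is an illustrative choice
idx = rng.permutation(len(X))
n_test = int(0.20 * len(X))
n_val = int(0.16 * len(X))

test_idx = idx[:n_test]
val_idx = idx[n_test:n_test + n_val]
train_idx = idx[n_test + n_val:]

X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]
X_test, y_test = X[test_idx], y[test_idx]
```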

## Performance

The model achieves high accuracy on the held-out LeapGestRecog test set; see the technical report for the exact figures. Reported metrics include:
- Overall accuracy
- Per-gesture precision, recall, and F1-score
- Confusion matrix analysis
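
For reference, these metrics can be computed along the following lines (assumes scikit-learn, which is not in the install command below):

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

y_pred = np.argmax(model.predict(X_test), axis=1)   # predicted class indices
y_true = np.argmax(y_test, axis=1)                  # one-hot labels back to indices
print(classification_report(y_true, y_pred))        # per-gesture precision/recall/F1
print(confusion_matrix(y_true, y_pred))
```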

## Limitations
- Lighting Conditions: Performance may degrade in poor lighting
- Hand Visibility: Requires clear view of hand landmarks
- Background Complexity: May struggle with cluttered backgrounds
- Single Hand: Designed for single-hand gestures
- Dataset Bias: Trained on specific gesture types from LeapGestRecog

## How to Use

### Installation

```bash
uv pip install tensorflow mediapipe opencv-python numpy huggingface_hub
```

### Inference

```bash
# Download the model and run inference
uv run python inference.py --repo a-01a/hand-gesture-recognition
```
Or programmatically:

```python
import json

import tensorflow as tf
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(repo_id="a-01a/hand-gesture-recognition",
                             filename="hand_gesture_lstm_model.h5")
mapping_path = hf_hub_download(repo_id="a-01a/hand-gesture-recognition",
                               filename="gesture_mapping.json")

model = tf.keras.models.load_model(model_path)
with open(mapping_path, "r") as f:
    gesture_mapping = json.load(f)
```
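
Once the model and mapping are loaded, inference on a single 30-frame landmark sequence looks roughly like this (the zero placeholder input and the string-keyed mapping format are assumptions):

```python
import numpy as np

# sequence: 30 frames × 63 landmark features, e.g. produced by the
# MediaPipe extraction sketch above. A zero array stands in here.
sequence = np.zeros((30, 63), dtype=np.float32)

probs = model.predict(sequence[np.newaxis, ...])[0]   # shape: (10,)
pred_idx = int(np.argmax(probs))

# gesture_mapping.json is assumed to map class indices (as strings) to names.
print(gesture_mapping.get(str(pred_idx), pred_idx), float(probs[pred_idx]))
```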

## Citation

```bibtex
@misc{hand_gesture_lstm_2025,
  title        = {Hand Gesture Recognition using LSTM and MediaPipe},
  author       = {Abdul Ahad},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/a-01a/hand-gesture-recognition}},
  note         = {Real-time hand gesture recognition system using MediaPipe and LSTM networks}
}
```