---
title: Hand Gesture Recognition
emoji: 🖐️
colorFrom: blue
colorTo: green
library_name: tensorflow
license: mit
tags:
- computer-vision
- gesture-recognition
- lstm
- mediapipe
- hand-tracking
- video-classification
- tensorflow
- keras
- deep-learning
---
# Model Card: Hand Gesture Recognition LSTM

## Model Description
This model performs real-time hand gesture recognition by classifying sequences of MediaPipe hand landmarks with an LSTM neural network.

## Model Details
- Developed by: Abdul Ahad
- Model type: LSTM Sequential Neural Network
- Framework: TensorFlow/Keras
- License: MIT
- Model Architecture: 3-layer LSTM with dense output layers

## Intended Use

### Primary Use Cases
- Real-time hand gesture recognition from webcam feeds
- Human-computer interaction applications
- Sign language recognition systems
- Gesture-controlled interfaces

### Out-of-Scope Uses
- Medical diagnosis
- Security/authentication systems (not designed for this purpose)
- Safety-critical applications, since the model cannot guarantee perfect accuracy

## Training Data
- Dataset: LeapGestRecog (gti-upm/leapgestrecog from Kaggle)
- Structure: 10 subjects × 10 gestures × multiple video sequences
- Format: 100 frames per gesture sequence (PNG images)
- Preprocessing: MediaPipe hand landmark extraction (21 landmarks × 3 coordinates = 63 features per frame); see the sketch below
- Augmentation: Random noise, occlusion, scaling, and translation (tripling the dataset size)
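
The preprocessing script itself is not part of this card; the following is a minimal sketch of how the 63 per-frame features can be extracted with MediaPipe Hands (the function name and file path are illustrative):

```python
import cv2
import mediapipe as mp
import numpy as np

mp_hands = mp.solutions.hands

def extract_landmarks(image_bgr, hands):
    """Return 63 features (21 landmarks × 3 coords), or zeros if no hand is detected."""
    results = hands.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        hand = results.multi_hand_landmarks[0]  # single-hand model
        return np.array([[lm.x, lm.y, lm.z] for lm in hand.landmark],
                        dtype=np.float32).flatten()
    return np.zeros(63, dtype=np.float32)

with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
    frame = cv2.imread("frame_000.png")         # one PNG frame of a sequence (illustrative path)
    features = extract_landmarks(frame, hands)  # shape: (63,)
```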

## Model Architecture
```text
Input Shape: (30, 63) - 30 frames × 63 features
Layer 1: LSTM(128, return_sequences=True)
         BatchNormalization + Dropout(0.3)
Layer 2: LSTM(128, return_sequences=True)
         BatchNormalization + Dropout(0.3)
Layer 3: LSTM(64)
         BatchNormalization + Dropout(0.3)
Layer 4: Dense(256, activation='relu')
         BatchNormalization + Dropout(0.3)
Layer 5: Dense(128, activation='relu')
         BatchNormalization + Dropout(0.3)
Output:  Dense(10, activation='softmax')
```
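
In Keras, the layer list above corresponds roughly to the following sketch (reconstructed from the table, not taken from the original training code):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(seq_len=30, n_features=63, n_classes=10):
    """Rebuild the 3-layer LSTM + dense head described in the architecture table."""
    return models.Sequential([
        layers.Input(shape=(seq_len, n_features)),
        layers.LSTM(128, return_sequences=True),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.LSTM(128, return_sequences=True),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.LSTM(64),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.Dense(256, activation='relu'),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.Dense(128, activation='relu'),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.Dense(n_classes, activation='softmax'),
    ])

model = build_model()
```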

## Training Procedure

### Hyperparameters
- Sequence Length: 30 frames
- LSTM Units: 128 → 128 → 64
- Dense Units: 256 → 128
- Dropout Rate: 0.3
- Batch Size: 32
- Initial Learning Rate: 0.001
- Optimizer: Adam, with ReduceLROnPlateau reducing the learning rate when validation loss plateaus
- Loss Function: Categorical Crossentropy
- Epochs: Up to 100, with EarlyStopping (see the sketch below)
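
A sketch of the compile/fit step implied by these hyperparameters (the callback patience and factor values are assumptions; the card does not specify them):

```python
import tensorflow as tf

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)

callbacks = [
    # Patience/factor values below are illustrative assumptions.
    tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5),
    tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10,
                                     restore_best_weights=True),
]

history = model.fit(
    X_train, y_train,                  # sequences of shape (30, 63), one-hot labels
    validation_data=(X_val, y_val),
    batch_size=32,
    epochs=100,
    callbacks=callbacks,
)
```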

### Data Split
- Training: 64%
- Validation: 16%
- Test: 20%
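
These proportions correspond to a two-stage 80/20 split: 20% is held out for test, then 20% of the remainder is used for validation (0.8 × 0.8 = 64% train, 0.8 × 0.2 = 16% validation). A minimal NumPy sketch, assuming `X` and `y` hold the stacked sequences and one-hot labels:

```python
import numpy as np

rng = np.random.default_rng(42)   # seed is an illustrative choice
idx = rng.permutation(len(X))
n_test = int(0.20 * len(X))
n_val = int(0.16 * len(X))

test_idx = idx[:n_test]
val_idx = idx[n_test:n_test + n_val]
train_idx = idx[n_test + n_val:]

X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]
X_test, y_test = X[test_idx], y[test_idx]
```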

## Performance

The model achieves high accuracy on the held-out LeapGestRecog test set; see the technical report for the exact figures. Reported metrics include:
- Overall accuracy
- Per-gesture precision, recall, and F1-score
- Confusion matrix analysis
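
For reference, these metrics can be computed along the following lines (assumes scikit-learn, which is not in the install command below):

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

y_pred = np.argmax(model.predict(X_test), axis=1)   # predicted class indices
y_true = np.argmax(y_test, axis=1)                  # one-hot labels back to indices
print(classification_report(y_true, y_pred))        # per-gesture precision/recall/F1
print(confusion_matrix(y_true, y_pred))
```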

## Limitations
- Lighting Conditions: Performance may degrade in poor lighting
- Hand Visibility: Requires clear view of hand landmarks
- Background Complexity: May struggle with cluttered backgrounds
- Single Hand: Designed for single-hand gestures
- Dataset Bias: Trained on specific gesture types from LeapGestRecog

## How to Use

### Installation

```bash
uv pip install tensorflow mediapipe opencv-python numpy huggingface_hub
```

### Inference

```bash
# Download the model and run inference
uv run python inference.py --repo a-01a/hand-gesture-recognition
```
Or programmatically:

```python
import json

import tensorflow as tf
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(repo_id="a-01a/hand-gesture-recognition",
                             filename="hand_gesture_lstm_model.h5")
mapping_path = hf_hub_download(repo_id="a-01a/hand-gesture-recognition",
                               filename="gesture_mapping.json")

model = tf.keras.models.load_model(model_path)
with open(mapping_path, "r") as f:
    gesture_mapping = json.load(f)
```
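
Once the model and mapping are loaded, inference on a single 30-frame landmark sequence looks roughly like this (the zero placeholder input and the string-keyed mapping format are assumptions):

```python
import numpy as np

# sequence: 30 frames × 63 landmark features, e.g. produced by the
# MediaPipe extraction sketch above. A zero array stands in here.
sequence = np.zeros((30, 63), dtype=np.float32)

probs = model.predict(sequence[np.newaxis, ...])[0]   # shape: (10,)
pred_idx = int(np.argmax(probs))

# gesture_mapping.json is assumed to map class indices (as strings) to names.
print(gesture_mapping.get(str(pred_idx), pred_idx), float(probs[pred_idx]))
```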

## Citation

```bibtex
@misc{hand_gesture_lstm_2025,
  title        = {Hand Gesture Recognition using LSTM and MediaPipe},
  author       = {Abdul Ahad},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/a-01a/hand-gesture-recognition}},
  note         = {Real-time hand gesture recognition system using MediaPipe and LSTM networks}
}
```