---
title: Hand Gesture Recognition
emoji: 🖐️
colorFrom: blue
colorTo: green
library_name: tensorflow
license: mit
tags:
  - computer-vision
  - gesture-recognition
  - lstm
  - mediapipe
  - hand-tracking
  - video-classification
  - tensorflow
  - keras
  - deep-learning
---

# Model Card: Hand Gesture Recognition LSTM

## Model Description

This model performs real-time hand gesture recognition by classifying sequences of MediaPipe hand landmarks with an LSTM neural network.

## Model Details

- **Developed by:** Abdul Ahad
- **Model type:** LSTM sequential neural network
- **Framework:** TensorFlow/Keras
- **License:** MIT
- **Architecture:** 3-layer LSTM with dense output layers

## Intended Use

### Primary Use Cases

- Real-time hand gesture recognition from webcam feeds
- Human-computer interaction applications
- Sign language recognition systems
- Gesture-controlled interfaces

### Out-of-Scope Uses

- Medical diagnosis
- Security or authentication systems (the model was not designed or evaluated for this purpose)
- Safety-critical applications that require near-perfect accuracy

## Training Data

- **Dataset:** LeapGestRecog (`gti-upm/leapgestrecog` on Kaggle)
- **Structure:** 10 subjects × 10 gestures × multiple sequences per gesture
- **Format:** 100 frames per gesture sequence (PNG images)
- **Preprocessing:** MediaPipe hand landmark extraction, 21 landmarks × 3 coordinates = 63 features per frame (see the sketch after this list)
- **Augmentation:** random noise, occlusion, scaling, and translation (tripling the effective dataset size)
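
The preprocessing step can be reproduced roughly as follows. This is a minimal sketch, assuming one detected hand per frame and zero-filled features when MediaPipe detects no hand; `extract_landmarks` is an illustrative name, not the repository's actual code.

```python
import cv2
import mediapipe as mp
import numpy as np

mp_hands = mp.solutions.hands

def extract_landmarks(frame_bgr, hands):
    """Return a 63-dim vector (21 landmarks x 3 coords) for the first detected hand."""
    result = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if result.multi_hand_landmarks:
        landmarks = result.multi_hand_landmarks[0].landmark
        return np.array([[p.x, p.y, p.z] for p in landmarks],
                        dtype=np.float32).flatten()
    return np.zeros(63, dtype=np.float32)  # no hand detected in this frame

with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
    frame = cv2.imread("frame_0001.png")        # one PNG frame of a sequence
    features = extract_landmarks(frame, hands)  # shape: (63,)
```

The augmentations listed above operate on these landmark features; a sketch of what they might look like (all jitter magnitudes below are assumptions, not values from this card):

```python
def augment(seq, rng):
    """Jitter a (30, 63) landmark sequence; e.g. augment(seq, np.random.default_rng(0))."""
    seq = seq * rng.uniform(0.9, 1.1)                  # random scaling
    seq = seq + rng.normal(0.0, 0.01, size=seq.shape)  # random noise
    seq = seq + rng.uniform(-0.05, 0.05)               # random translation
    seq[rng.integers(len(seq))] = 0.0                  # occlude one random frame
    return seq
```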

## Model Architecture

```text
Input Shape: (30, 63) - 30 frames × 63 features

Layer 1: LSTM(128, return_sequences=True)
         BatchNormalization + Dropout(0.3)

Layer 2: LSTM(128, return_sequences=True)
         BatchNormalization + Dropout(0.3)

Layer 3: LSTM(64)
         BatchNormalization + Dropout(0.3)

Layer 4: Dense(256, activation='relu')
         BatchNormalization + Dropout(0.3)

Layer 5: Dense(128, activation='relu')
         BatchNormalization + Dropout(0.3)

Output:  Dense(10, activation='softmax')
```
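
The listing above translates directly into a Keras `Sequential` model. A minimal sketch of that translation: it follows the layer list as documented here, so unlisted details such as initializers in the released `hand_gesture_lstm_model.h5` may differ.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(seq_len=30, n_features=63, n_classes=10, dropout=0.3):
    """Build the 3-layer LSTM + dense-head classifier described above."""
    return models.Sequential([
        layers.Input(shape=(seq_len, n_features)),
        layers.LSTM(128, return_sequences=True),
        layers.BatchNormalization(),
        layers.Dropout(dropout),
        layers.LSTM(128, return_sequences=True),
        layers.BatchNormalization(),
        layers.Dropout(dropout),
        layers.LSTM(64),
        layers.BatchNormalization(),
        layers.Dropout(dropout),
        layers.Dense(256, activation='relu'),
        layers.BatchNormalization(),
        layers.Dropout(dropout),
        layers.Dense(128, activation='relu'),
        layers.BatchNormalization(),
        layers.Dropout(dropout),
        layers.Dense(n_classes, activation='softmax'),
    ])

model = build_model()
model.summary()
```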

## Training Procedure

### Hyperparameters

- **Sequence length:** 30 frames
- **LSTM units:** 128 → 128 → 64
- **Dense units:** 256 → 128
- **Dropout rate:** 0.3
- **Batch size:** 32
- **Initial learning rate:** 0.001
- **Optimizer:** Adam, with ReduceLROnPlateau learning-rate scheduling
- **Loss function:** categorical crossentropy
- **Epochs:** up to 100, with EarlyStopping (see the sketch after this list)
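
Wired together, these hyperparameters give roughly the following training setup. A minimal sketch: the callback patience values and LR-reduction factor are assumptions not stated in this card, and `X_train`/`y_train`/`X_val`/`y_val` stand in for the preprocessed landmark sequences and one-hot labels.

```python
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)

callbacks = [
    # Shrink the learning rate when validation loss plateaus
    # (factor and patience are assumed values)
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5),
    # Stop before epoch 100 if validation loss stops improving
    # (patience is an assumed value)
    EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True),
]

history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    batch_size=32,
    epochs=100,
    callbacks=callbacks,
)
```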

### Data Split

- Training: 64%
- Validation: 16%
- Test: 20%

These proportions correspond to an 80/20 train/test split followed by an 80/20 train/validation split of the remainder, as sketched below.
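
A minimal sketch of one way to produce this split with scikit-learn (not necessarily the repository's code; stratification and the random seed are assumptions):

```python
from sklearn.model_selection import train_test_split

# Hold out 20% for test, then 20% of the remainder for validation:
# 0.8 * 0.8 = 64% train, 0.8 * 0.2 = 16% validation, 20% test.
# X: landmark sequences of shape (N, 30, 63); y: integer class labels.
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.20, stratify=y_tmp, random_state=42)
```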

## Performance

The model achieves high accuracy on the held-out LeapGestRecog test set. Reported metrics include:

- Overall accuracy
- Per-gesture precision, recall, and F1-score
- Confusion matrix analysis

See the technical report for detailed performance metrics.
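
These metrics can be reproduced on the test split with scikit-learn; a minimal sketch, assuming `y_test` is one-hot encoded as implied by the categorical-crossentropy loss:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

y_pred = np.argmax(model.predict(X_test), axis=1)
y_true = np.argmax(y_test, axis=1)  # y_test assumed one-hot

print(classification_report(y_true, y_pred))  # per-gesture precision/recall/F1
print(confusion_matrix(y_true, y_pred))       # which gestures are confused
```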

## Limitations

1. **Lighting conditions:** performance may degrade in poor lighting.
2. **Hand visibility:** requires a clear view of the hand's landmarks.
3. **Background complexity:** may struggle with cluttered backgrounds.
4. **Single hand:** designed for single-hand gestures only.
5. **Dataset bias:** trained only on the gesture types in LeapGestRecog.

## How to Use

### Installation

```bash
uv pip install tensorflow mediapipe opencv-python numpy huggingface_hub
```

### Inference

```bash
# Download the model and run inference
uv run python inference.py --repo a-01a/hand-gesture-recognition
```

Or programmatically:

```python
import json

import tensorflow as tf
from huggingface_hub import hf_hub_download

# Download the trained model and the class-index-to-gesture-name mapping
model_path = hf_hub_download(repo_id="a-01a/hand-gesture-recognition",
                             filename="hand_gesture_lstm_model.h5")
mapping_path = hf_hub_download(repo_id="a-01a/hand-gesture-recognition",
                               filename="gesture_mapping.json")

model = tf.keras.models.load_model(model_path)

with open(mapping_path) as f:
    gesture_mapping = json.load(f)
```
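
From there, a prediction on a 30-frame landmark sequence looks roughly like the following. A minimal sketch: the zero tensor stands in for real MediaPipe features, and the string-keyed lookup into `gesture_mapping` is an assumption about the JSON's format.

```python
import numpy as np

# One batch of 30 frames x 63 features (replace with real MediaPipe landmarks)
sequence = np.zeros((1, 30, 63), dtype=np.float32)

probs = model.predict(sequence)[0]        # shape: (10,), one score per gesture
class_id = int(np.argmax(probs))
gesture = gesture_mapping[str(class_id)]  # assumes keys like "0".."9"
print(f"Predicted gesture: {gesture} ({probs[class_id]:.2%})")
```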

## Citation

```bibtex
@misc{hand_gesture_lstm_2025,
  title        = {Hand Gesture Recognition using LSTM and MediaPipe},
  author       = {Abdul Ahad},
  year         = {2025},
  howpublished = {https://huggingface.co/a-01a/hand-gesture-recognition},
  note         = {Real-time hand gesture recognition system using MediaPipe and LSTM networks}
}
```