---
license: mit
pipeline_tag: video-classification
---
# Eye and Eyebrow Movement Recognition Model
![License](https://img.shields.io/badge/license-MIT-blue.svg)
![Python](https://img.shields.io/badge/python-3.9%2B-blue.svg)
![TensorFlow](https://img.shields.io/badge/tensorflow-2.8.0%2B-brightgreen.svg)
## πŸ“– Table of Contents
- [πŸ“š Description](#-description)
- [πŸ” Features](#-features)
- [🎯 Intended Use](#-intended-use)
- [🧠 Model Architecture](#-model-architecture)
- [πŸ“‹ Training Data](#-training-data)
- [πŸ“ˆ Evaluation](#-evaluation)
- [πŸ’» Usage](#-usage)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Loading the Model](#loading-the-model)
- [Making Predictions](#making-predictions)
- [πŸ”§ Limitations](#-limitations)
- [βš–οΈ Ethical Considerations](#-ethical-considerations)
- [πŸ“œ License](#-license)
- [πŸ™ Acknowledgements](#-acknowledgements)
## πŸ“š Description
The **Eye and Eyebrow Movement Recognition** model is a real-time system for detecting and classifying subtle facial movements around the eyes and eyebrows. Currently, the model recognizes three distinct movements:
- **Yes:** Characterized by the raising of eyebrows.
- **No:** Indicated by the lowering of eyebrows.
- **Normal:** Representing a neutral facial expression without significant eye or eyebrow movements.
Leveraging a **CNN-LSTM** (Convolutional Neural Network - Long Short-Term Memory) architecture, the model captures both spatial features within individual frames and temporal dynamics across sequences of frames, which makes it well suited to recognizing brief, dynamic gestures.
## πŸ” Features
- **Real-Time Detection:** Continuously processes live webcam feeds to detect eye and eyebrow movements without noticeable lag.
- **GPU Acceleration:** Optimized for GPU usage via TensorFlow-Metal on macOS, ensuring efficient computations.
- **Extensible Design:** While currently supporting "Yes," "No," and "Normal" movements, the system is designed to be easily extended to accommodate additional facial gestures or movements.
- **User-Friendly Interface:** Provides visual feedback by overlaying predictions directly onto the live video feed for immediate user feedback.
- **High Accuracy:** Achieves about 85% accuracy on the held-out test set (see [Evaluation](#-evaluation)), making it a reliable tool for real-time facial gesture recognition.
## 🎯 Intended Use
This model is ideal for a variety of applications, including but not limited to:
- **Human-Computer Interaction (HCI):** Enhancing user interfaces with gesture-based controls.
- **Assistive Technologies:** Providing non-verbal communication tools for individuals with speech impairments.
- **Behavioral Analysis:** Monitoring and analyzing facial expressions for psychological or market research.
- **Gaming:** Creating more immersive and responsive gaming experiences through facial gesture controls.
**Note:** The model is intended for research and educational purposes. Ensure compliance with privacy and ethical guidelines when deploying in real-world applications.
## 🧠 Model Architecture
The model employs a **CNN-LSTM** architecture to capture both spatial and temporal features:
1. **TimeDistributed CNN Layers:**
- **Conv2D:** Extracts spatial features from each frame independently.
- **MaxPooling2D:** Reduces spatial dimensions.
- **BatchNormalization:** Stabilizes and accelerates training.
2. **Flatten Layer:**
- Flattens the output from CNN layers to prepare for LSTM processing.
3. **LSTM Layer:**
- Captures temporal dependencies across the sequence of frames.
4. **Dense Layers:**
- Fully connected layers that perform the final classification based on combined spatial-temporal features.
5. **Output Layer:**
- **Softmax Activation:** Provides probability distribution over the three classes ("Yes," "No," "Normal").
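For reference, here is a minimal Keras sketch of such a CNN-LSTM, assuming 30-frame sequences of 64×256 grayscale crops (the shapes used in the usage example below); the released model's exact layer sizes and counts may differ.
```python
from tensorflow.keras import layers, models

# Illustrative hyperparameters inferred from the usage example below
SEQ_LEN, H, W, C, NUM_CLASSES = 30, 64, 256, 1, 3

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN, H, W, C)),
    # Apply the same CNN to every frame in the sequence
    layers.TimeDistributed(layers.Conv2D(32, (3, 3), activation='relu')),
    layers.TimeDistributed(layers.MaxPooling2D((2, 2))),
    layers.TimeDistributed(layers.BatchNormalization()),
    # Flatten each frame's feature map into a vector
    layers.TimeDistributed(layers.Flatten()),
    # Model temporal dynamics across the frame sequence
    layers.LSTM(64),
    layers.Dense(64, activation='relu'),
    # Probability distribution over "Yes", "No", "Normal"
    layers.Dense(NUM_CLASSES, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```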
## πŸ“‹ Training Data
The model was trained on a curated dataset consisting of short video clips (1-2 seconds) capturing the three target movements:
- **Yes:** 50 samples
- **No:** 50 samples
- **Normal:** 50 samples
Each video was recorded using a standard webcam under varied lighting conditions and backgrounds to ensure robustness. The videos were manually labeled and organized into respective directories for preprocessing.
## πŸ“ˆ Evaluation
The model was evaluated on a separate test set comprising 60 samples for each class. The evaluation metrics are as follows:
- **Accuracy:** 85%
- **Precision:** 84%
- **Recall:** 86%
- **F1-Score:** 85%
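Metrics like these can be computed with scikit-learn (installed below); the macro averaging in this sketch is an assumption, and the labels are placeholders:
```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Illustrative only: substitute your real test labels and model predictions
y_true = np.array([0, 1, 2, 0, 1, 2])
y_pred = np.array([0, 1, 2, 0, 2, 2])

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average='macro')
print(f"Accuracy:  {accuracy_score(y_true, y_pred):.2%}")
print(f"Precision: {precision:.2%}  Recall: {recall:.2%}  F1: {f1:.2%}")
```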
## πŸ’» Usage
### Prerequisites
- **Hardware:** Mac with Apple Silicon (M1, M1 Pro, M1 Max, M2, etc.) for Metal GPU support.
- **Operating System:** macOS 12.3 (Monterey) or newer.
- **Python:** Version 3.9 or higher.
### Installation
1. **Clone the Repository**
```bash
git clone https://huggingface.co/shayan5422/eye-eyebrow-movement-recognition
cd eye-eyebrow-movement-recognition
```
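> **Note:** Hugging Face repositories typically store large files (such as the `.keras` weights) with Git LFS. If the model file downloads as a small text pointer, install Git LFS (e.g., `brew install git-lfs && git lfs install`) and clone again.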
2. **Install Homebrew (if not already installed)**
Homebrew is a package manager for macOS that simplifies the installation of software.
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```
3. **Install Micromamba**
Micromamba is a lightweight package manager compatible with Conda environments.
```bash
brew install micromamba
```
4. **Create and Activate a Virtual Environment**
We'll use Micromamba to create an isolated environment for our project.
```bash
# Create a new environment named 'eye_movement' with Python 3.9
micromamba create -n eye_movement python=3.9
# Activate the environment
micromamba activate eye_movement
```
5. **Install Required Libraries**
We'll install TensorFlow with Metal support (`tensorflow-macos` and `tensorflow-metal`) along with other necessary libraries.
```bash
# Install TensorFlow for macOS
pip install tensorflow-macos
# Install TensorFlow Metal plugin for GPU acceleration
pip install tensorflow-metal
# Install other dependencies
pip install opencv-python dlib imutils tqdm scikit-learn matplotlib seaborn h5py
```
> **Note:** Installing `dlib` can sometimes be challenging on macOS. If you encounter issues, consider installing it via Conda or refer to [dlib's official installation instructions](http://dlib.net/compile.html).
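After installation, you can verify that TensorFlow detects the Metal GPU:
```python
import tensorflow as tf

# With tensorflow-metal active, this should list at least one GPU device
print(tf.config.list_physical_devices('GPU'))
```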
6. **Download Dlib's Pre-trained Shape Predictor**
This model is essential for facial landmark detection.
```bash
# Navigate to your project directory
cd /path/to/your/project/eye-eyebrow-movement-recognition/
# Download the shape predictor
curl -LO http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2
# Decompress the file
bunzip2 shape_predictor_68_face_landmarks.dat.bz2
```
Ensure that the `shape_predictor_68_face_landmarks.dat` file is in the same directory as your scripts.
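You can quickly verify the download with a short Python check:
```python
import dlib

# Raises RuntimeError if the .dat file is missing or corrupted
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')
print("Shape predictor loaded.")
```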
### Loading the Model
```python
import tensorflow as tf
# Load the trained model
model = tf.keras.models.load_model('final_model_sequences.keras')
```
### Making Predictions
```python
import queue
import threading
from collections import deque

import cv2
import dlib
import numpy as np
import tensorflow as tf
from imutils import face_utils

# Load the trained model (as in the previous section)
model = tf.keras.models.load_model('final_model_sequences.keras')

# Initialize dlib's face detector and landmark predictor
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')

# Map class indices to labels.
# NOTE: adjust this mapping to match the label order used during training.
index_to_text = {0: 'Normal', 1: 'No', 2: 'Yes'}

# Queues for passing sequences to, and predictions from, the worker thread
input_queue = queue.Queue()
output_queue = queue.Queue()

# Number of frames per sequence fed to the model
max_seq_length = 30

def prediction_worker(model, input_q, output_q):
    """Consume frame sequences and push (class index, confidence) results."""
    while True:
        sequence = input_q.get()
        if sequence is None:  # sentinel value: shut down the worker
            break
        prediction = model.predict(sequence, verbose=0)
        class_idx = int(np.argmax(prediction))
        confidence = float(np.max(prediction))
        output_q.put((class_idx, confidence))

# Start the prediction thread so inference never blocks the capture loop
thread = threading.Thread(target=prediction_worker,
                          args=(model, input_queue, output_queue))
thread.start()

# Start video capture from the default webcam
cap = cv2.VideoCapture(0)
frame_buffer = deque(maxlen=max_seq_length)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Detect a face in the grayscale frame
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    rects = detector(gray, 1)
    if len(rects) > 0:
        # Extract the eye/eyebrow ROI and preprocess it
        # (see the example `preprocess_frame` sketch below)
        preprocessed_frame = preprocess_frame(frame, detector, predictor)
        frame_buffer.append(preprocessed_frame)
    else:
        # No face found: pad the sequence with a blank frame
        frame_buffer.append(np.zeros((64, 256, 1), dtype='float32'))

    # Once the buffer holds a full sequence, hand it to the worker
    if len(frame_buffer) == max_seq_length:
        sequence = np.array(frame_buffer)
        input_queue.put(np.expand_dims(sequence, axis=0))
        frame_buffer.clear()

    # Drain any finished predictions and overlay the latest on the frame
    try:
        while True:
            class_idx, confidence = output_queue.get_nowait()
            movement = index_to_text.get(class_idx, "Unknown")
            text = f"{movement} ({confidence*100:.2f}%)"
            cv2.putText(frame, text, (30, 30), cv2.FONT_HERSHEY_SIMPLEX,
                        0.8, (0, 255, 0), 2, cv2.LINE_AA)
    except queue.Empty:
        pass

    # Display the annotated frame
    cv2.imshow('Real-time Movement Prediction', frame)

    # Exit on 'q' key
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Cleanup: stop the worker thread and release resources
cap.release()
cv2.destroyAllWindows()
input_queue.put(None)
thread.join()
```
**Note:** Replace the placeholder comments with your actual preprocessing and prediction logic as implemented in your scripts.
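The loop above assumes a `preprocess_frame` helper. Here is a minimal sketch, assuming the model was trained on 64×256 grayscale crops of the eye/eyebrow region (dlib landmarks 17-47 in the 68-point scheme); substitute the exact preprocessing used during training:
```python
import cv2
import numpy as np
from imutils import face_utils

def preprocess_frame(frame, detector, predictor, size=(256, 64), pad=10):
    """Illustrative ROI extraction: crop the eye/eyebrow region,
    resize to 64x256 grayscale, and normalize to [0, 1]."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    rects = detector(gray, 1)
    if len(rects) == 0:
        # No face detected: return a blank frame of the expected shape
        return np.zeros((size[1], size[0], 1), dtype='float32')
    shape = face_utils.shape_to_np(predictor(gray, rects[0]))
    # Landmarks 17-26 are the eyebrows, 36-47 the eyes
    region = shape[17:48]
    x1, y1 = region.min(axis=0) - pad   # pad the box so the crop
    x2, y2 = region.max(axis=0) + pad   # is not too tight
    x1, y1 = max(int(x1), 0), max(int(y1), 0)
    roi = gray[y1:int(y2), x1:int(x2)]
    roi = cv2.resize(roi, size)               # size is (width, height)
    roi = roi.astype('float32') / 255.0       # scale pixels to [0, 1]
    return np.expand_dims(roi, axis=-1)       # add the channel dimension
```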
## πŸ”§ Limitations
- **Movement Scope:** Currently, the model is limited to recognizing "Yes," "No," and "Normal" movements. Extending to additional movements would require further data collection and training.
- **Environmental Constraints:** The model performs best under good lighting conditions and with a clear, frontal view of the face. Variations in lighting, occlusions, or extreme angles may affect accuracy.
- **Single Face Assumption:** The system is designed to handle a single face in the frame. Multiple faces may lead to unpredictable behavior.
## βš–οΈ Ethical Considerations
- **Privacy:** Ensure that users are aware of and consent to the use of their facial data. Handle all captured data responsibly and in compliance with relevant privacy laws and regulations.
- **Bias:** The model's performance may vary across different demographics. It's essential to train the model on a diverse dataset to minimize biases related to age, gender, ethnicity, and other factors.
- **Misuse:** Like all facial recognition technologies, there's potential for misuse. Implement safeguards to prevent unauthorized or unethical applications of the model.
## πŸ“œ License
This project is licensed under the [MIT License](LICENSE).
## πŸ™ Acknowledgements
- [TensorFlow](https://www.tensorflow.org/)
- [OpenCV](https://opencv.org/)
- [dlib](http://dlib.net/)
- [imutils](https://github.com/jrosebr1/imutils)
- [Hugging Face](https://huggingface.co/)
- [Metal Performance Shaders (MPS)](https://developer.apple.com/documentation/metalperformanceshaders)
- [Micromamba](https://mamba.readthedocs.io/en/latest/micromamba.html)
---
**Feel free to reach out or contribute to enhance the capabilities of this model!**