---
license: mit
pipeline_tag: video-classification
---

# Eye and Eyebrow Movement Recognition Model

## 📋 Table of Contents

- [📝 Description](#-description)
- [🌟 Features](#-features)
- [🎯 Intended Use](#-intended-use)
- [🧠 Model Architecture](#-model-architecture)
- [📊 Training Data](#-training-data)
- [📈 Evaluation](#-evaluation)
- [💻 Usage](#-usage)
  - [Prerequisites](#prerequisites)
  - [Installation](#installation)
  - [Loading the Model](#loading-the-model)
  - [Making Predictions](#making-predictions)
- [🚧 Limitations](#-limitations)
- [⚖️ Ethical Considerations](#-ethical-considerations)
- [📜 License](#-license)
- [🙏 Acknowledgements](#-acknowledgements)

## 📝 Description

The **Eye and Eyebrow Movement Recognition** model is a real-time system for detecting and classifying subtle facial movements, focusing on the eyes and eyebrows. The model currently recognizes three distinct movements:

- **Yes:** Characterized by the raising of the eyebrows.
- **No:** Indicated by the lowering of the eyebrows.
- **Normal:** A neutral facial expression without significant eye or eyebrow movement.

Leveraging a **CNN-LSTM** (Convolutional Neural Network - Long Short-Term Memory) architecture, the model captures both spatial features from individual frames and temporal dynamics across sequences of frames, which helps it perform reliably in real-world conditions.

## 🌟 Features

- **Real-Time Detection:** Continuously processes a live webcam feed to detect eye and eyebrow movements without noticeable lag.
- **GPU Acceleration:** Optimized for GPU usage via TensorFlow-Metal on macOS for efficient computation.
- **Extensible Design:** While it currently supports "Yes," "No," and "Normal," the system is designed to be easily extended with additional facial gestures.
- **User-Friendly Interface:** Overlays predictions directly onto the live video feed for immediate visual feedback.
- **High Accuracy:** Reliably distinguishes between the supported movements (85% accuracy on the test set; see [Evaluation](#-evaluation)).

## 🎯 Intended Use

This model is ideal for a variety of applications, including but not limited to:

- **Human-Computer Interaction (HCI):** Enhancing user interfaces with gesture-based controls.
- **Assistive Technologies:** Providing non-verbal communication tools for individuals with speech impairments.
- **Behavioral Analysis:** Monitoring and analyzing facial expressions for psychological or market research.
- **Gaming:** Creating more immersive and responsive gaming experiences through facial gesture controls.

**Note:** The model is intended for research and educational purposes. Ensure compliance with privacy and ethical guidelines when deploying it in real-world applications.

## 🧠 Model Architecture

The model employs a **CNN-LSTM** architecture to capture both spatial and temporal features (a code sketch follows the list):

1. **TimeDistributed CNN Layers:**
   - **Conv2D:** Extracts spatial features from each frame independently.
   - **MaxPooling2D:** Reduces spatial dimensions.
   - **BatchNormalization:** Stabilizes and accelerates training.

2. **Flatten Layer:**
   - Flattens each frame's feature maps into a vector while preserving the sequence dimension for the LSTM.

3. **LSTM Layer:**
   - Captures temporal dependencies across the sequence of frames.

4. **Dense Layers:**
   - Fully connected layers that perform the final classification based on the combined spatial-temporal features.

5. **Output Layer:**
   - **Softmax Activation:** Produces a probability distribution over the three classes ("Yes," "No," "Normal").
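
A minimal Keras sketch of this architecture, for orientation only: the input shape follows the 30-frame sequences of 64×256 grayscale ROIs used in the usage example below, but the filter counts and unit sizes are illustrative assumptions, not the exact trained configuration.

```python
from tensorflow.keras import layers, models

sketch = models.Sequential([
    # Spatial features, computed per frame (sizes are illustrative)
    layers.TimeDistributed(layers.Conv2D(32, (3, 3), activation='relu'),
                           input_shape=(30, 64, 256, 1)),
    layers.TimeDistributed(layers.MaxPooling2D((2, 2))),
    layers.TimeDistributed(layers.BatchNormalization()),
    # Flatten each frame's feature maps, keeping the time dimension
    layers.TimeDistributed(layers.Flatten()),
    # Temporal dynamics across the 30-frame sequence
    layers.LSTM(64),
    # Classification head
    layers.Dense(32, activation='relu'),
    layers.Dense(3, activation='softmax'),  # "Yes", "No", "Normal"
])
sketch.summary()
```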

## 📊 Training Data

The model was trained on a curated dataset of short video clips (1-2 seconds) capturing the three target movements:

- **Yes:** 50 samples
- **No:** 50 samples
- **Normal:** 50 samples

Each video was recorded with a standard webcam under varied lighting conditions and backgrounds to improve robustness. The videos were manually labeled and organized into per-class directories for preprocessing.
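
For illustration, the labeled clips might be enumerated for preprocessing as follows; the directory layout and file extension are hypothetical, derived from the class labels above:

```python
from pathlib import Path

DATA_DIR = Path('data')           # assumed root of the labeled clips
LABELS = ['yes', 'no', 'normal']  # one subdirectory per class

# Collect (clip_path, label) pairs for preprocessing
samples = [(clip, label)
           for label in LABELS
           for clip in sorted((DATA_DIR / label).glob('*.mp4'))]
print(f'Found {len(samples)} labeled clips')
```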

## 📈 Evaluation

The model was evaluated on a separate test set comprising 60 samples for each class. The evaluation metrics are as follows:

- **Accuracy:** 85%
- **Precision:** 84%
- **Recall:** 86%
- **F1-Score:** 85%
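
These figures could be reproduced with scikit-learn (installed in step 5 below); in this hypothetical snippet, `X_test` and `y_test` stand in for your held-out sequences and integer labels, which are not shipped with this repository:

```python
import numpy as np
from sklearn.metrics import classification_report

# X_test: array of shape (N, 30, 64, 256, 1); y_test: integer labels
y_pred = np.argmax(model.predict(X_test), axis=1)
print(classification_report(y_test, y_pred, target_names=['Yes', 'No', 'Normal']))
```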

## 💻 Usage

### Prerequisites

- **Hardware:** Mac with Apple Silicon (M1, M1 Pro, M1 Max, M2, etc.) for Metal GPU support.
- **Operating System:** macOS 12.3 (Monterey) or newer.
- **Python:** Version 3.9 or higher.

### Installation

1. **Clone the Repository**

   ```bash
   git clone https://huggingface.co/shayan5422/eye-eyebrow-movement-recognition
   cd eye-eyebrow-movement-recognition
   ```

2. **Install Homebrew (if not already installed)**

   Homebrew is a package manager for macOS that simplifies the installation of software.

   ```bash
   /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
   ```

3. **Install Micromamba**

   Micromamba is a lightweight package manager compatible with Conda environments.

   ```bash
   brew install micromamba
   ```

4. **Create and Activate a Virtual Environment**

   We'll use Micromamba to create an isolated environment for the project.

   ```bash
   # Create a new environment named 'eye_movement' with Python 3.9
   micromamba create -n eye_movement python=3.9

   # Activate the environment
   micromamba activate eye_movement
   ```

5. **Install Required Libraries**

   We'll install TensorFlow with Metal support (`tensorflow-macos` and `tensorflow-metal`) along with the other necessary libraries.

   ```bash
   # Install TensorFlow for macOS
   pip install tensorflow-macos

   # Install the TensorFlow Metal plugin for GPU acceleration
   pip install tensorflow-metal

   # Install other dependencies
   pip install opencv-python dlib imutils tqdm scikit-learn matplotlib seaborn h5py
   ```

   > **Note:** Installing `dlib` can sometimes be challenging on macOS. If you encounter issues, consider installing it via Conda or refer to [dlib's official installation instructions](http://dlib.net/compile.html).
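
   To confirm that GPU acceleration is available, you can check whether TensorFlow sees the Metal device:

   ```python
   import tensorflow as tf

   # Should list at least one GPU when tensorflow-metal is installed
   print(tf.config.list_physical_devices('GPU'))
   ```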

6. **Download Dlib's Pre-trained Shape Predictor**

   This model is essential for facial landmark detection.

   ```bash
   # Navigate to your project directory
   cd /path/to/your/project/eye-eyebrow-movement-recognition/

   # Download the shape predictor
   curl -LO http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2

   # Decompress the file
   bunzip2 shape_predictor_68_face_landmarks.dat.bz2
   ```

   Ensure that the `shape_predictor_68_face_landmarks.dat` file is in the same directory as your scripts.

### Loading the Model

```python
import tensorflow as tf

# Load the trained model
model = tf.keras.models.load_model('final_model_sequences.keras')
```
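
Once loaded, `model.summary()` prints the layer stack if you want to verify the architecture described above.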

### Making Predictions

```python
import queue
import threading
from collections import deque

import cv2
import dlib
import numpy as np
from imutils import face_utils

# Initialize dlib's face detector and landmark predictor
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')

# Initialize queues for threading
input_queue = queue.Queue()
output_queue = queue.Queue()

# Define sequence length
max_seq_length = 30

# Map class indices to labels (adjust to match your training label order)
index_to_text = {0: 'Yes', 1: 'No', 2: 'Normal'}

def prediction_worker(model, input_q, output_q):
    """Run inference off the main thread so the video loop stays responsive."""
    while True:
        sequence = input_q.get()
        if sequence is None:  # Sentinel: shut the worker down
            break
        prediction = model.predict(sequence)
        class_idx = int(np.argmax(prediction))
        confidence = float(np.max(prediction))
        output_q.put((class_idx, confidence))

# Start the prediction thread (assumes `model` was loaded as shown above)
thread = threading.Thread(target=prediction_worker,
                          args=(model, input_queue, output_queue))
thread.start()

# Start video capture
cap = cv2.VideoCapture(0)
frame_buffer = deque(maxlen=max_seq_length)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Detect the face and its 68 landmarks
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    rects = detector(gray, 1)
    if len(rects) > 0:
        shape = predictor(gray, rects[0])
        shape = face_utils.shape_to_np(shape)  # landmarks for ROI extraction
        # Extract the eye/eyebrow ROIs and preprocess to shape (64, 256, 1).
        # [Replace with the preprocessing used in your training scripts.]
        preprocessed_frame = preprocess_frame(frame, detector, predictor)
        frame_buffer.append(preprocessed_frame)
    else:
        # No face found: pad with a blank frame to keep the sequence aligned
        frame_buffer.append(np.zeros((64, 256, 1), dtype='float32'))

    # Once the buffer holds a full sequence, queue it for prediction
    if len(frame_buffer) == max_seq_length:
        sequence = np.array(frame_buffer)
        input_queue.put(np.expand_dims(sequence, axis=0))
        frame_buffer.clear()

    # Drain finished predictions and overlay the latest on the frame
    try:
        while True:
            class_idx, confidence = output_queue.get_nowait()
            movement = index_to_text.get(class_idx, "Unknown")
            text = f"{movement} ({confidence*100:.2f}%)"
            cv2.putText(frame, text, (30, 30), cv2.FONT_HERSHEY_SIMPLEX,
                        0.8, (0, 255, 0), 2, cv2.LINE_AA)
    except queue.Empty:
        pass

    # Display the frame
    cv2.imshow('Real-time Movement Prediction', frame)

    # Exit on 'q' key
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Cleanup
cap.release()
cv2.destroyAllWindows()
input_queue.put(None)  # Unblock and stop the worker
thread.join()
```

**Note:** Replace the placeholder `preprocess_frame` call and the label mapping with the actual preprocessing and label order implemented in your scripts.

## 🚧 Limitations

- **Movement Scope:** The model currently recognizes only "Yes," "No," and "Normal" movements. Extending it to additional movements would require further data collection and training.
- **Environmental Constraints:** The model performs best under good lighting and with a clear, frontal view of the face. Variations in lighting, occlusions, or extreme angles may reduce accuracy.
- **Single-Face Assumption:** The system is designed to handle a single face in the frame. Multiple faces may lead to unpredictable behavior.

## ⚖️ Ethical Considerations

- **Privacy:** Ensure that users are aware of and consent to the use of their facial data. Handle all captured data responsibly and in compliance with relevant privacy laws and regulations.
- **Bias:** The model's performance may vary across demographics. Training on a diverse dataset is essential to minimize biases related to age, gender, ethnicity, and other factors.
- **Misuse:** Like all facial recognition technologies, this model could be misused. Implement safeguards to prevent unauthorized or unethical applications.

## 📜 License

This project is licensed under the [MIT License](LICENSE).

## 🙏 Acknowledgements

- [TensorFlow](https://www.tensorflow.org/)
- [OpenCV](https://opencv.org/)
- [dlib](http://dlib.net/)
- [imutils](https://github.com/jrosebr1/imutils)
- [Hugging Face](https://huggingface.co/)
- [Metal Performance Shaders (MPS)](https://developer.apple.com/documentation/metalperformanceshaders)
- [Micromamba](https://mamba.readthedocs.io/en/latest/micromamba.html)

---

**Feel free to reach out or contribute to enhance the capabilities of this model!**