---
license: mit
pipeline_tag: video-classification
---
# Eye and Eyebrow Movement Recognition Model
![License](https://img.shields.io/badge/license-MIT-blue.svg)
![Python](https://img.shields.io/badge/python-3.9%2B-blue.svg)
![TensorFlow](https://img.shields.io/badge/tensorflow-2.8.0%2B-brightgreen.svg)
## πŸ“– Table of Contents
- [πŸ“š Description](#-description)
- [πŸ” Features](#-features)
- [🎯 Intended Use](#-intended-use)
- [🧠 Model Architecture](#-model-architecture)
- [πŸ“‹ Training Data](#-training-data)
- [πŸ“ˆ Evaluation](#-evaluation)
- [πŸ’» Usage](#-usage)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Loading the Model](#loading-the-model)
- [Making Predictions](#making-predictions)
- [πŸ”§ Limitations](#-limitations)
- [βš–οΈ Ethical Considerations](#-ethical-considerations)
- [πŸ“œ License](#-license)
- [πŸ™ Acknowledgements](#-acknowledgements)
## πŸ“š Description
The **Eye and Eyebrow Movement Recognition** model is a real-time system for detecting and classifying subtle facial movements around the eyes and eyebrows. Currently, the model recognizes three distinct movements:
- **Yes:** Characterized by the raising of eyebrows.
- **No:** Indicated by the lowering of eyebrows.
- **Normal:** Representing a neutral facial expression without significant eye or eyebrow movements.
Leveraging a **CNN-LSTM** (Convolutional Neural Network - Long Short-Term Memory) architecture, the model captures both spatial features within individual frames and temporal dynamics across sequences of frames, which makes it well suited to recognizing brief, dynamic gestures.
## πŸ” Features
- **Real-Time Detection:** Continuously processes live webcam feeds to detect eye and eyebrow movements without noticeable lag.
- **GPU Acceleration:** Optimized for GPU usage via TensorFlow-Metal on macOS, ensuring efficient computations.
- **Extensible Design:** While currently supporting "Yes," "No," and "Normal" movements, the system is designed to be easily extended to accommodate additional facial gestures or movements.
- **User-Friendly Interface:** Provides visual feedback by overlaying predictions directly onto the live video feed for immediate user feedback.
- **High Accuracy:** Achieves about 85% accuracy on the held-out test set (see [Evaluation](#-evaluation)), making it a reliable tool for real-time facial gesture recognition.
## 🎯 Intended Use
This model is ideal for a variety of applications, including but not limited to:
- **Human-Computer Interaction (HCI):** Enhancing user interfaces with gesture-based controls.
- **Assistive Technologies:** Providing non-verbal communication tools for individuals with speech impairments.
- **Behavioral Analysis:** Monitoring and analyzing facial expressions for psychological or market research.
- **Gaming:** Creating more immersive and responsive gaming experiences through facial gesture controls.
**Note:** The model is intended for research and educational purposes. Ensure compliance with privacy and ethical guidelines when deploying in real-world applications.
## 🧠 Model Architecture
The model employs a **CNN-LSTM** architecture to capture both spatial and temporal features:
1. **TimeDistributed CNN Layers:**
- **Conv2D:** Extracts spatial features from each frame independently.
- **MaxPooling2D:** Reduces spatial dimensions.
- **BatchNormalization:** Stabilizes and accelerates training.
2. **Flatten Layer:**
- Flattens the output from CNN layers to prepare for LSTM processing.
3. **LSTM Layer:**
- Captures temporal dependencies across the sequence of frames.
4. **Dense Layers:**
- Fully connected layers that perform the final classification based on combined spatial-temporal features.
5. **Output Layer:**
- **Softmax Activation:** Provides probability distribution over the three classes ("Yes," "No," "Normal").
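For reference, here is a minimal Keras sketch of such a CNN-LSTM, assuming 30-frame sequences of 64×256 grayscale crops (the shapes used in the usage example below); the released model's exact layer sizes and counts may differ.
```python
from tensorflow.keras import layers, models

# Illustrative hyperparameters inferred from the usage example below
SEQ_LEN, H, W, C, NUM_CLASSES = 30, 64, 256, 1, 3

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN, H, W, C)),
    # Apply the same CNN to every frame in the sequence
    layers.TimeDistributed(layers.Conv2D(32, (3, 3), activation='relu')),
    layers.TimeDistributed(layers.MaxPooling2D((2, 2))),
    layers.TimeDistributed(layers.BatchNormalization()),
    # Flatten each frame's feature map into a vector
    layers.TimeDistributed(layers.Flatten()),
    # Model temporal dynamics across the frame sequence
    layers.LSTM(64),
    layers.Dense(64, activation='relu'),
    # Probability distribution over "Yes", "No", "Normal"
    layers.Dense(NUM_CLASSES, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```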
## πŸ“‹ Training Data
The model was trained on a curated dataset consisting of short video clips (1-2 seconds) capturing the three target movements:
- **Yes:** 50 samples
- **No:** 50 samples
- **Normal:** 50 samples
Each video was recorded using a standard webcam under varied lighting conditions and backgrounds to ensure robustness. The videos were manually labeled and organized into respective directories for preprocessing.
## πŸ“ˆ Evaluation
The model was evaluated on a separate test set comprising 60 samples for each class. The evaluation metrics are as follows:
- **Accuracy:** 85%
- **Precision:** 84%
- **Recall:** 86%
- **F1-Score:** 85%
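Metrics like these can be computed with scikit-learn (installed below); the macro averaging in this sketch is an assumption, and the labels are placeholders:
```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Illustrative only: substitute your real test labels and model predictions
y_true = np.array([0, 1, 2, 0, 1, 2])
y_pred = np.array([0, 1, 2, 0, 2, 2])

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average='macro')
print(f"Accuracy:  {accuracy_score(y_true, y_pred):.2%}")
print(f"Precision: {precision:.2%}  Recall: {recall:.2%}  F1: {f1:.2%}")
```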
## πŸ’» Usage
### Prerequisites
- **Hardware:** Mac with Apple Silicon (M1, M1 Pro, M1 Max, M2, etc.) for Metal GPU support.
- **Operating System:** macOS 12.3 (Monterey) or newer.
- **Python:** Version 3.9 or higher.
### Installation
1. **Clone the Repository**
```bash
git clone https://huggingface.co/shayan5422/eye-eyebrow-movement-recognition
cd eye-eyebrow-movement-recognition
```
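> **Note:** Hugging Face repositories typically store large files (such as the `.keras` weights) with Git LFS. If the model file downloads as a small text pointer, install Git LFS (e.g., `brew install git-lfs && git lfs install`) and clone again.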
2. **Install Homebrew (if not already installed)**
Homebrew is a package manager for macOS that simplifies the installation of software.
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```
3. **Install Micromamba**
Micromamba is a lightweight package manager compatible with Conda environments.
```bash
brew install micromamba
```
4. **Create and Activate a Virtual Environment**
We'll use Micromamba to create an isolated environment for our project.
```bash
# Create a new environment named 'eye_movement' with Python 3.9
micromamba create -n eye_movement python=3.9
# Activate the environment
micromamba activate eye_movement
```
5. **Install Required Libraries**
We'll install TensorFlow with Metal support (`tensorflow-macos` and `tensorflow-metal`) along with other necessary libraries.
```bash
# Install TensorFlow for macOS
pip install tensorflow-macos
# Install TensorFlow Metal plugin for GPU acceleration
pip install tensorflow-metal
# Install other dependencies
pip install opencv-python dlib imutils tqdm scikit-learn matplotlib seaborn h5py
```
> **Note:** Installing `dlib` can sometimes be challenging on macOS. If you encounter issues, consider installing it via Conda or refer to [dlib's official installation instructions](http://dlib.net/compile.html).
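After installation, you can verify that TensorFlow detects the Metal GPU:
```python
import tensorflow as tf

# With tensorflow-metal active, this should list at least one GPU device
print(tf.config.list_physical_devices('GPU'))
```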
6. **Download Dlib's Pre-trained Shape Predictor**
This model is essential for facial landmark detection.
```bash
# Navigate to your project directory
cd /path/to/your/project/eye-eyebrow-movement-recognition/
# Download the shape predictor
curl -LO http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2
# Decompress the file
bunzip2 shape_predictor_68_face_landmarks.dat.bz2
```
Ensure that the `shape_predictor_68_face_landmarks.dat` file is in the same directory as your scripts.
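You can quickly verify the download with a short Python check:
```python
import dlib

# Raises RuntimeError if the .dat file is missing or corrupted
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')
print("Shape predictor loaded.")
```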
### Loading the Model
```python
import tensorflow as tf
# Load the trained model
model = tf.keras.models.load_model('final_model_sequences.keras')
```
### Making Predictions
```python
import queue
import threading
from collections import deque

import cv2
import dlib
import numpy as np
import tensorflow as tf
from imutils import face_utils

# Load the trained model (as in the previous section)
model = tf.keras.models.load_model('final_model_sequences.keras')

# Initialize dlib's face detector and landmark predictor
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')

# Map class indices to labels.
# NOTE: adjust this mapping to match the label order used during training.
index_to_text = {0: 'Normal', 1: 'No', 2: 'Yes'}

# Queues for passing sequences to, and predictions from, the worker thread
input_queue = queue.Queue()
output_queue = queue.Queue()

# Number of frames per sequence fed to the model
max_seq_length = 30

def prediction_worker(model, input_q, output_q):
    """Consume frame sequences and push (class index, confidence) results."""
    while True:
        sequence = input_q.get()
        if sequence is None:  # sentinel value: shut down the worker
            break
        prediction = model.predict(sequence, verbose=0)
        class_idx = int(np.argmax(prediction))
        confidence = float(np.max(prediction))
        output_q.put((class_idx, confidence))

# Start the prediction thread so inference never blocks the capture loop
thread = threading.Thread(target=prediction_worker,
                          args=(model, input_queue, output_queue))
thread.start()

# Start video capture from the default webcam
cap = cv2.VideoCapture(0)
frame_buffer = deque(maxlen=max_seq_length)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Detect a face in the grayscale frame
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    rects = detector(gray, 1)
    if len(rects) > 0:
        # Extract the eye/eyebrow ROI and preprocess it
        # (see the example `preprocess_frame` sketch below)
        preprocessed_frame = preprocess_frame(frame, detector, predictor)
        frame_buffer.append(preprocessed_frame)
    else:
        # No face found: pad the sequence with a blank frame
        frame_buffer.append(np.zeros((64, 256, 1), dtype='float32'))

    # Once the buffer holds a full sequence, hand it to the worker
    if len(frame_buffer) == max_seq_length:
        sequence = np.array(frame_buffer)
        input_queue.put(np.expand_dims(sequence, axis=0))
        frame_buffer.clear()

    # Drain any finished predictions and overlay the latest on the frame
    try:
        while True:
            class_idx, confidence = output_queue.get_nowait()
            movement = index_to_text.get(class_idx, "Unknown")
            text = f"{movement} ({confidence*100:.2f}%)"
            cv2.putText(frame, text, (30, 30), cv2.FONT_HERSHEY_SIMPLEX,
                        0.8, (0, 255, 0), 2, cv2.LINE_AA)
    except queue.Empty:
        pass

    # Display the annotated frame
    cv2.imshow('Real-time Movement Prediction', frame)

    # Exit on 'q' key
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Cleanup: stop the worker thread and release resources
cap.release()
cv2.destroyAllWindows()
input_queue.put(None)
thread.join()
```
**Note:** Replace the placeholder comments with your actual preprocessing and prediction logic as implemented in your scripts.
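The loop above assumes a `preprocess_frame` helper. Here is a minimal sketch, assuming the model was trained on 64×256 grayscale crops of the eye/eyebrow region (dlib landmarks 17-47 in the 68-point scheme); substitute the exact preprocessing used during training:
```python
import cv2
import numpy as np
from imutils import face_utils

def preprocess_frame(frame, detector, predictor, size=(256, 64), pad=10):
    """Illustrative ROI extraction: crop the eye/eyebrow region,
    resize to 64x256 grayscale, and normalize to [0, 1]."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    rects = detector(gray, 1)
    if len(rects) == 0:
        # No face detected: return a blank frame of the expected shape
        return np.zeros((size[1], size[0], 1), dtype='float32')
    shape = face_utils.shape_to_np(predictor(gray, rects[0]))
    # Landmarks 17-26 are the eyebrows, 36-47 the eyes
    region = shape[17:48]
    x1, y1 = region.min(axis=0) - pad   # pad the box so the crop
    x2, y2 = region.max(axis=0) + pad   # is not too tight
    x1, y1 = max(int(x1), 0), max(int(y1), 0)
    roi = gray[y1:int(y2), x1:int(x2)]
    roi = cv2.resize(roi, size)               # size is (width, height)
    roi = roi.astype('float32') / 255.0       # scale pixels to [0, 1]
    return np.expand_dims(roi, axis=-1)       # add the channel dimension
```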
## πŸ”§ Limitations
- **Movement Scope:** Currently, the model is limited to recognizing "Yes," "No," and "Normal" movements. Extending to additional movements would require further data collection and training.
- **Environmental Constraints:** The model performs best under good lighting conditions and with a clear, frontal view of the face. Variations in lighting, occlusions, or extreme angles may affect accuracy.
- **Single Face Assumption:** The system is designed to handle a single face in the frame. Multiple faces may lead to unpredictable behavior.
## βš–οΈ Ethical Considerations
- **Privacy:** Ensure that users are aware of and consent to the use of their facial data. Handle all captured data responsibly and in compliance with relevant privacy laws and regulations.
- **Bias:** The model's performance may vary across different demographics. It's essential to train the model on a diverse dataset to minimize biases related to age, gender, ethnicity, and other factors.
- **Misuse:** Like all facial recognition technologies, there's potential for misuse. Implement safeguards to prevent unauthorized or unethical applications of the model.
## πŸ“œ License
This project is licensed under the [MIT License](LICENSE).
## πŸ™ Acknowledgements
- [TensorFlow](https://www.tensorflow.org/)
- [OpenCV](https://opencv.org/)
- [dlib](http://dlib.net/)
- [imutils](https://github.com/jrosebr1/imutils)
- [Hugging Face](https://huggingface.co/)
- [Metal Performance Shaders (MPS)](https://developer.apple.com/documentation/metalperformanceshaders)
- [Micromamba](https://mamba.readthedocs.io/en/latest/micromamba.html)
---
**Feel free to reach out or contribute to enhance the capabilities of this model!**