Olof Astrand committed
Commit 42c48e6 · 1 Parent(s): 6bd5bc0

First version

Files changed (7):
  1. README.md +29 -0
  2. collector.html +489 -0
  3. collector.py +185 -0
  4. dataset_converter.py +417 -0
  5. inference.py +403 -0
  6. readme.txt +272 -0
  7. requirements.txt +6 -0
README.md CHANGED
@@ -1,3 +1,32 @@
  ---
  license: apache-2.0
  ---
+ This is a first test on Hugging Face, to test and learn.
+ The project is mainly created with Claude 4, ChatGPT-4 and DeepSeek.
+
+ Create a virtual Python env or use Conda:
+ gaze_env
+
+ Files:
+
+ Creating a dataset
+ ==================
+ collector.py
+ collector.html
+ When creating a dataset in the browser you will have to convert it with
+ dataset_converter.py
+
+
+ Training from web-based dataset
+ ===============================
+ training.py
+
+ Training from OpenCV-created dataset
+ ====================================
+ training_deepseek.py
+
+
+ Inference
+ =========
+ inference.py
+ This does not work in a WSL environment.
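The environment-setup step above is terse; a minimal sketch of it, assuming a Unix shell and the `requirements.txt` included in this commit, could look like:

```shell
# Create and activate the virtual environment the README calls gaze_env,
# then install the project's dependencies.
python3 -m venv gaze_env
source gaze_env/bin/activate      # on Windows: gaze_env\Scripts\activate
pip install -r requirements.txt
```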
collector.html ADDED
@@ -0,0 +1,489 @@
+ <!DOCTYPE html>
+ <html lang="en">
+ <head>
+   <meta charset="UTF-8">
+   <meta name="viewport" content="width=device-width, initial-scale=1.0">
+   <title>Gaze Data Collector</title>
+   <style>
+     * {
+       margin: 0;
+       padding: 0;
+       box-sizing: border-box;
+     }
+
+     body {
+       font-family: Arial, sans-serif;
+       background: #000;
+       color: white;
+       overflow: hidden;
+       height: 100vh;
+     }
+
+     #gameContainer {
+       position: relative;
+       width: 100vw;
+       height: 100vh;
+       background: black;
+       cursor: none;
+     }
+
+     #cross {
+       position: absolute;
+       width: 60px;
+       height: 60px;
+       pointer-events: none;
+       z-index: 10;
+     }
+
+     .cross-line {
+       position: absolute;
+       background: #00ff00;
+     }
+
+     .cross-horizontal {
+       width: 60px;
+       height: 3px;
+       top: 50%;
+       left: 0;
+       transform: translateY(-50%);
+     }
+
+     .cross-vertical {
+       width: 3px;
+       height: 60px;
+       left: 50%;
+       top: 0;
+       transform: translateX(-50%);
+     }
+
+     .cross-center {
+       position: absolute;
+       width: 10px;
+       height: 10px;
+       background: #ff0000;
+       border-radius: 50%;
+       top: 50%;
+       left: 50%;
+       transform: translate(-50%, -50%);
+     }
+
+     #videoContainer {
+       position: fixed;
+       top: 50%;
+       left: 50%;
+       transform: translate(-50%, -50%);
+       width: 160px;
+       height: 120px;
+       border: 2px solid #00ff00;
+       background: rgba(0, 0, 0, 0.8);
+       z-index: 100;
+       border-radius: 8px;
+       overflow: hidden;
+     }
+
+     #video {
+       width: 400%;
+       height: 400%;
+       object-fit: cover;
+       transform: translate(-37.5%, -37.5%);
+     }
+
+     #controls {
+       position: fixed;
+       top: 20px;
+       left: 20px;
+       z-index: 100;
+       background: rgba(0, 0, 0, 0.8);
+       padding: 20px;
+       border-radius: 8px;
+       border: 1px solid #333;
+     }
+
+     .control-button {
+       background: #007bff;
+       color: white;
+       border: none;
+       padding: 10px 20px;
+       margin: 5px;
+       border-radius: 5px;
+       cursor: pointer;
+       font-size: 14px;
+     }
+
+     .control-button:hover {
+       background: #0056b3;
+     }
+
+     .control-button:disabled {
+       background: #666;
+       cursor: not-allowed;
+     }
+
+     #status {
+       margin-top: 10px;
+       font-size: 14px;
+     }
+
+     #timer {
+       font-size: 18px;
+       font-weight: bold;
+       color: #00ff00;
+     }
+
+     .instructions {
+       position: fixed;
+       top: 50%;
+       left: 50%;
+       transform: translate(-50%, -50%);
+       text-align: center;
+       font-size: 24px;
+       z-index: 50;
+       background: rgba(0, 0, 0, 0.8);
+       padding: 30px;
+       border-radius: 10px;
+       border: 1px solid #333;
+     }
+
+     .hidden {
+       display: none;
+     }
+
+     #downloadSection {
+       margin-top: 15px;
+     }
+
+     #downloadButton {
+       background: #28a745;
+     }
+
+     #downloadButton:hover {
+       background: #1e7e34;
+     }
+
+     .eye-guide {
+       position: absolute;
+       top: 5px;
+       left: 5px;
+       right: 5px;
+       bottom: 5px;
+       border: 1px dashed #00ff00;
+       border-radius: 4px;
+       opacity: 0.7;
+     }
+
+     .eye-guide::before {
+       content: "Center your face here";
+       position: absolute;
+       top: -20px;
+       left: 0;
+       font-size: 10px;
+       color: #00ff00;
+     }
+   </style>
+ </head>
+ <body>
+   <div id="gameContainer">
+     <div id="cross">
+       <div class="cross-line cross-horizontal"></div>
+       <div class="cross-line cross-vertical"></div>
+       <div class="cross-center"></div>
+     </div>
+
+     <div id="instructions" class="instructions">
+       <h2>Gaze Data Collector</h2>
+       <p>Follow the green cross with your eyes</p>
+       <p>Position your face in the box at the center of the screen</p>
+       <p>Press START to begin data collection</p>
+       <p>Collection will run for 5 minutes</p>
+     </div>
+   </div>
+
+   <div id="videoContainer">
+     <video id="video" autoplay muted playsinline></video>
+     <div class="eye-guide"></div>
+   </div>
+
+   <div id="controls">
+     <button id="startButton" class="control-button">START COLLECTION</button>
+     <button id="stopButton" class="control-button" disabled>STOP</button>
+     <div id="status">
+       <div>Status: <span id="statusText">Ready</span></div>
+       <div>Timer: <span id="timer">00:00</span></div>
+       <div>Frames: <span id="frameCount">0</span></div>
+     </div>
+     <div id="downloadSection" class="hidden">
+       <button id="downloadButton" class="control-button">DOWNLOAD DATA</button>
+     </div>
+   </div>
+
+   <script>
+     class GazeDataCollector {
+       constructor() {
+         this.screenWidth = window.innerWidth;
+         this.screenHeight = window.innerHeight;
+         this.crossSize = 30;
+         this.speed = 3;
+         this.dataPoints = [];
+         this.collecting = false;
+         this.startTime = null;
+         this.frameCount = 0;
+         this.collectionDuration = 300; // 5 minutes in seconds
+
+         // Cross position and movement
+         this.x = this.screenWidth / 2;
+         this.y = 50;
+         this.directionX = 1;
+         this.directionY = 1;
+
+         this.initializeElements();
+         this.initializeCamera();
+         this.bindEvents();
+         this.animationLoop();
+       }
+
+       initializeElements() {
+         this.cross = document.getElementById('cross');
+         this.video = document.getElementById('video');
+         this.startButton = document.getElementById('startButton');
+         this.stopButton = document.getElementById('stopButton');
+         this.statusText = document.getElementById('statusText');
+         this.timer = document.getElementById('timer');
+         this.frameCountEl = document.getElementById('frameCount');
+         this.instructions = document.getElementById('instructions');
+         this.downloadSection = document.getElementById('downloadSection');
+         this.downloadButton = document.getElementById('downloadButton');
+       }
+
+       async initializeCamera() {
+         try {
+           const stream = await navigator.mediaDevices.getUserMedia({
+             video: {
+               width: 640,
+               height: 480,
+               facingMode: 'user'
+             }
+           });
+           this.video.srcObject = stream;
+           this.statusText.textContent = 'Camera ready';
+         } catch (error) {
+           console.error('Error accessing camera:', error);
+           this.statusText.textContent = 'Camera error';
+           alert('Unable to access camera. Please ensure you have granted camera permissions.');
+         }
+       }
+
+       bindEvents() {
+         this.startButton.addEventListener('click', () => this.startCollection());
+         this.stopButton.addEventListener('click', () => this.stopCollection());
+         this.downloadButton.addEventListener('click', () => this.downloadData());
+
+         // Handle window resize
+         window.addEventListener('resize', () => {
+           this.screenWidth = window.innerWidth;
+           this.screenHeight = window.innerHeight;
+         });
+
+         // Keyboard shortcuts
+         document.addEventListener('keydown', (e) => {
+           if (e.code === 'Space') {
+             e.preventDefault();
+             if (!this.collecting) {
+               this.startCollection();
+             }
+           } else if (e.code === 'Escape') {
+             this.stopCollection();
+           }
+         });
+       }
+
+       updateCrossPosition() {
+         // Move horizontally
+         this.x += this.directionX * this.speed;
+
+         // Check horizontal bounds and move down when reaching edge
+         if (this.x <= this.crossSize || this.x >= this.screenWidth - this.crossSize) {
+           this.directionX *= -1; // Reverse horizontal direction
+           this.y += 50; // Move down by 50 pixels
+         }
+
+         // Check if we've reached the bottom
+         if (this.y >= this.screenHeight - this.crossSize) {
+           this.y = this.crossSize; // Reset to top
+         }
+
+         // Ensure cross stays within bounds
+         this.x = Math.max(this.crossSize, Math.min(this.x, this.screenWidth - this.crossSize));
+         this.y = Math.max(this.crossSize, Math.min(this.y, this.screenHeight - this.crossSize));
+       }
+
+       drawCross() {
+         this.cross.style.left = (this.x - 30) + 'px';
+         this.cross.style.top = (this.y - 30) + 'px';
+       }
+
+       captureFrame() {
+         // Create a canvas to capture the cropped video frame
+         const canvas = document.createElement('canvas');
+         const ctx = canvas.getContext('2d');
+
+         // Set final output size to 60x80 pixels
+         canvas.width = 60;
+         canvas.height = 80;
+
+         // Get video dimensions
+         const videoWidth = this.video.videoWidth || 640;
+         const videoHeight = this.video.videoHeight || 480;
+
+         // Calculate crop area for face region (center area)
+         // We want to crop approximately the center 25% of the video for a tight face crop
+         const cropWidth = videoWidth * 0.25;
+         const cropHeight = videoHeight * 0.25;
+         const cropX = (videoWidth - cropWidth) / 2;
+         const cropY = (videoHeight - cropHeight) / 2;
+
+         // Draw the cropped area scaled to 60x80
+         ctx.drawImage(
+           this.video,
+           cropX, cropY, cropWidth, cropHeight, // Source crop area
+           0, 0, 60, 80 // Destination size
+         );
+
+         // Convert to base64 image data with higher compression
+         const imageData = canvas.toDataURL('image/jpeg', 0.6);
+
+         return imageData;
+       }
+
+       collectDataPoint() {
+         if (!this.collecting) return;
+
+         const currentTime = Date.now();
+         const timestamp = (currentTime - this.startTime) / 1000; // Convert to seconds
+
+         // Capture video frame
+         const imageData = this.captureFrame();
+
+         // Store data point
+         const dataPoint = {
+           frame: this.frameCount,
+           timestamp: timestamp,
+           screen_x: this.x,
+           screen_y: this.y,
+           screen_width: this.screenWidth,
+           screen_height: this.screenHeight,
+           image_data: imageData
+         };
+
+         this.dataPoints.push(dataPoint);
+         this.frameCount++;
+
+         // Update UI
+         this.frameCountEl.textContent = this.frameCount;
+
+         // Check if collection time is up
+         if (timestamp >= this.collectionDuration) {
+           this.stopCollection();
+           alert('5 minutes completed! Data collection finished.');
+         }
+       }
+
+       startCollection() {
+         this.collecting = true;
+         this.startTime = Date.now();
+         this.frameCount = 0;
+         this.dataPoints = [];
+
+         this.startButton.disabled = true;
+         this.stopButton.disabled = false;
+         this.statusText.textContent = 'Collecting...';
+         this.instructions.classList.add('hidden');
+
+         console.log('Data collection started!');
+       }
+
+       stopCollection() {
+         this.collecting = false;
+
+         this.startButton.disabled = false;
+         this.stopButton.disabled = true;
+         this.statusText.textContent = 'Collection stopped';
+         this.downloadSection.classList.remove('hidden');
+
+         console.log('Data collection stopped!');
+         console.log(`Total frames collected: ${this.frameCount}`);
+       }
+
+       updateTimer() {
+         if (!this.collecting) return;
+
+         const elapsed = (Date.now() - this.startTime) / 1000;
+         const remaining = Math.max(0, this.collectionDuration - elapsed);
+
+         const minutes = Math.floor(elapsed / 60);
+         const seconds = Math.floor(elapsed % 60);
+
+         this.timer.textContent = `${minutes.toString().padStart(2, '0')}:${seconds.toString().padStart(2, '0')}`;
+       }
+
+       downloadData() {
+         if (this.dataPoints.length === 0) {
+           alert('No data to download!');
+           return;
+         }
+
+         // Create metadata
+         const metadata = {
+           screen_width: this.screenWidth,
+           screen_height: this.screenHeight,
+           cross_size: this.crossSize,
+           speed: this.speed,
+           total_frames: this.dataPoints.length,
+           collection_duration: this.collectionDuration,
+           timestamp: new Date().toISOString(),
+           data_points: this.dataPoints
+         };
+
+         // Convert to JSON
+         const jsonData = JSON.stringify(metadata, null, 2);
+
+         // Create download link
+         const blob = new Blob([jsonData], { type: 'application/json' });
+         const url = URL.createObjectURL(blob);
+
+         const a = document.createElement('a');
+         a.href = url;
+         a.download = `gaze_data_${new Date().toISOString().replace(/[:.]/g, '-')}.json`;
+         document.body.appendChild(a);
+         a.click();
+         document.body.removeChild(a);
+
+         URL.revokeObjectURL(url);
+
+         console.log('Data downloaded successfully!');
+       }
+
+       animationLoop() {
+         // Update cross position if collecting
+         if (this.collecting) {
+           this.updateCrossPosition();
+           this.collectDataPoint();
+         }
+
+         // Always draw the cross
+         this.drawCross();
+
+         // Update timer
+         this.updateTimer();
+
+         // Continue animation loop
+         requestAnimationFrame(() => this.animationLoop());
+       }
+     }
+
+     // Initialize the collector when page loads
+     window.addEventListener('load', () => {
+       new GazeDataCollector();
+     });
+   </script>
+ </body>
+ </html>
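The JSON file that collector.html downloads stores each frame as a base64 data URL under `image_data`, alongside the cross position. A minimal sketch of reading one frame back out (the function name `first_frame_bytes` is hypothetical; the key names are taken from the script above):

```python
import base64
import json


def first_frame_bytes(json_path):
    """Return (jpeg_bytes, (screen_x, screen_y)) for the first recorded frame."""
    with open(json_path) as f:
        data = json.load(f)
    point = data["data_points"][0]
    # image_data is a data URL: "data:image/jpeg;base64,...." -- strip the prefix
    b64 = point["image_data"].split(",", 1)[1]
    return base64.b64decode(b64), (point["screen_x"], point["screen_y"])
```

The decoded bytes are an ordinary JPEG and can be handed to PIL or OpenCV for the conversion step dataset_converter.py performs.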
collector.py ADDED
@@ -0,0 +1,185 @@
+ import cv2
+ import numpy as np
+ import os
+ import json
+ import time
+ from datetime import datetime
+
+ class GazeDataCollector:
+     def __init__(self, screen_width=1920, screen_height=1080, cross_size=30, speed=5):
+         """
+         Initialize the gaze data collector.
+
+         Args:
+             screen_width: Width of the screen
+             screen_height: Height of the screen
+             cross_size: Size of the cross marker
+             speed: Speed of cross movement (pixels per frame)
+         """
+         self.screen_width = screen_width
+         self.screen_height = screen_height
+         self.cross_size = cross_size
+         self.speed = speed
+         self.data_points = []
+         self.collecting = False
+
+         # Create output directory
+         self.output_dir = f"gaze_data_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
+         os.makedirs(self.output_dir, exist_ok=True)
+         os.makedirs(os.path.join(self.output_dir, "images"), exist_ok=True)
+
+         # Initialize camera
+         self.cap = cv2.VideoCapture(0)
+         self.cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
+         self.cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
+
+         # Cross position
+         self.x = screen_width // 2
+         self.y = 0
+         self.direction_x = 1
+         self.direction_y = 1
+
+         # Create fullscreen window
+         cv2.namedWindow('Gaze Target', cv2.WINDOW_NORMAL)
+         cv2.setWindowProperty('Gaze Target', cv2.WND_PROP_FULLSCREEN, cv2.WINDOW_FULLSCREEN)
+
+     def draw_cross(self, img, x, y):
+         """Draw a cross at the specified position."""
+         # Ensure cross stays within screen bounds
+         x = max(self.cross_size, min(x, self.screen_width - self.cross_size))
+         y = max(self.cross_size, min(y, self.screen_height - self.cross_size))
+
+         # Draw cross
+         cv2.line(img, (x - self.cross_size, y), (x + self.cross_size, y), (0, 255, 0), 3)
+         cv2.line(img, (x, y - self.cross_size), (x, y + self.cross_size), (0, 255, 0), 3)
+
+         # Draw center dot
+         cv2.circle(img, (x, y), 5, (0, 0, 255), -1)
+
+         return x, y
+
+     def update_position(self):
+         """Update cross position with continuous movement pattern."""
+         # Move horizontally
+         self.x += self.direction_x * self.speed
+
+         # Check horizontal bounds and move down when reaching edge
+         if self.x <= self.cross_size or self.x >= self.screen_width - self.cross_size:
+             self.direction_x *= -1  # Reverse horizontal direction
+             self.y += 50  # Move down by 50 pixels
+
+         # Check if we've reached the bottom
+         if self.y >= self.screen_height - self.cross_size:
+             self.y = self.cross_size  # Reset to top
+
+     def collect_data(self):
+         """Main data collection loop."""
+         print("Press SPACE to start data collection")
+         print("Press ESC to stop and save data")
+         print("The cross will move across the screen in a pattern")
+         print("Please follow the cross with your eyes")
+
+         frame_count = 0
+         start_time = None
+
+         while True:
+             # Create black background
+             screen = np.zeros((self.screen_height, self.screen_width, 3), dtype=np.uint8)
+
+             # Update and draw cross
+             if self.collecting:
+                 self.update_position()
+
+             self.x, self.y = self.draw_cross(screen, self.x, self.y)
+
+             # Capture webcam frame
+             ret, frame = self.cap.read()
+             if not ret:
+                 print("Failed to capture frame")
+                 continue
+
+             # Show instructions when not collecting
+             if not self.collecting:
+                 cv2.putText(screen, "Press SPACE to start", (self.screen_width//2 - 200, self.screen_height//2),
+                             cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
+             else:
+                 # Save data point
+                 timestamp = time.time() - start_time
+                 image_filename = f"frame_{frame_count:06d}.jpg"
+                 image_path = os.path.join(self.output_dir, "images", image_filename)
+
+                 # Save image
+                 cv2.imwrite(image_path, frame)
+
+                 # Store data point
+                 self.data_points.append({
+                     "frame": frame_count,
+                     "timestamp": timestamp,
+                     "screen_x": self.x,
+                     "screen_y": self.y,
+                     "image": image_filename
+                 })
+
+                 frame_count += 1
+
+                 # Show progress
+                 elapsed = timestamp
+                 remaining = max(0, 300 - elapsed)  # 5 minutes = 300 seconds
+                 cv2.putText(screen, f"Time: {elapsed:.1f}s / 300s", (10, 30),
+                             cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)
+
+                 # Stop after 5 minutes
+                 if elapsed >= 300:
+                     print("5 minutes reached. Stopping collection.")
+                     break
+
+             # Display the screen
+             cv2.imshow('Gaze Target', screen)
+
+             # Handle key presses
+             key = cv2.waitKey(1) & 0xFF
+             if key == 27:  # ESC
+                 break
+             elif key == 32:  # SPACE
+                 if not self.collecting:
+                     self.collecting = True
+                     start_time = time.time()
+                     print("Data collection started!")
+
+         # Cleanup
+         self.cap.release()
+         cv2.destroyAllWindows()
+
+         # Save metadata
+         self.save_metadata()
+
+     def save_metadata(self):
+         """Save collected data points to JSON file."""
+         metadata = {
+             "screen_width": self.screen_width,
+             "screen_height": self.screen_height,
+             "cross_size": self.cross_size,
+             "speed": self.speed,
+             "total_frames": len(self.data_points),
+             "data_points": self.data_points
+         }
+
+         with open(os.path.join(self.output_dir, "metadata.json"), 'w') as f:
+             json.dump(metadata, f, indent=2)
+
+         print("\nData collection complete!")
+         print(f"Total frames collected: {len(self.data_points)}")
+         print(f"Data saved to: {self.output_dir}")
+
+ if __name__ == "__main__":
+     # Create data collector with default screen resolution
+     # Adjust these values to match your screen
+     collector = GazeDataCollector(
+         screen_width=1920,
+         screen_height=1080,
+         cross_size=30,
+         speed=5
+     )
+
+     # Start data collection
+     collector.collect_data()
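collector.py writes a folder containing `metadata.json` plus an `images/` subfolder of JPEG frames. A minimal loader sketch for that layout (the function name `load_samples` is hypothetical; the key names come from `save_metadata` above, and the [0, 1] coordinate normalization mirrors dataset_converter.py):

```python
import json
import os


def load_samples(output_dir):
    """Return a list of (image_path, (norm_x, norm_y)) pairs from a collector.py run."""
    with open(os.path.join(output_dir, "metadata.json")) as f:
        meta = json.load(f)
    w, h = meta["screen_width"], meta["screen_height"]
    samples = []
    for p in meta["data_points"]:
        img_path = os.path.join(output_dir, "images", p["image"])
        # Normalize the gaze target to [0, 1], as the converter does for the web data
        samples.append((img_path, (p["screen_x"] / w, p["screen_y"] / h)))
    return samples
```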
dataset_converter.py ADDED
@@ -0,0 +1,417 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Gaze Dataset Converter for TensorFlow Training
4
+
5
+ This script converts the collected gaze data from the web collector
6
+ into TensorFlow-compatible datasets for training neural networks.
7
+
8
+ Usage:
9
+ python dataset_converter.py --input data_folder --output processed_dataset
10
+ python dataset_converter.py --json gaze_data.json --output my_dataset
11
+ """
12
+
13
+ import os
14
+ import json
15
+ import base64
16
+ import argparse
17
+ import numpy as np
18
+ from PIL import Image
19
+ import io
20
+ import tensorflow as tf
21
+ from sklearn.model_selection import train_test_split
22
+ from sklearn.preprocessing import StandardScaler
23
+ import matplotlib.pyplot as plt
24
+ import cv2
25
+ from pathlib import Path
26
+ import logging
27
+
28
+ # Setup logging
29
+ logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
30
+ logger = logging.getLogger(__name__)
31
+
32
+ class GazeDatasetConverter:
33
+ def __init__(self, output_dir="processed_dataset", test_size=0.2, val_size=0.1):
34
+ """
35
+ Initialize the dataset converter.
36
+
37
+ Args:
38
+ output_dir: Directory to save processed dataset
39
+ test_size: Proportion of data for testing (0.2 = 20%)
40
+ val_size: Proportion of data for validation (0.1 = 10%)
41
+ """
42
+ self.output_dir = Path(output_dir)
43
+ self.test_size = test_size
44
+ self.val_size = val_size
45
+ self.image_size = (80, 60) # Height, Width (as expected by the collector)
46
+
47
+ # Create output directories
48
+ self.output_dir.mkdir(exist_ok=True)
49
+ (self.output_dir / "images" / "train").mkdir(parents=True, exist_ok=True)
50
+ (self.output_dir / "images" / "val").mkdir(parents=True, exist_ok=True)
51
+ (self.output_dir / "images" / "test").mkdir(parents=True, exist_ok=True)
52
+ (self.output_dir / "arrays").mkdir(exist_ok=True)
53
+
54
+ logger.info(f"Output directory: {self.output_dir}")
55
+
56
+ def load_json_data(self, json_path):
57
+ """Load gaze data from JSON file."""
58
+ logger.info(f"Loading data from {json_path}")
59
+
60
+ with open(json_path, 'r') as f:
61
+ data = json.load(f)
62
+
63
+ logger.info(f"Loaded {data.get('total_frames', 0)} frames")
64
+ logger.info(f"Screen dimensions: {data.get('screen_width')}x{data.get('screen_height')}")
65
+
66
+ return data
67
+
68
+ def process_multiple_files(self, data_folder):
69
+ """Process multiple JSON files from a folder."""
70
+ data_folder = Path(data_folder)
71
+ json_files = list(data_folder.glob("*.json"))
72
+
73
+ if not json_files:
74
+ raise ValueError(f"No JSON files found in {data_folder}")
75
+
76
+ logger.info(f"Found {len(json_files)} JSON files")
77
+
78
+ all_data_points = []
79
+ metadata = None
80
+
81
+ for json_file in json_files:
82
+ logger.info(f"Processing {json_file.name}")
83
+ data = self.load_json_data(json_file)
84
+
85
+ if metadata is None:
86
+ metadata = {
87
+ 'screen_width': data.get('screen_width'),
88
+ 'screen_height': data.get('screen_height'),
89
+ 'image_dimensions': '60x80 pixels',
90
+ 'total_files': len(json_files)
91
+ }
92
+
93
+ all_data_points.extend(data.get('data_points', []))
94
+
95
+ return {'data_points': all_data_points, **metadata}
96
+
97
+ def decode_and_process_image(self, base64_data, index):
98
+ """
99
+ Decode base64 image and return numpy array (no saving during processing).
100
+
101
+ Args:
102
+ base64_data: Base64 encoded image string
103
+ index: Frame index for logging
104
+
105
+ Returns:
106
+ numpy array of the image
107
+ """
108
+ try:
109
+ # Remove data URL prefix if present
110
+ if ',' in base64_data:
111
+ base64_data = base64_data.split(',')[1]
112
+
113
+ # Decode base64
114
+ image_bytes = base64.b64decode(base64_data)
115
+
116
+ # Convert to PIL Image
117
+ image = Image.open(io.BytesIO(image_bytes))
118
+
119
+ # Convert to RGB if needed
120
+ if image.mode != 'RGB':
121
+ image = image.convert('RGB')
122
+
123
+ # Resize to ensure consistent dimensions
124
+ image = image.resize((self.image_size[1], self.image_size[0])) # PIL uses (width, height)
125
+
126
+ # Convert to numpy array and normalize
127
+ image_array = np.array(image, dtype=np.float32) / 255.0
128
+
129
+ return image_array
130
+
131
+ except Exception as e:
132
+ logger.error(f"Error processing image {index}: {e}")
133
+ return None
134
+
135
+ def normalize_coordinates(self, x, y, screen_width, screen_height):
136
+ """Normalize screen coordinates to [0, 1] range."""
137
+ norm_x = x / screen_width
138
+ norm_y = y / screen_height
139
+
140
+ # Clamp to [0, 1] range
141
+ norm_x = np.clip(norm_x, 0.0, 1.0)
142
+ norm_y = np.clip(norm_y, 0.0, 1.0)
143
+
144
+ return norm_x, norm_y
145
+
146
+ def convert_dataset(self, data):
147
+ """Convert the gaze data to training format."""
148
+ logger.info("Converting dataset...")
149
+
150
+ images = []
151
+ gaze_points = []
152
+ timestamps = []
153
+
154
+ screen_width = data.get('screen_width', 1920)
155
+ screen_height = data.get('screen_height', 1080)
156
+
157
+ data_points = data.get('data_points', [])
158
+
159
+ for i, point in enumerate(data_points):
160
+ if i % 100 == 0:
161
+ logger.info(f"Processing frame {i}/{len(data_points)}")
162
+
163
+ # Process image
164
+ image_array = self.decode_and_process_image(
165
+ point.get('image_data', ''), i
166
+ )
167
+
168
+ if image_array is not None:
169
+ # Normalize gaze coordinates
170
+ norm_x, norm_y = self.normalize_coordinates(
171
+ point.get('screen_x', 0),
172
+ point.get('screen_y', 0),
173
+ screen_width,
174
+ screen_height
175
+ )
176
+
177
+ images.append(image_array)
178
+ gaze_points.append([norm_x, norm_y])
179
+ timestamps.append(point.get('timestamp', 0))
180
+
181
+ # Convert to numpy arrays
182
+ images = np.array(images, dtype=np.float32)
183
+ gaze_points = np.array(gaze_points, dtype=np.float32)
184
+ timestamps = np.array(timestamps, dtype=np.float32)
185
+
186
+ logger.info(f"Processed {len(images)} valid frames")
187
+ logger.info(f"Image shape: {images.shape}")
188
+ logger.info(f"Gaze points shape: {gaze_points.shape}")
189
+
190
+ return images, gaze_points, timestamps
191
+
192
+ def split_dataset(self, images, gaze_points, timestamps):
193
+ """Split dataset into train/validation/test sets."""
194
+ logger.info("Splitting dataset...")
195
+
196
+ # First split: separate test set
197
+ X_temp, X_test, y_temp, y_test, t_temp, t_test = train_test_split(
198
+ images, gaze_points, timestamps,
199
+ test_size=self.test_size,
200
+ random_state=42,
201
+ stratify=None
202
+ )
203
+
204
+ # Second split: separate train and validation from remaining data
205
+ val_size_adjusted = self.val_size / (1 - self.test_size)
206
+ X_train, X_val, y_train, y_val, t_train, t_val = train_test_split(
207
+ X_temp, y_temp, t_temp,
208
+ test_size=val_size_adjusted,
209
+ random_state=42,
210
+ stratify=None
211
+ )
212
+
213
+ logger.info(f"Train set: {len(X_train)} samples")
214
+ logger.info(f"Validation set: {len(X_val)} samples")
215
+ logger.info(f"Test set: {len(X_test)} samples")
216
+
217
+ return (X_train, y_train, t_train), \
218
+ (X_val, y_val, t_val), \
219
+ (X_test, y_test, t_test)
220
+
221
+ def save_images_by_split(self, train_data, val_data, test_data):
222
+ """Save images organized by split."""
223
+ logger.info("Saving images by split...")
224
+
225
+ splits = [
226
+ (train_data, 'train'),
227
+ (val_data, 'val'),
228
+ (test_data, 'test')
229
+ ]
230
+
231
+ for (X, y, t), split_name in splits:
232
+ split_dir = self.output_dir / "images" / split_name
233
+
234
+ for i, image_array in enumerate(X):
235
+ # Convert back to PIL Image (denormalize)
236
+ image_array_uint8 = (image_array * 255).astype(np.uint8)
237
+ image = Image.fromarray(image_array_uint8)
238
+
239
+ # Save image
240
+ image_filename = f"face_{i:06d}.jpg"
241
+ image_path = split_dir / image_filename
242
+ image.save(image_path, quality=85)
243
+
244
+ logger.info(f"Saved {len(X)} images to {split_name} set")
245
+
246
+ def save_numpy_arrays(self, train_data, val_data, test_data, metadata):
247
+ """Save dataset as numpy arrays for fast loading."""
248
+ logger.info("Saving numpy arrays...")
249
+
250
+ arrays_dir = self.output_dir / "arrays"
251
+
252
+ # Save each split
253
+ splits = [
254
+ (train_data, 'train'),
255
+ (val_data, 'val'),
256
+ (test_data, 'test')
257
+ ]
258
+
259
+ for (X, y, t), split_name in splits:
260
+ np.save(arrays_dir / f"{split_name}_images.npy", X)
261
+ np.save(arrays_dir / f"{split_name}_gaze.npy", y)
262
+ np.save(arrays_dir / f"{split_name}_timestamps.npy", t)
263
+
264
+ logger.info(f"Saved {split_name} set: {X.shape[0]} samples")
265
+
266
+ # Save metadata
267
+ with open(arrays_dir / "metadata.json", 'w') as f:
268
+ json.dump(metadata, f, indent=2)
269
+
270
+ def create_tensorflow_datasets(self, train_data, val_data, test_data, batch_size=32):
271
+ """Create TensorFlow datasets."""
272
+ logger.info("Creating TensorFlow datasets...")
273
+
274
+ def create_tf_dataset(X, y, batch_size, shuffle=True):
275
+ dataset = tf.data.Dataset.from_tensor_slices((X, y))
276
+ if shuffle:
277
+ dataset = dataset.shuffle(buffer_size=1000)
278
+ dataset = dataset.batch(batch_size)
279
+ dataset = dataset.prefetch(tf.data.AUTOTUNE)
280
+ return dataset
281
+
282
+ train_dataset = create_tf_dataset(train_data[0], train_data[1], batch_size, shuffle=True)
283
+ val_dataset = create_tf_dataset(val_data[0], val_data[1], batch_size, shuffle=False)
284
+ test_dataset = create_tf_dataset(test_data[0], test_data[1], batch_size, shuffle=False)
285
+
286
+ # Save datasets
287
+ train_dataset.save(str(self.output_dir / "tf_datasets" / "train"))
288
+ val_dataset.save(str(self.output_dir / "tf_datasets" / "val"))
289
+ test_dataset.save(str(self.output_dir / "tf_datasets" / "test"))
290
+
291
+ logger.info("TensorFlow datasets saved")
292
+
293
+ return train_dataset, val_dataset, test_dataset
294
+
295
+ def visualize_samples(self, train_data, num_samples=9):
296
+ """Create visualization of sample data."""
297
+ logger.info("Creating sample visualization...")
298
+
299
+ X_train, y_train = train_data[0], train_data[1]
300
+
301
+ fig, axes = plt.subplots(3, 3, figsize=(12, 12))
302
+ fig.suptitle('Sample Training Data', fontsize=16)
303
+
304
+ indices = np.random.choice(len(X_train), min(num_samples, len(X_train)), replace=False)
305
+
306
+ for i, idx in enumerate(indices):
307
+ row, col = i // 3, i % 3
308
+ ax = axes[row, col]
309
+
310
+ # Show image
311
+ ax.imshow(X_train[idx])
312
+ ax.set_title(f'Gaze: ({y_train[idx][0]:.3f}, {y_train[idx][1]:.3f})')
313
+ ax.axis('off')
314
+
315
+ # Add gaze point visualization
316
+ gaze_x = y_train[idx][0] * X_train[idx].shape[1]
317
+ gaze_y = y_train[idx][1] * X_train[idx].shape[0]
318
+ ax.plot(gaze_x, gaze_y, 'r+', markersize=10, markeredgewidth=2)
319
+
320
+ plt.tight_layout()
321
+ plt.savefig(self.output_dir / "sample_visualization.png", dpi=150, bbox_inches='tight')
322
+ plt.close()
323
+
324
+ logger.info("Sample visualization saved")
325
+
326
+ def generate_report(self, metadata, train_data, val_data, test_data):
327
+ """Generate a summary report."""
328
+ X_train, y_train = train_data[0], train_data[1]
329
+ X_val, y_val = val_data[0], val_data[1]
330
+ X_test, y_test = test_data[0], test_data[1]
331
+
332
+ report = {
333
+ 'dataset_info': {
334
+ 'total_samples': len(X_train) + len(X_val) + len(X_test),
335
+ 'train_samples': len(X_train),
336
+ 'val_samples': len(X_val),
337
+ 'test_samples': len(X_test),
338
+ 'image_shape': X_train.shape[1:],
339
+ 'gaze_range': {
340
+ 'x_min': float(np.min(y_train[:, 0])),
341
+ 'x_max': float(np.max(y_train[:, 0])),
342
+ 'y_min': float(np.min(y_train[:, 1])),
343
+ 'y_max': float(np.max(y_train[:, 1]))
344
+ }
345
+ },
346
+ 'original_metadata': metadata,
347
+ 'file_structure': {
348
+ 'arrays/': 'Numpy arrays for fast loading',
349
+ 'images/train/': 'Training images',
350
+ 'images/val/': 'Validation images',
351
+ 'images/test/': 'Test images',
352
+ 'tf_datasets/': 'TensorFlow datasets'
353
+ }
354
+ }
355
+
356
+ with open(self.output_dir / "dataset_report.json", 'w') as f:
357
+ json.dump(report, f, indent=2)
358
+
359
+ logger.info("Dataset report generated")
360
+ return report
361
+
362
+ def main():
363
+ parser = argparse.ArgumentParser(description='Convert gaze data to TensorFlow dataset')
364
+ parser.add_argument('--input', required=True, help='Input JSON file or folder with JSON files')
365
+ parser.add_argument('--output', default='processed_dataset', help='Output directory')
366
+ parser.add_argument('--test-size', type=float, default=0.2, help='Test set proportion')
367
+ parser.add_argument('--val-size', type=float, default=0.1, help='Validation set proportion')
368
+ parser.add_argument('--batch-size', type=int, default=32, help='Batch size for TensorFlow datasets')
369
+ parser.add_argument('--visualize', action='store_true', help='Create sample visualizations')
370
+
371
+ args = parser.parse_args()
372
+
373
+ # Initialize converter
374
+ converter = GazeDatasetConverter(
375
+ output_dir=args.output,
376
+ test_size=args.test_size,
377
+ val_size=args.val_size
378
+ )
379
+
380
+ # Load data
381
+ if os.path.isfile(args.input):
382
+ data = converter.load_json_data(args.input)
383
+ elif os.path.isdir(args.input):
384
+ data = converter.process_multiple_files(args.input)
385
+ else:
386
+ raise ValueError(f"Input path {args.input} is neither a file nor directory")
387
+
388
+ # Convert dataset
389
+ images, gaze_points, timestamps = converter.convert_dataset(data)
390
+
391
+ if len(images) == 0:
392
+ logger.error("No valid data found!")
393
+ return
394
+
395
+ # Split dataset
396
+ train_data, val_data, test_data = converter.split_dataset(
397
+ images, gaze_points, timestamps
398
+ )
399
+
400
+ # Save everything
401
+ converter.save_images_by_split(train_data, val_data, test_data)
402
+ converter.save_numpy_arrays(train_data, val_data, test_data, data)
403
+ converter.create_tensorflow_datasets(train_data, val_data, test_data, args.batch_size)
404
+
405
+ # Optional visualization
406
+ if args.visualize:
407
+ converter.visualize_samples(train_data)
408
+
409
+ # Generate report
410
+ report = converter.generate_report(data, train_data, val_data, test_data)
411
+
412
+ logger.info("Dataset conversion complete!")
413
+ logger.info(f"Output directory: {converter.output_dir}")
414
+ logger.info(f"Total samples: {report['dataset_info']['total_samples']}")
415
+
416
+ if __name__ == "__main__":
417
+ main()
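With the defaults above (`--test-size 0.2`, `--val-size 0.1`), a 1000-sample recording ends up roughly 700/100/200. A minimal sketch of that split arithmetic, assuming (as the flag names suggest) both fractions are taken from the full dataset:

```python
import numpy as np

# Shuffle indices once, then carve out the test and validation slices
rng = np.random.default_rng(42)
indices = rng.permutation(1000)

n_test = int(1000 * 0.2)   # 200 samples
n_val = int(1000 * 0.1)    # 100 samples

test_idx = indices[:n_test]
val_idx = indices[n_test:n_test + n_val]
train_idx = indices[n_test + n_val:]

print(len(train_idx), len(val_idx), len(test_idx))  # 700 100 200
```

The actual `split_dataset` implementation may differ in detail (e.g. stratification or a fixed seed); this only illustrates the proportions.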
inference.py ADDED
@@ -0,0 +1,403 @@
1
+ import cv2
2
+ import numpy as np
3
+ import tensorflow as tf
4
+ from tensorflow import keras
5
+ import time
6
+ import json
7
+ from pathlib import Path
8
+
9
+ class GazeEstimator:
10
+ def __init__(self, model_path='best_gaze_model.keras', config_path='model_config.json',
11
+ input_shape=None, screen_width=None, screen_height=None):
12
+ """
13
+ Initialize the real-time gaze estimator.
14
+
15
+ Args:
16
+ model_path: Path to the trained model (.keras file)
17
+ config_path: Path to model config JSON file
18
+ input_shape: Input shape expected by the model (overrides config)
19
+ screen_width: Width of the screen (overrides config)
20
+ screen_height: Height of the screen (overrides config)
21
+ """
22
+ # Load config if available
23
+ if Path(config_path).exists():
24
+ with open(config_path, 'r') as f:
25
+ config = json.load(f)
26
+ self.input_shape = tuple(config.get('input_shape', [60, 80, 3]))
27
+ self.screen_width = config.get('screen_width', 1920)
28
+ self.screen_height = config.get('screen_height', 1080)
29
+ print(f"Loaded config: {config}")
30
+ else:
31
+ self.input_shape = (60, 80, 3)
32
+ self.screen_width = 1920
33
+ self.screen_height = 1080
34
+
35
+ # Override with provided values if any
36
+ if input_shape is not None:
37
+ self.input_shape = input_shape
38
+ if screen_width is not None:
39
+ self.screen_width = screen_width
40
+ if screen_height is not None:
41
+ self.screen_height = screen_height
42
+
43
+ print(f"Using input shape: {self.input_shape}")
44
+ print(f"Using screen dimensions: {self.screen_width}x{self.screen_height}")
45
+
46
+ # Load model
47
+ print(f"Loading model from {model_path}...")
48
+ self.model = keras.models.load_model(model_path)
49
+ print("Model loaded successfully!")
50
+
51
+ # Initialize camera
52
+ self.cap = cv2.VideoCapture(0)
53
+ self.cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
54
+ self.cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
55
+
56
+ # For FPS calculation
57
+ self.prev_time = 0
58
+
59
+ # Smoothing parameters
60
+ self.smooth_x = self.screen_width // 2
61
+ self.smooth_y = self.screen_height // 2
62
+ self.smoothing_factor = 0.3
63
+
64
+ def preprocess_frame(self, frame):
65
+ """Preprocess frame for model input."""
66
+ # Resize to model input size
67
+ resized = cv2.resize(frame, (self.input_shape[1], self.input_shape[0]))
68
+
69
+ # Convert to RGB
70
+ rgb = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)
71
+
72
+ # Normalize to [0, 1]
73
+ normalized = rgb.astype('float32') / 255.0
74
+
75
+ # Add batch dimension
76
+ batch = np.expand_dims(normalized, axis=0)
77
+
78
+ return batch, resized # Return resized image for visualization
79
+
80
+ def smooth_prediction(self, pred_x, pred_y):
81
+ """Apply exponential smoothing to predictions."""
82
+ self.smooth_x = self.smoothing_factor * pred_x + (1 - self.smoothing_factor) * self.smooth_x
83
+ self.smooth_y = self.smoothing_factor * pred_y + (1 - self.smoothing_factor) * self.smooth_y
84
+
85
+ return int(self.smooth_x), int(self.smooth_y)
86
+
87
+ def run_inference(self):
88
+ """Run real-time gaze estimation."""
89
+ # Create visualization window
90
+ cv2.namedWindow('Gaze Estimation', cv2.WINDOW_NORMAL)
91
+ cv2.resizeWindow('Gaze Estimation', 1400, 800)
92
+
93
+ print("\nStarting real-time gaze estimation...")
94
+ print("Controls:")
95
+ print(" 'q' - Quit")
96
+ print(" 's' - Toggle smoothing")
97
+ print(" 'r' - Reset smoothing")
98
+ print(" '+' - Increase smoothing factor")
99
+ print(" '-' - Decrease smoothing factor")
100
+
101
+ use_smoothing = True
102
+
103
+ while True:
104
+ # Capture frame
105
+ ret, frame = self.cap.read()
106
+ if not ret:
107
+ continue
108
+
109
+ # Preprocess frame
110
+ input_batch, preprocessed_img = self.preprocess_frame(frame)
111
+
112
+ # Run inference
113
+ start_inference = time.time()
114
+ predictions = self.model.predict(input_batch, verbose=0)
115
+ inference_time = (time.time() - start_inference) * 1000 # ms
116
+
117
+ # Extract coordinates
118
+ norm_x, norm_y = predictions[0]
119
+ pred_x = int(norm_x * self.screen_width)
120
+ pred_y = int(norm_y * self.screen_height)
121
+
122
+ # Clamp predictions to screen bounds
123
+ pred_x = max(0, min(pred_x, self.screen_width - 1))
124
+ pred_y = max(0, min(pred_y, self.screen_height - 1))
125
+
126
+ # Apply smoothing if enabled
127
+ if use_smoothing:
128
+ gaze_x, gaze_y = self.smooth_prediction(pred_x, pred_y)
129
+ else:
130
+ gaze_x, gaze_y = pred_x, pred_y
131
+
132
+ # Create visualization
133
+ vis_frame = self.create_visualization(frame, preprocessed_img, gaze_x, gaze_y,
134
+ inference_time, use_smoothing)
135
+
136
+ # Calculate FPS
137
+ current_time = time.time()
138
+ fps = 1 / (current_time - self.prev_time) if self.prev_time > 0 else 0
139
+ self.prev_time = current_time
140
+
141
+ # Add FPS to visualization
142
+ cv2.putText(vis_frame, f"FPS: {fps:.1f}", (10, 30),
143
+ cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)
144
+
145
+ # Show frame
146
+ cv2.imshow('Gaze Estimation', vis_frame)
147
+
148
+ # Handle key presses
149
+ key = cv2.waitKey(1) & 0xFF
150
+ if key == ord('q'):
151
+ break
152
+ elif key == ord('s'):
153
+ use_smoothing = not use_smoothing
154
+ print(f"Smoothing: {'ON' if use_smoothing else 'OFF'}")
155
+ elif key == ord('r'):
156
+ self.smooth_x = self.screen_width // 2
157
+ self.smooth_y = self.screen_height // 2
158
+ print("Smoothing reset")
159
+ elif key == ord('+'):
160
+ self.smoothing_factor = min(0.9, self.smoothing_factor + 0.1)
161
+ print(f"Smoothing factor: {self.smoothing_factor:.1f}")
162
+ elif key == ord('-'):
163
+ self.smoothing_factor = max(0.1, self.smoothing_factor - 0.1)
164
+ print(f"Smoothing factor: {self.smoothing_factor:.1f}")
165
+
166
+ # Cleanup
167
+ self.cap.release()
168
+ cv2.destroyAllWindows()
169
+
170
+ def create_visualization(self, frame, preprocessed_img, gaze_x, gaze_y,
171
+ inference_time, use_smoothing):
172
+ """Create visualization frame with gaze overlay and preprocessed image."""
173
+ # Create a larger canvas
174
+ canvas_height = 800
175
+ canvas_width = 1400
176
+ canvas = np.zeros((canvas_height, canvas_width, 3), dtype=np.uint8)
177
+
178
+ # 1. Original webcam feed (top left)
179
+ cam_height = 360
180
+ cam_width = 480
181
+ resized_frame = cv2.resize(frame, (cam_width, cam_height))
182
+ canvas[20:20+cam_height, 20:20+cam_width] = resized_frame
183
+ cv2.putText(canvas, "Original Webcam", (20, 15),
184
+ cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 255, 255), 1)
185
+
186
+ # 2. Preprocessed image sent to network (bottom left)
187
+ # Scale up the preprocessed image for better visibility
188
+ prep_scale = 3
189
+ prep_height = self.input_shape[0] * prep_scale
190
+ prep_width = self.input_shape[1] * prep_scale
191
+
192
+ # Convert back to BGR for display
193
+ prep_display = cv2.cvtColor(preprocessed_img, cv2.COLOR_RGB2BGR)
194
+ prep_display = cv2.resize(prep_display, (prep_width, prep_height),
195
+ interpolation=cv2.INTER_NEAREST)
196
+
197
+ prep_y = cam_height + 60
198
+ canvas[prep_y:prep_y+prep_height, 20:20+prep_width] = prep_display
199
+ cv2.putText(canvas, f"Network Input ({self.input_shape[0]}x{self.input_shape[1]})",
200
+ (20, prep_y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 255, 255), 1)
201
+
202
+ # Draw border around preprocessed image
203
+ cv2.rectangle(canvas, (20, prep_y), (20+prep_width, prep_y+prep_height),
204
+ (100, 100, 100), 2)
205
+
206
+ # 3. Screen representation (right side)
207
+ screen_start_x = cam_width + 60
208
+ screen_start_y = 20
209
+ screen_vis_width = 800
210
+ screen_vis_height = 600
211
+
212
+ # Draw screen border
213
+ cv2.rectangle(canvas,
214
+ (screen_start_x, screen_start_y),
215
+ (screen_start_x + screen_vis_width, screen_start_y + screen_vis_height),
216
+ (255, 255, 255), 2)
217
+
218
+ # Draw grid on screen for reference
219
+ grid_color = (50, 50, 50)
220
+ for i in range(1, 4):
221
+ # Vertical lines
222
+ x = screen_start_x + (screen_vis_width * i) // 4
223
+ cv2.line(canvas, (x, screen_start_y), (x, screen_start_y + screen_vis_height),
224
+ grid_color, 1)
225
+ # Horizontal lines
226
+ y = screen_start_y + (screen_vis_height * i) // 4
227
+ cv2.line(canvas, (screen_start_x, y), (screen_start_x + screen_vis_width, y),
228
+ grid_color, 1)
229
+
230
+ # Scale gaze coordinates to visualization
231
+ vis_gaze_x = screen_start_x + int((gaze_x / self.screen_width) * screen_vis_width)
232
+ vis_gaze_y = screen_start_y + int((gaze_y / self.screen_height) * screen_vis_height)
233
+
234
+ # Draw gaze trail (if smoothing is on)
235
+ if use_smoothing:
236
+ # Draw a fading trail
237
+ trail_color = (0, 100, 0)
238
+ cv2.circle(canvas, (vis_gaze_x, vis_gaze_y), 20, trail_color, 1)
239
+
240
+ # Draw gaze point
241
+ cv2.circle(canvas, (vis_gaze_x, vis_gaze_y), 8, (0, 255, 0), -1)
242
+ cv2.circle(canvas, (vis_gaze_x, vis_gaze_y), 12, (0, 255, 0), 2)
243
+
244
+ # Draw crosshair
245
+ cv2.line(canvas, (vis_gaze_x - 20, vis_gaze_y), (vis_gaze_x + 20, vis_gaze_y),
246
+ (0, 255, 0), 1)
247
+ cv2.line(canvas, (vis_gaze_x, vis_gaze_y - 20), (vis_gaze_x, vis_gaze_y + 20),
248
+ (0, 255, 0), 1)
249
+
250
+ # Add title
251
+ cv2.putText(canvas, "Real-time Gaze Estimation",
252
+ (screen_start_x + 250, screen_start_y - 5),
253
+ cv2.FONT_HERSHEY_SIMPLEX, 0.8, (255, 255, 255), 2)
254
+
255
+ # 4. Information panel (bottom right)
256
+ info_y = screen_start_y + screen_vis_height + 30
257
+ info_x = screen_start_x
258
+
259
+ # Background for info panel
260
+ cv2.rectangle(canvas, (info_x - 10, info_y - 10),
261
+ (info_x + 400, info_y + 120), (30, 30, 30), -1)
262
+
263
+ # Info text
264
+ cv2.putText(canvas, f"Gaze Position: ({gaze_x}, {gaze_y})",
265
+ (info_x, info_y + 20), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 255, 255), 1)
266
+ cv2.putText(canvas, f"Normalized: ({gaze_x/self.screen_width:.3f}, {gaze_y/self.screen_height:.3f})",
267
+ (info_x, info_y + 45), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 255, 255), 1)
268
+ cv2.putText(canvas, f"Inference Time: {inference_time:.1f} ms",
269
+ (info_x, info_y + 70), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 255, 255), 1)
270
+ cv2.putText(canvas, f"Smoothing: {'ON' if use_smoothing else 'OFF'} (factor: {self.smoothing_factor:.1f})",
271
+ (info_x, info_y + 95), cv2.FONT_HERSHEY_SIMPLEX, 0.6,
272
+ (0, 255, 0) if use_smoothing else (100, 100, 100), 1)
273
+
274
+ return canvas
275
+
276
+ class TFLiteGazeEstimator(GazeEstimator):
277
+ def __init__(self, tflite_path='gaze_model_efficient.tflite', config_path='model_config.json',
278
+ input_shape=None, screen_width=None, screen_height=None):
279
+ """
280
+ TFLite version for even faster inference.
281
+ """
282
+ # Load config if available
283
+ if Path(config_path).exists():
284
+ with open(config_path, 'r') as f:
285
+ config = json.load(f)
286
+ self.input_shape = tuple(config.get('input_shape', [60, 80, 3]))
287
+ self.screen_width = config.get('screen_width', 1920)
288
+ self.screen_height = config.get('screen_height', 1080)
289
+ print(f"Loaded config: {config}")
290
+ else:
291
+ self.input_shape = (60, 80, 3)
292
+ self.screen_width = 1920
293
+ self.screen_height = 1080
294
+
295
+ # Override with provided values if any
296
+ if input_shape is not None:
297
+ self.input_shape = input_shape
298
+ if screen_width is not None:
299
+ self.screen_width = screen_width
300
+ if screen_height is not None:
301
+ self.screen_height = screen_height
302
+
303
+ print(f"Using input shape: {self.input_shape}")
304
+ print(f"Using screen dimensions: {self.screen_width}x{self.screen_height}")
305
+
306
+ # Load TFLite model
307
+ print(f"Loading TFLite model from {tflite_path}...")
308
+ self.interpreter = tf.lite.Interpreter(model_path=tflite_path)
309
+ self.interpreter.allocate_tensors()
310
+
311
+ # Get input and output details
312
+ self.input_details = self.interpreter.get_input_details()
313
+ self.output_details = self.interpreter.get_output_details()
314
+
315
+ print(f"Input details: {self.input_details}")
316
+ print(f"Output details: {self.output_details}")
317
+
318
+ # Initialize camera
319
+ self.cap = cv2.VideoCapture(0)
320
+ self.cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
321
+ self.cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
322
+
323
+ # For FPS calculation
324
+ self.prev_time = 0
325
+
326
+ # Smoothing parameters
327
+ self.smooth_x = self.screen_width // 2
328
+ self.smooth_y = self.screen_height // 2
329
+ self.smoothing_factor = 0.3
330
+
331
+ # Parent class methods
332
+ self.model = None # Dummy for compatibility
333
+
334
+ def predict(self, input_batch, verbose=0):
335
+ """Run TFLite inference."""
336
+ self.interpreter.set_tensor(self.input_details[0]['index'], input_batch)
337
+ self.interpreter.invoke()
338
+ output_data = self.interpreter.get_tensor(self.output_details[0]['index'])
339
+ return output_data
340
+
341
+ def test_camera():
342
+ """Test if camera is working."""
343
+ print("Testing camera...")
344
+ cap = cv2.VideoCapture(0)
345
+
346
+ if not cap.isOpened():
347
+ print("Error: Could not open camera")
348
+ return False
349
+
350
+ ret, frame = cap.read()
351
+ if not ret:
352
+ print("Error: Could not read from camera")
353
+ cap.release()
354
+ return False
355
+
356
+ print(f"Camera working! Frame shape: {frame.shape}")
357
+ cap.release()
358
+ return True
359
+
360
+ def main():
361
+ # Test camera first
362
+ if not test_camera():
363
+ return
364
+
365
+ # Check which model files are available
366
+ keras_model = Path('best_gaze_model.keras')
367
+ tflite_model = Path('gaze_model_efficient.tflite')
368
+
369
+ use_tflite = False
370
+
371
+ if tflite_model.exists() and input("Use TFLite model for faster inference? (y/n): ").lower() == 'y':
372
+ use_tflite = True
373
+ elif not keras_model.exists():
374
+ print(f"Error: Model file {keras_model} not found!")
375
+ return
376
+
377
+ try:
378
+ if use_tflite:
379
+ print("\nUsing TFLite model...")
380
+ estimator = TFLiteGazeEstimator(
381
+ tflite_path=str(tflite_model),
382
+ config_path='model_config.json'
383
+ )
384
+ # Override predict method for compatibility
385
+ estimator.model = estimator # Dummy reference
386
+ estimator.model.predict = estimator.predict
387
+ else:
388
+ print("\nUsing Keras model...")
389
+ estimator = GazeEstimator(
390
+ model_path=str(keras_model),
391
+ config_path='model_config.json'
392
+ )
393
+
394
+ # Run real-time inference
395
+ estimator.run_inference()
396
+
397
+ except Exception as e:
398
+ print(f"Error: {e}")
399
+ import traceback
400
+ traceback.print_exc()
401
+
402
+ if __name__ == "__main__":
403
+ main()
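The smoothing used throughout `inference.py` is a single-pole exponential filter: each frame blends the new prediction into a running estimate, so frame-to-frame jitter is damped while the cursor still converges on a steady gaze point. A standalone sketch of the filter step:

```python
def smooth(prev, new, factor=0.3):
    # One filter step: factor controls how quickly the estimate follows new samples
    return factor * new + (1 - factor) * prev

estimate = 960.0  # start at the centre of an assumed 1920-pixel-wide screen
for _ in range(20):
    estimate = smooth(estimate, 100.0)  # constant raw prediction of x=100
print(estimate)  # close to 100 after 20 steps
```

A higher `factor` tracks the raw predictions more closely (less lag, more jitter); a lower one gives a smoother but slower cursor, which is why the script exposes `+`/`-` keys to tune it live.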
readme.txt ADDED
@@ -0,0 +1,272 @@
1
+ # Gaze Estimation Setup Instructions for Ubuntu 22.04
2
+
3
+ ## Prerequisites
4
+ - Conda installed on Ubuntu 22.04
5
+ - Webcam connected to your system
6
+ - NVIDIA GPU (optional, for faster training)
7
+
8
12
+ ## Step 1: Create Conda Environment
13
+
14
+ ```bash
15
+ # Create a new conda environment with Python 3.9
16
+ conda create -n gaze_estimation python=3.9 -y
17
+
18
+ # Activate the environment
19
+ conda activate gaze_estimation
20
+ ```
21
+
22
+ ## Step 2: Install Core Dependencies
23
+
24
+ ```bash
25
+ # Install TensorFlow (CPU version)
26
+ pip install tensorflow==2.13.0
27
+
28
+ # OR for GPU support (if you have NVIDIA GPU with CUDA)
29
+ # pip install tensorflow[and-cuda]==2.13.0
30
+
31
+ # Install OpenCV
32
+ pip install opencv-python==4.8.1.78
33
+
34
+ # Install additional required packages
35
+ pip install numpy==1.24.3
36
+ pip install matplotlib==3.7.1
37
+ pip install scikit-learn==1.3.0
38
+ pip install pillow==10.0.0
39
+ ```
40
+
41
+ ## Step 3: Install System Dependencies for OpenCV
42
+
43
+ OpenCV might need some system libraries on Ubuntu 22.04:
44
+
45
+ ```bash
46
+ # Update package list
47
+ sudo apt update
48
+
49
+ # Install required system libraries
50
+ sudo apt install -y libgl1-mesa-glx libglib2.0-0 libsm6 libxext6 libxrender-dev libgomp1
51
+
52
+ # Install video codecs and camera support
53
+ sudo apt install -y libgstreamer1.0-0 libgstreamer-plugins-base1.0-0 v4l-utils
54
+
55
+ # Install additional GUI libraries for OpenCV windows
56
+ sudo apt install -y libgtk-3-0 libgtk-3-dev
57
+ ```
58
+
59
+ ## Step 4: Verify Installation
60
+
61
+ Create a test script `test_installation.py`:
62
+
63
+ ```python
64
+ import cv2
65
+ import tensorflow as tf
66
+ import numpy as np
67
+ import matplotlib.pyplot as plt
68
+ from sklearn.model_selection import train_test_split
69
+
70
+ print("OpenCV version:", cv2.__version__)
71
+ print("TensorFlow version:", tf.__version__)
72
+ print("NumPy version:", np.__version__)
73
+
74
+ # Test camera
75
+ cap = cv2.VideoCapture(0)
76
+ if cap.isOpened():
77
+ print("Camera is accessible")
78
+ ret, frame = cap.read()
79
+ if ret:
80
+ print("Camera capture successful")
81
+ cap.release()
82
+ else:
83
+ print("Camera not found")
84
+
85
+ # Test TensorFlow
86
+ print("TensorFlow GPU available:", tf.config.list_physical_devices('GPU'))
87
+ ```
88
+
89
+ Run the test:
90
+ ```bash
91
+ python test_installation.py
92
+ ```
93
+
94
+ ## Step 5: Camera Permissions
95
+
96
+ If you have camera permission issues:
97
+
98
+ ```bash
99
+ # Add your user to the video group
100
+ sudo usermod -a -G video $USER
101
+
102
+ # Log out and log back in, or run:
103
+ newgrp video
104
+
105
+ # Check camera devices
106
+ ls -la /dev/video*
107
+
108
+ # Test camera with v4l2
109
+ v4l2-ctl --list-devices
110
+ ```
111
+
112
+ ## Step 6: Download the Scripts
113
+
114
+ Save the three Python scripts I provided:
115
+ 1. `gaze_data_collection.py` - For collecting training data
116
+ 2. `gaze_training.py` - For training the model
117
+ 3. `gaze_inference.py` - For real-time inference
118
+
119
+ ## Step 7: Configure Display Settings
120
+
121
+ For the data collection script to work properly in fullscreen:
122
+
123
+ ```bash
124
+ # Check your display resolution
125
+ xrandr | grep current
126
+
127
+ # You might need to allow OpenCV to create fullscreen windows
128
+ # If using Wayland, you may need to switch to X11:
129
+ echo $XDG_SESSION_TYPE # Check if using Wayland or X11
130
+ ```
131
+
132
+ ## Step 8: Environment Variables (Optional)
133
+
134
+ Create a `.env` file or export these for better performance:
135
+
136
+ ```bash
137
+ # Limit TensorFlow GPU memory growth
138
+ export TF_FORCE_GPU_ALLOW_GROWTH=true
139
+
140
+ # Set number of threads for better CPU performance
141
+ export OMP_NUM_THREADS=4
142
+ export TF_NUM_INTEROP_THREADS=4
143
+ export TF_NUM_INTRAOP_THREADS=4
144
+ ```
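TensorFlow reads these variables at import time, so set them before launching Python. You can confirm they are visible to the process with plain `os.environ` (no TensorFlow needed):

```python
import os

# Print each tuning variable, or "<not set>" if it was not exported
for var in ("TF_FORCE_GPU_ALLOW_GROWTH", "OMP_NUM_THREADS",
            "TF_NUM_INTEROP_THREADS", "TF_NUM_INTRAOP_THREADS"):
    print(f"{var} = {os.environ.get(var, '<not set>')}")
```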
145
+
146
+ ## Complete Installation Script
147
+
148
+ Here's a complete script to set up everything:
149
+
150
+ ```bash
151
+ #!/bin/bash
152
+
153
+ # Create and activate conda environment
154
+ conda create -n gaze_estimation python=3.9 -y
155
+ source $(conda info --base)/etc/profile.d/conda.sh
156
+ conda activate gaze_estimation
157
+
158
+ # Install Python packages
159
+ pip install tensorflow==2.13.0
160
+ pip install opencv-python==4.8.1.78
161
+ pip install numpy==1.24.3
162
+ pip install matplotlib==3.7.1
163
+ pip install scikit-learn==1.3.0
164
+ pip install pillow==10.0.0
165
+
166
+ # Install system dependencies
167
+ sudo apt update
168
+ sudo apt install -y libgl1-mesa-glx libglib2.0-0 libsm6 libxext6 libxrender-dev libgomp1
169
+ sudo apt install -y libgstreamer1.0-0 libgstreamer-plugins-base1.0-0 v4l-utils
170
+ sudo apt install -y libgtk-3-0 libgtk-3-dev
171
+
172
+ # Add user to video group
173
+ sudo usermod -a -G video $USER
174
+
175
+ echo "Setup complete! Please log out and log back in for video group changes to take effect."
176
+ echo "Then activate the environment with: conda activate gaze_estimation"
177
+ ```
178
+
179
+ Save this as `setup_gaze_env.sh` and run:
180
+ ```bash
181
+ chmod +x setup_gaze_env.sh
182
+ ./setup_gaze_env.sh
183
+ ```
184
+
185
+ ## Troubleshooting
186
+
187
+ ### Camera Issues
188
+ If the camera isn't detected:
189
+ ```bash
190
+ # Check if camera is detected by system
191
+ ls -la /dev/video*
192
+ v4l2-ctl --list-devices
193
+
194
+ # Test with simple capture
195
+ python -c "import cv2; cap = cv2.VideoCapture(0); print('Camera opened:', cap.isOpened()); cap.release()"
196
+ ```
197
+
198
+ ### OpenCV Window Issues
199
+ If OpenCV windows don't appear:
200
+ ```bash
201
+ # Install additional backends
202
+ sudo apt install -y python3-opencv libopencv-dev
203
+
204
+ # For Wayland compatibility issues, force X11
205
+ export GDK_BACKEND=x11
206
+ ```
207
+
208
+ ### TensorFlow Issues
209
+ If TensorFlow has compatibility issues:
210
+ ```bash
211
+ # Check CUDA compatibility (for GPU)
212
+ nvidia-smi
213
+
214
+ # Install specific CUDA version if needed
215
+ conda install -c conda-forge cudatoolkit=11.8 cudnn=8.6
216
+ ```
217
+
218
+ ### Permission Denied Errors
219
+ ```bash
220
+ # For camera access
221
+ sudo chmod 666 /dev/video0
222
+
223
+ # For display access
224
+ xhost +local:
225
+ ```
226
+
227
+ ## Running the System
228
+
229
+ Once everything is installed:
230
+
231
+ 1. **Collect data:**
232
+ ```bash
233
+ conda activate gaze_estimation
234
+ python gaze_data_collection.py
235
+ ```
236
+
237
+ 2. **Train model:**
238
+ ```bash
239
+ python gaze_training.py
240
+ ```
241
+
242
+ 3. **Run inference:**
243
+ ```bash
244
+ python gaze_inference.py
245
+ ```
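Before running inference, it is worth checking that training actually produced a model file and that the recorded input shape looks sane. A small sketch using the filenames the inference script expects (`best_gaze_model.keras` and `model_config.json`):

```python
from pathlib import Path
import json

model_path = Path("best_gaze_model.keras")
config_path = Path("model_config.json")

print("model present:", model_path.exists())
if config_path.exists():
    with open(config_path) as f:
        config = json.load(f)
    # Falls back to the default shape the inference script assumes
    print("expected input shape:", config.get("input_shape", [60, 80, 3]))
```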
246
+
247
+ ## Optional: Jupyter Notebook Support
248
+
249
+ If you want to experiment with Jupyter notebooks:
250
+ ```bash
251
+ pip install jupyter ipykernel
252
+ python -m ipykernel install --user --name gaze_estimation --display-name "Gaze Estimation"
253
+ ```
254
+
255
+ ## Clean Environment File
256
+
257
+ Create `requirements.txt` for easy reproduction:
258
+ ```
259
+ tensorflow==2.13.0
260
+ opencv-python==4.8.1.78
261
+ numpy==1.24.3
262
+ matplotlib==3.7.1
263
+ scikit-learn==1.3.0
264
+ pillow==10.0.0
265
+ ```
266
+
267
+ Then others can install with:
268
+ ```bash
269
+ conda create -n gaze_estimation python=3.9 -y
270
+ conda activate gaze_estimation
271
+ pip install -r requirements.txt
272
+ ```
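To verify that the pinned packages actually installed, a quick check using only the standard library:

```python
from importlib.metadata import version, PackageNotFoundError

# Report the installed version of each pinned package, if present
for pkg in ("tensorflow", "opencv-python", "numpy",
            "matplotlib", "scikit-learn", "pillow"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "NOT INSTALLED")
```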
requirements.txt ADDED
@@ -0,0 +1,6 @@
1
+ tensorflow==2.18.1
2
+ opencv-python==4.10.0.84
3
+ numpy==1.26.4
4
+ matplotlib==3.9.2
5
+ scikit-learn==1.5.2
6
+ pillow==10.4.0