txarst committed on
Commit f0e5caa · 0 Parent(s):

model upload

.gitattributes ADDED
@@ -0,0 +1,2 @@
*.pt filter=lfs diff=lfs merge=lfs -text
*.webm filter=lfs diff=lfs merge=lfs -text
LICENSE ADDED
@@ -0,0 +1,407 @@
1
+ Attribution-NonCommercial 4.0 International
2
+
3
+ =======================================================================
4
+
5
+ Creative Commons Corporation ("Creative Commons") is not a law firm and
6
+ does not provide legal services or legal advice. Distribution of
7
+ Creative Commons public licenses does not create a lawyer-client or
8
+ other relationship. Creative Commons makes its licenses and related
9
+ information available on an "as-is" basis. Creative Commons gives no
10
+ warranties regarding its licenses, any material licensed under their
11
+ terms and conditions, or any related information. Creative Commons
12
+ disclaims all liability for damages resulting from their use to the
13
+ fullest extent possible.
14
+
15
+ Using Creative Commons Public Licenses
16
+
17
+ Creative Commons public licenses provide a standard set of terms and
18
+ conditions that creators and other rights holders may use to share
19
+ original works of authorship and other material subject to copyright
20
+ and certain other rights specified in the public license below. The
21
+ following considerations are for informational purposes only, are not
22
+ exhaustive, and do not form part of our licenses.
23
+
24
+ Considerations for licensors: Our public licenses are
25
+ intended for use by those authorized to give the public
26
+ permission to use material in ways otherwise restricted by
27
+ copyright and certain other rights. Our licenses are
28
+ irrevocable. Licensors should read and understand the terms
29
+ and conditions of the license they choose before applying it.
30
+ Licensors should also secure all rights necessary before
31
+ applying our licenses so that the public can reuse the
32
+ material as expected. Licensors should clearly mark any
33
+ material not subject to the license. This includes other CC-
34
+ licensed material, or material used under an exception or
35
+ limitation to copyright. More considerations for licensors:
36
+ wiki.creativecommons.org/Considerations_for_licensors
37
+
38
+ Considerations for the public: By using one of our public
39
+ licenses, a licensor grants the public permission to use the
40
+ licensed material under specified terms and conditions. If
41
+ the licensor's permission is not necessary for any reason--for
42
+ example, because of any applicable exception or limitation to
43
+ copyright--then that use is not regulated by the license. Our
44
+ licenses grant only permissions under copyright and certain
45
+ other rights that a licensor has authority to grant. Use of
46
+ the licensed material may still be restricted for other
47
+ reasons, including because others have copyright or other
48
+ rights in the material. A licensor may make special requests,
49
+ such as asking that all changes be marked or described.
50
+ Although not required by our licenses, you are encouraged to
51
+ respect those requests where reasonable. More considerations
52
+ for the public:
53
+ wiki.creativecommons.org/Considerations_for_licensees
54
+
55
+ =======================================================================
56
+
57
+ Creative Commons Attribution-NonCommercial 4.0 International Public
58
+ License
59
+
60
+ By exercising the Licensed Rights (defined below), You accept and agree
61
+ to be bound by the terms and conditions of this Creative Commons
62
+ Attribution-NonCommercial 4.0 International Public License ("Public
63
+ License"). To the extent this Public License may be interpreted as a
64
+ contract, You are granted the Licensed Rights in consideration of Your
65
+ acceptance of these terms and conditions, and the Licensor grants You
66
+ such rights in consideration of benefits the Licensor receives from
67
+ making the Licensed Material available under these terms and
68
+ conditions.
69
+
70
+
71
+ Section 1 -- Definitions.
72
+
73
+ a. Adapted Material means material subject to Copyright and Similar
74
+ Rights that is derived from or based upon the Licensed Material
75
+ and in which the Licensed Material is translated, altered,
76
+ arranged, transformed, or otherwise modified in a manner requiring
77
+ permission under the Copyright and Similar Rights held by the
78
+ Licensor. For purposes of this Public License, where the Licensed
79
+ Material is a musical work, performance, or sound recording,
80
+ Adapted Material is always produced where the Licensed Material is
81
+ synched in timed relation with a moving image.
82
+
83
+ b. Adapter's License means the license You apply to Your Copyright
84
+ and Similar Rights in Your contributions to Adapted Material in
85
+ accordance with the terms and conditions of this Public License.
86
+
87
+ c. Copyright and Similar Rights means copyright and/or similar rights
88
+ closely related to copyright including, without limitation,
89
+ performance, broadcast, sound recording, and Sui Generis Database
90
+ Rights, without regard to how the rights are labeled or
91
+ categorized. For purposes of this Public License, the rights
92
+ specified in Section 2(b)(1)-(2) are not Copyright and Similar
93
+ Rights.
94
+ d. Effective Technological Measures means those measures that, in the
95
+ absence of proper authority, may not be circumvented under laws
96
+ fulfilling obligations under Article 11 of the WIPO Copyright
97
+ Treaty adopted on December 20, 1996, and/or similar international
98
+ agreements.
99
+
100
+ e. Exceptions and Limitations means fair use, fair dealing, and/or
101
+ any other exception or limitation to Copyright and Similar Rights
102
+ that applies to Your use of the Licensed Material.
103
+
104
+ f. Licensed Material means the artistic or literary work, database,
105
+ or other material to which the Licensor applied this Public
106
+ License.
107
+
108
+ g. Licensed Rights means the rights granted to You subject to the
109
+ terms and conditions of this Public License, which are limited to
110
+ all Copyright and Similar Rights that apply to Your use of the
111
+ Licensed Material and that the Licensor has authority to license.
112
+
113
+ h. Licensor means the individual(s) or entity(ies) granting rights
114
+ under this Public License.
115
+
116
+ i. NonCommercial means not primarily intended for or directed towards
117
+ commercial advantage or monetary compensation. For purposes of
118
+ this Public License, the exchange of the Licensed Material for
119
+ other material subject to Copyright and Similar Rights by digital
120
+ file-sharing or similar means is NonCommercial provided there is
121
+ no payment of monetary compensation in connection with the
122
+ exchange.
123
+
124
+ j. Share means to provide material to the public by any means or
125
+ process that requires permission under the Licensed Rights, such
126
+ as reproduction, public display, public performance, distribution,
127
+ dissemination, communication, or importation, and to make material
128
+ available to the public including in ways that members of the
129
+ public may access the material from a place and at a time
130
+ individually chosen by them.
131
+
132
+ k. Sui Generis Database Rights means rights other than copyright
133
+ resulting from Directive 96/9/EC of the European Parliament and of
134
+ the Council of 11 March 1996 on the legal protection of databases,
135
+ as amended and/or succeeded, as well as other essentially
136
+ equivalent rights anywhere in the world.
137
+
138
+ l. You means the individual or entity exercising the Licensed Rights
139
+ under this Public License. Your has a corresponding meaning.
140
+
141
+
142
+ Section 2 -- Scope.
143
+
144
+ a. License grant.
145
+
146
+ 1. Subject to the terms and conditions of this Public License,
147
+ the Licensor hereby grants You a worldwide, royalty-free,
148
+ non-sublicensable, non-exclusive, irrevocable license to
149
+ exercise the Licensed Rights in the Licensed Material to:
150
+
151
+ a. reproduce and Share the Licensed Material, in whole or
152
+ in part, for NonCommercial purposes only; and
153
+
154
+ b. produce, reproduce, and Share Adapted Material for
155
+ NonCommercial purposes only.
156
+
157
+ 2. Exceptions and Limitations. For the avoidance of doubt, where
158
+ Exceptions and Limitations apply to Your use, this Public
159
+ License does not apply, and You do not need to comply with
160
+ its terms and conditions.
161
+
162
+ 3. Term. The term of this Public License is specified in Section
163
+ 6(a).
164
+
165
+ 4. Media and formats; technical modifications allowed. The
166
+ Licensor authorizes You to exercise the Licensed Rights in
167
+ all media and formats whether now known or hereafter created,
168
+ and to make technical modifications necessary to do so. The
169
+ Licensor waives and/or agrees not to assert any right or
170
+ authority to forbid You from making technical modifications
171
+ necessary to exercise the Licensed Rights, including
172
+ technical modifications necessary to circumvent Effective
173
+ Technological Measures. For purposes of this Public License,
174
+ simply making modifications authorized by this Section 2(a)
175
+ (4) never produces Adapted Material.
176
+
177
+ 5. Downstream recipients.
178
+
179
+ a. Offer from the Licensor -- Licensed Material. Every
180
+ recipient of the Licensed Material automatically
181
+ receives an offer from the Licensor to exercise the
182
+ Licensed Rights under the terms and conditions of this
183
+ Public License.
184
+
185
+ b. No downstream restrictions. You may not offer or impose
186
+ any additional or different terms or conditions on, or
187
+ apply any Effective Technological Measures to, the
188
+ Licensed Material if doing so restricts exercise of the
189
+ Licensed Rights by any recipient of the Licensed
190
+ Material.
191
+
192
+ 6. No endorsement. Nothing in this Public License constitutes or
193
+ may be construed as permission to assert or imply that You
194
+ are, or that Your use of the Licensed Material is, connected
195
+ with, or sponsored, endorsed, or granted official status by,
196
+ the Licensor or others designated to receive attribution as
197
+ provided in Section 3(a)(1)(A)(i).
198
+
199
+ b. Other rights.
200
+
201
+ 1. Moral rights, such as the right of integrity, are not
202
+ licensed under this Public License, nor are publicity,
203
+ privacy, and/or other similar personality rights; however, to
204
+ the extent possible, the Licensor waives and/or agrees not to
205
+ assert any such rights held by the Licensor to the limited
206
+ extent necessary to allow You to exercise the Licensed
207
+ Rights, but not otherwise.
208
+
209
+ 2. Patent and trademark rights are not licensed under this
210
+ Public License.
211
+
212
+ 3. To the extent possible, the Licensor waives any right to
213
+ collect royalties from You for the exercise of the Licensed
214
+ Rights, whether directly or through a collecting society
215
+ under any voluntary or waivable statutory or compulsory
216
+ licensing scheme. In all other cases the Licensor expressly
217
+ reserves any right to collect such royalties, including when
218
+ the Licensed Material is used other than for NonCommercial
219
+ purposes.
220
+
221
+
222
+ Section 3 -- License Conditions.
223
+
224
+ Your exercise of the Licensed Rights is expressly made subject to the
225
+ following conditions.
226
+
227
+ a. Attribution.
228
+
229
+ 1. If You Share the Licensed Material (including in modified
230
+ form), You must:
231
+
232
+ a. retain the following if it is supplied by the Licensor
233
+ with the Licensed Material:
234
+
235
+ i. identification of the creator(s) of the Licensed
236
+ Material and any others designated to receive
237
+ attribution, in any reasonable manner requested by
238
+ the Licensor (including by pseudonym if
239
+ designated);
240
+
241
+ ii. a copyright notice;
242
+
243
+ iii. a notice that refers to this Public License;
244
+
245
+ iv. a notice that refers to the disclaimer of
246
+ warranties;
247
+
248
+ v. a URI or hyperlink to the Licensed Material to the
249
+ extent reasonably practicable;
250
+
251
+ b. indicate if You modified the Licensed Material and
252
+ retain an indication of any previous modifications; and
253
+
254
+ c. indicate the Licensed Material is licensed under this
255
+ Public License, and include the text of, or the URI or
256
+ hyperlink to, this Public License.
257
+
258
+ 2. You may satisfy the conditions in Section 3(a)(1) in any
259
+ reasonable manner based on the medium, means, and context in
260
+ which You Share the Licensed Material. For example, it may be
261
+ reasonable to satisfy the conditions by providing a URI or
262
+ hyperlink to a resource that includes the required
263
+ information.
264
+
265
+ 3. If requested by the Licensor, You must remove any of the
266
+ information required by Section 3(a)(1)(A) to the extent
267
+ reasonably practicable.
268
+
269
+ 4. If You Share Adapted Material You produce, the Adapter's
270
+ License You apply must not prevent recipients of the Adapted
271
+ Material from complying with this Public License.
272
+
273
+
274
+ Section 4 -- Sui Generis Database Rights.
275
+
276
+ Where the Licensed Rights include Sui Generis Database Rights that
277
+ apply to Your use of the Licensed Material:
278
+
279
+ a. for the avoidance of doubt, Section 2(a)(1) grants You the right
280
+ to extract, reuse, reproduce, and Share all or a substantial
281
+ portion of the contents of the database for NonCommercial purposes
282
+ only;
283
+
284
+ b. if You include all or a substantial portion of the database
285
+ contents in a database in which You have Sui Generis Database
286
+ Rights, then the database in which You have Sui Generis Database
287
+ Rights (but not its individual contents) is Adapted Material; and
288
+
289
+ c. You must comply with the conditions in Section 3(a) if You Share
290
+ all or a substantial portion of the contents of the database.
291
+
292
+ For the avoidance of doubt, this Section 4 supplements and does not
293
+ replace Your obligations under this Public License where the Licensed
294
+ Rights include other Copyright and Similar Rights.
295
+
296
+
297
+ Section 5 -- Disclaimer of Warranties and Limitation of Liability.
298
+
299
+ a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
300
+ EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
301
+ AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
302
+ ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
303
+ IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
304
+ WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
305
+ PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
306
+ ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
307
+ KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
308
+ ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.
309
+
310
+ b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
311
+ TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
312
+ NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
313
+ INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
314
+ COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
315
+ USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
316
+ ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
317
+ DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
318
+ IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.
319
+
320
+ c. The disclaimer of warranties and limitation of liability provided
321
+ above shall be interpreted in a manner that, to the extent
322
+ possible, most closely approximates an absolute disclaimer and
323
+ waiver of all liability.
324
+
325
+
326
+ Section 6 -- Term and Termination.
327
+
328
+ a. This Public License applies for the term of the Copyright and
329
+ Similar Rights licensed here. However, if You fail to comply with
330
+ this Public License, then Your rights under this Public License
331
+ terminate automatically.
332
+
333
+ b. Where Your right to use the Licensed Material has terminated under
334
+ Section 6(a), it reinstates:
335
+
336
+ 1. automatically as of the date the violation is cured, provided
337
+ it is cured within 30 days of Your discovery of the
338
+ violation; or
339
+
340
+ 2. upon express reinstatement by the Licensor.
341
+
342
+ For the avoidance of doubt, this Section 6(b) does not affect any
343
+ right the Licensor may have to seek remedies for Your violations
344
+ of this Public License.
345
+
346
+ c. For the avoidance of doubt, the Licensor may also offer the
347
+ Licensed Material under separate terms or conditions or stop
348
+ distributing the Licensed Material at any time; however, doing so
349
+ will not terminate this Public License.
350
+
351
+ d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
352
+ License.
353
+
354
+
355
+ Section 7 -- Other Terms and Conditions.
356
+
357
+ a. The Licensor shall not be bound by any additional or different
358
+ terms or conditions communicated by You unless expressly agreed.
359
+
360
+ b. Any arrangements, understandings, or agreements regarding the
361
+ Licensed Material not stated herein are separate from and
362
+ independent of the terms and conditions of this Public License.
363
+
364
+
365
+ Section 8 -- Interpretation.
366
+
367
+ a. For the avoidance of doubt, this Public License does not, and
368
+ shall not be interpreted to, reduce, limit, restrict, or impose
369
+ conditions on any use of the Licensed Material that could lawfully
370
+ be made without permission under this Public License.
371
+
372
+ b. To the extent possible, if any provision of this Public License is
373
+ deemed unenforceable, it shall be automatically reformed to the
374
+ minimum extent necessary to make it enforceable. If the provision
375
+ cannot be reformed, it shall be severed from this Public License
376
+ without affecting the enforceability of the remaining terms and
377
+ conditions.
378
+
379
+ c. No term or condition of this Public License will be waived and no
380
+ failure to comply consented to unless expressly agreed to by the
381
+ Licensor.
382
+
383
+ d. Nothing in this Public License constitutes or may be interpreted
384
+ as a limitation upon, or waiver of, any privileges and immunities
385
+ that apply to the Licensor or You, including from the legal
386
+ processes of any jurisdiction or authority.
387
+
388
+ =======================================================================
389
+
390
+ Creative Commons is not a party to its public
391
+ licenses. Notwithstanding, Creative Commons may elect to apply one of
392
+ its public licenses to material it publishes and in those instances
393
+ will be considered the “Licensor.” The text of the Creative Commons
394
+ public licenses is dedicated to the public domain under the CC0 Public
395
+ Domain Dedication. Except for the limited purpose of indicating that
396
+ material is shared under a Creative Commons public license or as
397
+ otherwise permitted by the Creative Commons policies published at
398
+ creativecommons.org/policies, Creative Commons does not authorize the
399
+ use of the trademark "Creative Commons" or any other trademark or logo
400
+ of Creative Commons without its prior written consent including,
401
+ without limitation, in connection with any unauthorized modifications
402
+ to any of its public licenses or any other arrangements,
403
+ understandings, or agreements concerning use of licensed material. For
404
+ the avoidance of doubt, this paragraph does not form part of the
405
+ public licenses.
406
+
407
+ Creative Commons may be contacted at creativecommons.org.
README.md ADDED
@@ -0,0 +1,168 @@
---
title: PupilSense
emoji: 👁️
colorFrom: red
colorTo: pink
sdk: gradio
sdk_version: 4.36.1
app_file: app.py
pinned: false
---

# 👁️ PupilSense 👁️🕵️‍♂️

PupilSense is a deep learning-powered application for estimating pupil diameter from images and videos. It uses trained ResNet models with Class Activation Mapping (CAM) for interpretable predictions.

## Features

- **Image Processing**: Upload images to get instant pupil diameter estimates
- **Video Processing**: Analyze videos frame-by-frame for temporal pupil diameter analysis
- **Model Selection**: Choose between ResNet18 and ResNet50 architectures
- **Pupil Selection**: Analyze the left pupil, the right pupil, or both
- **Blink Detection**: Automatically detect and handle blinks in the analysis
- **CAM Visualization**: See which parts of the eye the model focuses on for its predictions
- **API Access**: Full Gradio API support for programmatic access

## Usage

### Web Interface
Simply upload an image or video file and configure your analysis parameters:
- Select the pupil(s) to analyze (left, right, or both)
- Choose the model architecture (ResNet18 or ResNet50)
- Enable or disable blink detection
- Click process to get results

### API Access
The Gradio interface provides automatic API endpoints. You can access the API documentation at `/docs` when the app is running.

Example API usage:
```python
import requests

# For image processing
files = {"image_input": open("your_image.jpg", "rb")}
data = {
    "pupil_selection": "both",
    "tv_model": "ResNet18",
    "blink_detection": True,
}
response = requests.post("https://your-space-url/api/predict", files=files, data=data)
```
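
If the Space is public, it may also be convenient to call the endpoint through the `gradio_client` package. The snippet below is only a sketch: the Space id, the `api_name`, and the input names are assumptions and should be checked against the "Use via API" page of the running app.

```python
# Hedged sketch using gradio_client (requires a recent gradio_client release).
# The Space id, api_name, and parameter names below are assumptions, not taken
# from this repository.
from gradio_client import Client, handle_file

client = Client("your-username/your-space")        # hypothetical Space id
result = client.predict(
    image_input=handle_file("your_image.jpg"),     # assumed input name
    pupil_selection="both",
    tv_model="ResNet18",
    blink_detection=True,
    api_name="/predict",                           # assumed endpoint name
)
print(result)
```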

## Model Information

The application uses pre-trained ResNet models specifically trained for pupil diameter estimation:
- **ResNet18**: Faster inference, good accuracy
- **ResNet50**: Higher accuracy, slower inference

Both models support:
- Input resolution: 32x64 pixels (eye region)
- Output: Pupil diameter in millimeters
- CAM visualization for model interpretability

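As a rough illustration of this input/output contract, the sketch below pushes a single eye crop through a ResNet-style regressor. It uses a plain torchvision ResNet18 with a one-unit head as a stand-in; the actual model class is built by `utils.get_model` in this repository, so treat the architecture details here as assumptions.

```python
# Minimal inference sketch (stand-in model, not the repository's exact class).
import torch
from torchvision import models, transforms
from PIL import Image

model = models.resnet18(weights=None)                 # hypothetical stand-in backbone
model.fc = torch.nn.Linear(model.fc.in_features, 1)   # single regression output (mm)
model.eval()

preprocess = transforms.Compose([
    # Eye crops are resized to 32x64 before inference, as in app_utils.py.
    transforms.Resize([32, 64], interpolation=transforms.InterpolationMode.BICUBIC,
                      antialias=True),
    transforms.ToTensor(),
])

eye_crop = Image.open("left_eye.jpg").convert("RGB")   # hypothetical eye-region image
with torch.no_grad():
    diameter_mm = model(preprocess(eye_crop).unsqueeze(0))[0].item()
print(f"Predicted pupil diameter: {diameter_mm:.2f} mm")
```
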
## Technical Details

- **Face Detection**: MediaPipe for robust face and eye detection
- **Preprocessing**: Automatic eye region extraction and normalization
- **Deep Learning**: PyTorch-based ResNet models
- **Visualization**: Matplotlib for result plotting and CAM overlays
- **Video Support**: Frame-by-frame analysis with temporal plotting

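The CAM overlays are produced with `torchcam`. The sketch below mirrors the calls made in `app_utils.py`, but substitutes a plain torchvision ResNet18 and a random tensor for the real model and eye crop; in the repository the backbone is reached as `model.resnet`, and the target layer is `layer4[-1].conv2` for ResNet18 or `layer4[-1].conv3` for ResNet50.

```python
# CAM overlay sketch following app_utils.py; the stand-in model and input are assumptions.
import torch
from torchvision import models
from torchvision.transforms.functional import to_pil_image
from torchcam.methods import CAM
from torchcam.utils import overlay_mask

model = models.resnet18(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, 1)
model.eval()

eye = torch.rand(1, 3, 32, 64)                      # placeholder for a preprocessed eye crop

cam_extractor = CAM(model, target_layer=model.layer4[-1].conv2,
                    fc_layer=model.fc, input_shape=eye.shape)
output = model(eye)                                 # forward pass; scalar diameter prediction
activation_map = cam_extractor(0, output)[0]        # CAM for the single output unit
overlay = overlay_mask(to_pil_image(eye.squeeze(0)),
                       to_pil_image(activation_map, mode="F"), alpha=0.5)
overlay.save("cam_overlay.png")
```
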
## Installation & Setup

### Local Development

1. **Clone the repository**
   ```bash
   git clone <repository-url>
   cd pupilsense
   ```

2. **Create a virtual environment**
   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```

4. **Run the application**
   ```bash
   python app.py
   ```

The app will be available at `http://localhost:7860`.

### Hugging Face Spaces Deployment

1. **Create a new Space** on Hugging Face with the Gradio SDK
2. **Upload all files** from the pupilsense directory
3. **Ensure the following files are present:**
   - `app.py` (main application file)
   - `gradio_app.py` (Gradio interface)
   - `gradio_utils.py` (utility functions)
   - `requirements.txt` (dependencies)
   - `README.md` (this file, with the YAML header above)
   - `pre_trained_models/` (model files)
   - All other supporting files

## Known Issues & Troubleshooting

### MediaPipe Issues
- **Issue**: Segmentation fault or MediaPipe errors in headless environments
- **Solution**: The app includes error handling for MediaPipe failures. In production environments, ensure proper GPU/display drivers are available.

### Model Loading
- **Issue**: Model files not found
- **Solution**: Ensure the `pre_trained_models/` directory contains the required `.pt` files for both the ResNet18 and ResNet50 models.

### Memory Usage
- **Issue**: High memory usage with large videos
- **Solution**: The app automatically resizes frames to at most 640x480 to manage memory usage.

## File Structure

```
pupilsense/
├── app.py                 # Main application entry point
├── gradio_app.py          # Gradio interface definition
├── gradio_utils.py        # Utility functions (MediaPipe-free)
├── app_utils.py           # Original Streamlit utilities (legacy)
├── requirements.txt       # Python dependencies
├── README.md              # This file
├── config.yml             # Configuration file
├── registry.py            # Model registry
├── registry_utils.py      # Registry utilities
├── utils.py               # General utilities
├── pre_trained_models/    # Trained model files
│   ├── ResNet18/
│   │   ├── left_eye.pt
│   │   └── right_eye.pt
│   └── ResNet50/
│       ├── left_eye.pt
│       └── right_eye.pt
├── preprocessing/         # Data preprocessing modules
├── feature_extraction/    # Feature extraction modules
├── registrations/         # Model registration modules
└── sample_videos/         # Sample video files
```

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request

## License

See the LICENSE file for details.

---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
app.py ADDED
@@ -0,0 +1,40 @@
import sys
import os.path as osp

root_path = osp.abspath(osp.join(__file__, osp.pardir))
sys.path.append(root_path)

from gradio_app import create_gradio_interface

def main():
    """Main function to launch the Gradio interface."""
    try:
        demo = create_gradio_interface()

        # For Hugging Face Spaces deployment
        import os
        if os.getenv("SPACE_ID") or os.getenv("SYSTEM") == "spaces":
            # Running on Hugging Face Spaces
            demo.launch(share=True)
        else:
            # Running locally
            try:
                demo.launch(
                    server_name="0.0.0.0",
                    server_port=7860,
                    share=False
                )
            except ValueError as e:
                if "shareable link must be created" in str(e):
                    print("Localhost not accessible, creating shareable link...")
                    demo.launch(share=True)
                else:
                    raise e
    except Exception as e:
        print(f"Error launching app: {e}")
        import traceback
        traceback.print_exc()
        raise e

if __name__ == "__main__":
    main()
app_utils.py ADDED
@@ -0,0 +1,906 @@
1
+ import base64
2
+ from io import BytesIO
3
+ import io
4
+ import os
5
+ import sys
6
+ import cv2
7
+ from matplotlib import pyplot as plt
8
+ import numpy as np
9
+ import pandas as pd
10
+ import streamlit as st
11
+ import torch
12
+ import tempfile
13
+ from PIL import Image
14
+ from torchvision.transforms.functional import to_pil_image
15
+ from torchvision import transforms
16
+ from PIL import ImageOps
17
+ import altair as alt
18
+ import streamlit.components.v1 as components
19
+
20
+ from torchcam.methods import CAM
21
+ from torchcam import methods as torchcam_methods
22
+ from torchcam.utils import overlay_mask
23
+ import os.path as osp
24
+
25
+ root_path = osp.abspath(osp.join(__file__, osp.pardir))
26
+ sys.path.append(root_path)
27
+
28
+ from preprocessing.dataset_creation import EyeDentityDatasetCreation
29
+ from utils import get_model
30
+
31
+ CAM_METHODS = ["CAM"]
32
+ # colors = ["#2ca02c", "#d62728", "#1f77b4", "#ff7f0e"] # Green, Red, Blue, Orange
33
+ colors = ["#1f77b4", "#ff7f0e", "#636363"] # Blue, Orange, Gray
34
+
35
+
36
+ @torch.no_grad()
37
+ def load_model(model_configs, device="cpu"):
38
+ """Loads the pre-trained model."""
39
+ model_path = os.path.join(root_path, model_configs["model_path"])
40
+ model_dict = torch.load(model_path, map_location=device)
41
+ model = get_model(model_configs=model_configs)
42
+ model.load_state_dict(model_dict)
43
+ model = model.to(device).eval()
44
+ return model
45
+
46
+
47
+ def extract_frames(video_path):
48
+ """Extracts frames from a video file."""
49
+ vidcap = cv2.VideoCapture(video_path)
50
+ frames = []
51
+ success, image = vidcap.read()
52
+ while success:
53
+ image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
54
+ frames.append(image_rgb)
55
+ success, image = vidcap.read()
56
+ vidcap.release()
57
+ return frames
58
+
59
+
60
+ def resize_frame(image, max_width=640, max_height=480):
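+ # Shrinks the frame while preserving aspect ratio: width is capped at max_width and height at max_height (square inputs of 256px or more go to 256x256); PIL's thumbnail() never upscales, so smaller frames pass through unchanged.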
61
+ if not isinstance(image, Image.Image):
62
+ image = Image.fromarray(image)
63
+ original_size = image.size
64
+
65
+ # Resize the frame similarly to the image resizing logic
66
+ if original_size[0] == original_size[1] and original_size[0] >= 256:
67
+ max_size = (256, 256)
68
+ else:
69
+ max_size = list(original_size)
70
+ if original_size[0] >= max_width:
71
+ max_size[0] = max_width
72
+ elif original_size[0] < 64:
73
+ max_size[0] = 64
74
+ if original_size[1] >= max_height:
75
+ max_size[1] = max_height
76
+ elif original_size[1] < 32:
77
+ max_size[1] = 32
78
+
79
+ image.thumbnail(max_size)
80
+ # image = image.resize(max_size)
81
+ return image
82
+
83
+
84
+ def is_image(file_extension):
85
+ """Checks if the file is an image."""
86
+ return file_extension.lower() in ["png", "jpeg", "jpg"]
87
+
88
+
89
+ def is_video(file_extension):
90
+ """Checks if the file is a video."""
91
+ return file_extension.lower() in ["mp4", "avi", "mov", "mkv", "webm"]
92
+
93
+
94
+ def get_codec_and_extension(file_format):
95
+ """Return codec and file extension based on the format."""
96
+ if file_format == "mp4":
97
+ return "H264", ".mp4"
98
+ elif file_format == "avi":
99
+ return "MJPG", ".avi"
100
+ elif file_format == "webm":
101
+ return "VP80", ".webm"
102
+ else:
103
+ return "MJPG", ".avi"
104
+
105
+
106
+ def display_results(input_image, cam_frame, pupil_diameter, cols):
107
+ """Displays the input image and overlayed CAM result."""
108
+ fig, axs = plt.subplots(1, 2, figsize=(10, 5))
109
+ axs[0].imshow(input_image)
110
+ axs[0].axis("off")
111
+ axs[0].set_title("Input Image")
112
+ axs[1].imshow(cam_frame)
113
+ axs[1].axis("off")
114
+ axs[1].set_title("Overlayed CAM")
115
+ cols[-1].pyplot(fig)
116
+ cols[-1].text(f"Pupil Diameter: {pupil_diameter:.2f} mm")
117
+
118
+
119
+ def preprocess_image(input_img, max_size=(256, 256)):
120
+ """Resizes and preprocesses an image."""
121
+ input_img.thumbnail(max_size)
122
+ preprocess_steps = [
123
+ transforms.ToTensor(),
124
+ transforms.Resize([32, 64], interpolation=transforms.InterpolationMode.BICUBIC, antialias=True),
125
+ ]
126
+ return transforms.Compose(preprocess_steps)(input_img).unsqueeze(0)
127
+
128
+
129
+ def overlay_text_on_frame(frame, text, position=(16, 20)):
130
+ """Write text on the image frame using OpenCV."""
131
+ return cv2.putText(frame, text, position, cv2.FONT_HERSHEY_PLAIN, 1, (255, 255, 255), 1, cv2.LINE_AA)
132
+
133
+
134
+ def get_configs(blink_detection=False):
135
+ upscale = "-"
136
+ upscale_method_or_model = "-"
137
+ if upscale == "-":
138
+ sr_configs = None
139
+ else:
140
+ sr_configs = {
141
+ "method": upscale_method_or_model,
142
+ "params": {"upscale": upscale},
143
+ }
144
+ config_file = {
145
+ "sr_configs": sr_configs,
146
+ "feature_extraction_configs": {
147
+ "blink_detection": blink_detection,
148
+ "upscale": upscale,
149
+ "extraction_library": "mediapipe",
150
+ },
151
+ }
152
+
153
+ return config_file
154
+
155
+
156
+ def setup(cols, pupil_selection, tv_model, output_path):
157
+
158
+ left_pupil_model = None
159
+ left_pupil_cam_extractor = None
160
+ right_pupil_model = None
161
+ right_pupil_cam_extractor = None
162
+ output_frames = {}
163
+ input_frames = {}
164
+ predicted_diameters = {}
165
+ pred_diameters_frames = {}
166
+
167
+ if pupil_selection == "both":
168
+ selected_eyes = ["left_eye", "right_eye"]
169
+
170
+ elif pupil_selection == "left_pupil":
171
+ selected_eyes = ["left_eye"]
172
+
173
+ elif pupil_selection == "right_pupil":
174
+ selected_eyes = ["right_eye"]
175
+
176
+ for i, eye_type in enumerate(selected_eyes):
177
+ model_configs = {
178
+ "model_path": root_path + f"/pre_trained_models/{tv_model}/{eye_type}.pt",
179
+ "registered_model_name": tv_model,
180
+ "num_classes": 1,
181
+ }
182
+ if eye_type == "left_eye":
183
+ left_pupil_model = load_model(model_configs)
184
+ left_pupil_cam_extractor = None
185
+ output_frames[eye_type] = []
186
+ input_frames[eye_type] = []
187
+ predicted_diameters[eye_type] = []
188
+ pred_diameters_frames[eye_type] = []
189
+ else:
190
+ right_pupil_model = load_model(model_configs)
191
+ right_pupil_cam_extractor = None
192
+ output_frames[eye_type] = []
193
+ input_frames[eye_type] = []
194
+ predicted_diameters[eye_type] = []
195
+ pred_diameters_frames[eye_type] = []
196
+
197
+ video_placeholders = {}
198
+
199
+ if output_path:
200
+ video_cols = cols[1].columns(len(input_frames.keys()))
201
+
202
+ for i, eye_type in enumerate(list(input_frames.keys())):
203
+ video_placeholders[eye_type] = video_cols[i].empty()
204
+
205
+ return (
206
+ selected_eyes,
207
+ input_frames,
208
+ output_frames,
209
+ predicted_diameters,
210
+ pred_diameters_frames,
211
+ video_placeholders,
212
+ left_pupil_model,
213
+ left_pupil_cam_extractor,
214
+ right_pupil_model,
215
+ right_pupil_cam_extractor,
216
+ )
217
+
218
+
219
+ def process_frames(
220
+ cols, input_imgs, tv_model, pupil_selection, cam_method, output_path=None, codec=None, blink_detection=False
221
+ ):
222
+
223
+ config_file = get_configs(blink_detection)
224
+
225
+ face_frames = []
226
+
227
+ (
228
+ selected_eyes,
229
+ input_frames,
230
+ output_frames,
231
+ predicted_diameters,
232
+ pred_diameters_frames,
233
+ video_placeholders,
234
+ left_pupil_model,
235
+ left_pupil_cam_extractor,
236
+ right_pupil_model,
237
+ right_pupil_cam_extractor,
238
+ ) = setup(cols, pupil_selection, tv_model, output_path)
239
+
240
+ ds_creation = EyeDentityDatasetCreation(
241
+ feature_extraction_configs=config_file["feature_extraction_configs"],
242
+ sr_configs=config_file["sr_configs"],
243
+ )
244
+
245
+ preprocess_steps = [
246
+ transforms.Resize(
247
+ [32, 64],
248
+ interpolation=transforms.InterpolationMode.BICUBIC,
249
+ antialias=True,
250
+ ),
251
+ transforms.ToTensor(),
252
+ ]
253
+ preprocess_function = transforms.Compose(preprocess_steps)
254
+
255
+ eyes_ratios = []
256
+
257
+ for idx, input_img in enumerate(input_imgs):
258
+
259
+ img = np.array(input_img)
260
+ ds_results = ds_creation(img)
261
+
262
+ left_eye = None
263
+ right_eye = None
264
+ blinked = False
265
+ eyes_ratio = None
266
+
267
+ if ds_results is not None and "face" in ds_results:
268
+ face_img = to_pil_image(ds_results["face"])
269
+ has_face = True
270
+ else:
271
+ face_img = to_pil_image(np.zeros((256, 256, 3), dtype=np.uint8))
272
+ has_face = False
273
+ face_frames.append({"has_face": has_face, "img": face_img})
274
+
275
+ if ds_results is not None and "eyes" in ds_results.keys():
276
+ blinked = ds_results["eyes"]["blinked"]
277
+ eyes_ratio = ds_results["eyes"]["eyes_ratio"]
278
+ if eyes_ratio is not None:
279
+ eyes_ratios.append(eyes_ratio)
280
+ if "left_eye" in ds_results["eyes"].keys() and ds_results["eyes"]["left_eye"] is not None:
281
+ left_eye = ds_results["eyes"]["left_eye"]
282
+ left_eye = to_pil_image(left_eye).convert("RGB")
283
+ left_eye = preprocess_function(left_eye)
284
+ left_eye = left_eye.unsqueeze(0)
285
+ if "right_eye" in ds_results["eyes"].keys() and ds_results["eyes"]["right_eye"] is not None:
286
+ right_eye = ds_results["eyes"]["right_eye"]
287
+ right_eye = to_pil_image(right_eye).convert("RGB")
288
+ right_eye = preprocess_function(right_eye)
289
+ right_eye = right_eye.unsqueeze(0)
290
+ else:
291
+ input_img = preprocess_function(input_img)
292
+ input_img = input_img.unsqueeze(0)
293
+ if pupil_selection == "left_pupil":
294
+ left_eye = input_img
295
+ elif pupil_selection == "right_pupil":
296
+ right_eye = input_img
297
+ else:
298
+ left_eye = input_img
299
+ right_eye = input_img
300
+
301
+ for i, eye_type in enumerate(selected_eyes):
302
+
303
+ if blinked:
304
+ if left_eye is not None and eye_type == "left_eye":
305
+ _, height, width = left_eye.squeeze(0).shape
306
+ input_image_pil = to_pil_image(left_eye.squeeze(0))
307
+ elif right_eye is not None and eye_type == "right_eye":
308
+ _, height, width = right_eye.squeeze(0).shape
309
+ input_image_pil = to_pil_image(right_eye.squeeze(0))
310
+
311
+ input_img_np = np.array(input_image_pil)
312
+ zeros_img = to_pil_image(np.zeros((height, width, 3), dtype=np.uint8))
313
+ output_img_np = overlay_text_on_frame(np.array(zeros_img), "blink")
314
+ predicted_diameter = "blink"
315
+ else:
316
+ if left_eye is not None and eye_type == "left_eye":
317
+ if left_pupil_cam_extractor is None:
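+ # Use the last convolution in layer4 as the CAM target: a BasicBlock (ResNet18) ends in conv2, a Bottleneck (ResNet50) ends in conv3.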
318
+ if tv_model == "ResNet18":
319
+ target_layer = left_pupil_model.resnet.layer4[-1].conv2
320
+ elif tv_model == "ResNet50":
321
+ target_layer = left_pupil_model.resnet.layer4[-1].conv3
322
+ else:
323
+ raise Exception(f"No target layer available for selected model: {tv_model}")
324
+ left_pupil_cam_extractor = torchcam_methods.__dict__[cam_method](
325
+ left_pupil_model,
326
+ target_layer=target_layer,
327
+ fc_layer=left_pupil_model.resnet.fc,
328
+ input_shape=left_eye.shape,
329
+ )
330
+ output = left_pupil_model(left_eye)
331
+ predicted_diameter = output[0].item()
332
+ act_maps = left_pupil_cam_extractor(0, output)
333
+ activation_map = act_maps[0] if len(act_maps) == 1 else left_pupil_cam_extractor.fuse_cams(act_maps)
334
+ input_image_pil = to_pil_image(left_eye.squeeze(0))
335
+ elif right_eye is not None and eye_type == "right_eye":
336
+ if right_pupil_cam_extractor is None:
337
+ if tv_model == "ResNet18":
338
+ target_layer = right_pupil_model.resnet.layer4[-1].conv2
339
+ elif tv_model == "ResNet50":
340
+ target_layer = right_pupil_model.resnet.layer4[-1].conv3
341
+ else:
342
+ raise Exception(f"No target layer available for selected model: {tv_model}")
343
+ right_pupil_cam_extractor = torchcam_methods.__dict__[cam_method](
344
+ right_pupil_model,
345
+ target_layer=target_layer,
346
+ fc_layer=right_pupil_model.resnet.fc,
347
+ input_shape=right_eye.shape,
348
+ )
349
+ output = right_pupil_model(right_eye)
350
+ predicted_diameter = output[0].item()
351
+ act_maps = right_pupil_cam_extractor(0, output)
352
+ activation_map = (
353
+ act_maps[0] if len(act_maps) == 1 else right_pupil_cam_extractor.fuse_cams(act_maps)
354
+ )
355
+ input_image_pil = to_pil_image(right_eye.squeeze(0))
356
+
357
+ # Create CAM overlay
358
+ activation_map_pil = to_pil_image(activation_map, mode="F")
359
+ result = overlay_mask(input_image_pil, activation_map_pil, alpha=0.5)
360
+ input_img_np = np.array(input_image_pil)
361
+ output_img_np = np.array(result)
362
+
363
+ # Add frame and predicted diameter to lists
364
+ input_frames[eye_type].append(input_img_np)
365
+ output_frames[eye_type].append(output_img_np)
366
+ predicted_diameters[eye_type].append(predicted_diameter)
367
+
368
+ if output_path:
369
+ height, width, _ = output_img_np.shape
370
+ frame = np.zeros((height, width, 3), dtype=np.uint8)
371
+ if not isinstance(predicted_diameter, str):
372
+ text = f"{predicted_diameter:.2f}"
373
+ else:
374
+ text = predicted_diameter
375
+ frame = overlay_text_on_frame(frame, text)
376
+ pred_diameters_frames[eye_type].append(frame)
377
+
378
+ combined_frame = np.vstack((input_img_np, output_img_np, frame))
379
+
380
+ img_base64 = pil_image_to_base64(Image.fromarray(combined_frame))
381
+ image_html = f'<div style="width: {str(50*len(selected_eyes))}%;"><img src="data:image/png;base64,{img_base64}" style="width: 100%;"></div>'
382
+ video_placeholders[eye_type].markdown(image_html, unsafe_allow_html=True)
383
+
384
+ # video_placeholders[eye_type].image(combined_frame, use_column_width=True)
385
+
386
+ st.session_state.current_frame = idx + 1
387
+ txt = f"<p style='font-size:20px;'> Number of Frames Processed: <strong>{st.session_state.current_frame} / {st.session_state.total_frames}</strong> </p>"
388
+ st.session_state.frame_placeholder.markdown(txt, unsafe_allow_html=True)
389
+
390
+ if output_path:
391
+ combine_and_show_frames(
392
+ input_frames, output_frames, pred_diameters_frames, output_path, codec, video_placeholders
393
+ )
394
+
395
+ return input_frames, output_frames, predicted_diameters, face_frames, eyes_ratios
396
+
397
+
398
+ # Function to display video with autoplay and loop
399
+ def display_video_with_autoplay(video_col, video_path, width):
400
+ video_html = f"""
401
+ <video width="{str(width)}%" height="auto" autoplay loop muted>
402
+ <source src="data:video/mp4;base64,{video_path}" type="video/mp4">
403
+ </video>
404
+ """
405
+ video_col.markdown(video_html, unsafe_allow_html=True)
406
+
407
+
408
+ def process_video(cols, video_frames, tv_model, pupil_selection, output_path, cam_method, blink_detection=False):
409
+
410
+ resized_frames = []
411
+ for i, frame in enumerate(video_frames):
412
+ input_img = resize_frame(frame, max_width=640, max_height=480)
413
+ resized_frames.append(input_img)
414
+
415
+ file_format = output_path.split(".")[-1]
416
+ codec, extension = get_codec_and_extension(file_format)
417
+
418
+ input_frames, output_frames, predicted_diameters, face_frames, eyes_ratios = process_frames(
419
+ cols, resized_frames, tv_model, pupil_selection, cam_method, output_path, codec, blink_detection
420
+ )
421
+
422
+ return input_frames, output_frames, predicted_diameters, face_frames, eyes_ratios
423
+
424
+
425
+ # Function to convert string values to float or None
426
+ def convert_diameter(value):
427
+ try:
428
+ return float(value)
429
+ except (ValueError, TypeError):
430
+ return None # Return None if conversion fails
431
+
432
+
433
+ def combine_and_show_frames(input_frames, cam_frames, pred_diameters_frames, output_path, codec, video_cols):
434
+ # Assuming all frames have the same keys (eye types)
435
+ eye_types = input_frames.keys()
436
+
437
+ for i, eye_type in enumerate(eye_types):
438
+ in_frames = input_frames[eye_type]
439
+ cam_out_frames = cam_frames[eye_type]
440
+ pred_diameters_text_frames = pred_diameters_frames[eye_type]
441
+
442
+ # Get frame properties (assuming all frames have the same dimensions)
443
+ height, width, _ = in_frames[0].shape
444
+ fourcc = cv2.VideoWriter_fourcc(*codec)
445
+ fps = 10.0
446
+ out = cv2.VideoWriter(output_path, fourcc, fps, (width, height * 3)) # Height is tripled: input, CAM, and prediction frames are stacked vertically
447
+
448
+ # Loop through each set of frames and concatenate them
449
+ for j in range(len(in_frames)):
450
+ input_frame = in_frames[j]
451
+ cam_frame = cam_out_frames[j]
452
+ pred_frame = pred_diameters_text_frames[j]
453
+
454
+ # Convert frames to BGR if necessary
455
+ input_frame_bgr = cv2.cvtColor(input_frame, cv2.COLOR_RGB2BGR)
456
+ cam_frame_bgr = cv2.cvtColor(cam_frame, cv2.COLOR_RGB2BGR)
457
+ pred_frame_bgr = cv2.cvtColor(pred_frame, cv2.COLOR_RGB2BGR)
458
+
459
+ # Stack frames vertically (input, cam, pred)
460
+ combined_frame = np.vstack((input_frame_bgr, cam_frame_bgr, pred_frame_bgr))
461
+
462
+ # Write the combined frame to the video
463
+ out.write(combined_frame)
464
+
465
+ # Release the video writer
466
+ out.release()
467
+
468
+ # Read the video and encode it in base64 for displaying
469
+ with open(output_path, "rb") as video_file:
470
+ video_bytes = video_file.read()
471
+ video_base64 = base64.b64encode(video_bytes).decode("utf-8")
472
+
473
+ # Display the combined video
474
+ display_video_with_autoplay(video_cols[eye_type], video_base64, width=len(video_cols) * 50)
475
+
476
+ # Clean up
477
+ os.remove(output_path)
478
+
479
+
480
+ def set_input_image_on_ui(uploaded_file, cols):
481
+ input_img = Image.open(BytesIO(uploaded_file.read())).convert("RGB")
482
+ # NOTE: images taken with a phone camera carry an EXIF orientation field that often rotates images shot with the phone tilted. PIL has a utility function that removes this data and 'uprights' the image.
483
+ input_img = ImageOps.exif_transpose(input_img)
484
+ input_img = resize_frame(input_img, max_width=640, max_height=480)
486
+ cols[0].image(input_img, use_column_width=True)
487
+ st.session_state.total_frames = 1
488
+ return input_img
489
+
490
+
491
+ def set_input_video_on_ui(uploaded_file, cols):
492
+ tfile = tempfile.NamedTemporaryFile(delete=False)
493
+ try:
494
+ tfile.write(uploaded_file.read())
495
+ except Exception:
496
+ tfile.write(uploaded_file)
497
+ video_path = tfile.name
498
+ video_frames = extract_frames(video_path)
499
+ cols[0].video(video_path)
500
+ st.session_state.total_frames = len(video_frames)
501
+ return video_frames, video_path
502
+
503
+
504
+ def set_frames_processed_count_placeholder(cols):
505
+ st.session_state.current_frame = 0
506
+ st.session_state.frame_placeholder = cols[0].empty()
507
+ txt = f"<p style='font-size:20px;'> Number of Frames Processed: <strong>{st.session_state.current_frame} / {st.session_state.total_frames}</strong> </p>"
508
+ st.session_state.frame_placeholder.markdown(txt, unsafe_allow_html=True)
509
+
510
+
511
+ def video_to_bytes(video_path):
512
+ # Open the video file in binary mode and return the bytes
513
+ with open(video_path, "rb") as video_file:
514
+ return video_file.read()
515
+
516
+
517
+ def display_video_library(video_folder="./sample_videos"):
518
+ # Get all video files from the folder
519
+ video_files = [f for f in os.listdir(video_folder) if f.endswith(".webm")]
520
+
521
+ # Store the selected video path
522
+ selected_video_path = None
523
+
524
+ # Calculate number of columns (adjust based on your layout preferences)
525
+ num_columns = 3 # For a grid of 3 videos per row
526
+
527
+ # Display videos in a grid layout with 'Select' button for each video
528
+ for i in range(0, len(video_files), num_columns):
529
+ cols = st.columns(num_columns)
530
+ for idx, video_file in enumerate(video_files[i : i + num_columns]):
531
+ with cols[idx]:
532
+ st.subheader(video_file.split(".")[0]) # Use the file name as the title
533
+ video_path = os.path.join(video_folder, video_file)
534
+ st.video(video_path) # Show the video
535
+ if st.button(f"Select {video_file.split('.')[0]}", key=video_file, type="primary"):
536
+ st.session_state.clear()
537
+ st.toast("Scroll Down to see the input and predictions", icon="⏬")
538
+ selected_video_path = video_path # Store the path of the selected video
539
+
540
+ return selected_video_path
541
+
542
+
543
+ def set_page_info_and_sidebar_info():
544
+
545
+ st.set_page_config(page_title="Pupil Diameter Estimator", layout="wide")
546
+ st.title("👁️ PupilSense 👁️🕵️‍♂️")
547
+ # st.markdown("Upload your own images or video **OR** select from our sample library below")
548
+ st.markdown(
549
+ "<p style='font-size: 30px;'>"
550
+ "Upload your own image 🖼️ or video 🎞️ <strong>OR</strong> select from our sample videos 📚"
551
+ "</p>",
552
+ unsafe_allow_html=True,
553
+ )
554
+ # video_path = display_video_library()
555
+ show_demo_videos = st.sidebar.checkbox("Show Sample Videos", value=False)
556
+ if show_demo_videos:
557
+ video_path = display_video_library()
558
+ else:
559
+ video_path = None
560
+
561
+ st.markdown("<hr id='target_element' style='border: 1px solid #6d6d6d; margin: 20px 0;'>", unsafe_allow_html=True)
562
+ cols = st.columns((1, 1))
563
+ cols[0].header("Input")
564
+ cols[-1].header("Prediction")
565
+ st.markdown("<hr style='border: 1px solid #6d6d6d; margin: 20px 0;'>", unsafe_allow_html=True)
566
+
567
+ LABEL_MAP = ["left_pupil", "right_pupil"]
568
+ TV_MODELS = ["ResNet18", "ResNet50"]
569
+
570
+ if "uploader_key" not in st.session_state:
571
+ st.session_state["uploader_key"] = 1
572
+
573
+ st.sidebar.title("Upload Face 👨‍🦱 or Eye 👁️")
574
+ uploaded_file = st.sidebar.file_uploader(
575
+ "Upload Image or Video",
576
+ type=["png", "jpeg", "jpg", "mp4", "avi", "mov", "mkv", "webm"],
577
+ key=st.session_state["uploader_key"],
578
+ )
579
+ if uploaded_file is not None:
580
+ st.session_state["uploaded_file"] = uploaded_file
581
+
582
+ st.sidebar.title("Setup")
583
+ pupil_selection = st.sidebar.selectbox(
584
+ "Pupil Selection", ["both"] + LABEL_MAP, help="Select left or right pupil OR both for diameter estimation"
585
+ )
586
+ tv_model = st.sidebar.selectbox("Classification model", TV_MODELS, help="Supported Models")
587
+
588
+ blink_detection = st.sidebar.checkbox("Detect Blinks", value=True)
589
+
590
+ st.markdown("<style>#vg-tooltip-element{z-index: 1000051}</style>", unsafe_allow_html=True)
591
+
592
+ if "uploaded_file" not in st.session_state:
593
+ st.session_state["uploaded_file"] = None
594
+
595
+ if "og_video_path" not in st.session_state:
596
+ st.session_state["og_video_path"] = None
597
+
598
+ if uploaded_file is None and video_path is not None:
599
+ video_bytes = video_to_bytes(video_path)
600
+ uploaded_file = video_bytes
601
+ st.session_state["uploaded_file"] = uploaded_file
602
+ st.session_state["og_video_path"] = video_path
603
+ st.session_state["uploader_key"] = 0
604
+
605
+ return (
606
+ cols,
607
+ st.session_state["og_video_path"],
608
+ st.session_state["uploaded_file"],
609
+ pupil_selection,
610
+ tv_model,
611
+ blink_detection,
612
+ )
613
+
614
+
615
+ def pil_image_to_base64(img):
616
+ """Convert a PIL Image to a base64 encoded string."""
617
+ buffered = io.BytesIO()
618
+ img.save(buffered, format="PNG")
619
+ img_str = base64.b64encode(buffered.getvalue()).decode()
620
+ return img_str
621
+
622
+
623
+ def process_image_and_vizualize_data(cols, input_img, tv_model, pupil_selection, blink_detection):
624
+ input_frames, output_frames, predicted_diameters, face_frames, eyes_ratios = process_frames(
625
+ cols,
626
+ [input_img],
627
+ tv_model,
628
+ pupil_selection,
629
+ cam_method=CAM_METHODS[-1],
630
+ blink_detection=blink_detection,
631
+ )
632
+ # for ff in face_frames:
633
+ # if ff["has_face"]:
634
+ # cols[1].image(face_frames[0]["img"], use_column_width=True)
635
+
636
+ input_frames_keys = input_frames.keys()
637
+ video_cols = cols[1].columns(len(input_frames_keys))
638
+
639
+ for i, eye_type in enumerate(input_frames_keys):
640
+ # Check the pupil_selection and set the width accordingly
641
+ if pupil_selection == "both":
642
+ video_cols[i].image(input_frames[eye_type][-1], use_column_width=True)
643
+ else:
644
+ img_base64 = pil_image_to_base64(Image.fromarray(input_frames[eye_type][-1]))
645
+ image_html = f'<div style="width: 50%; margin-bottom: 1.2%;"><img src="data:image/png;base64,{img_base64}" style="width: 100%;"></div>'
646
+ video_cols[i].markdown(image_html, unsafe_allow_html=True)
647
+
648
+ output_frames_keys = output_frames.keys()
649
+ fig, axs = plt.subplots(1, len(output_frames_keys), figsize=(10, 5))
650
+ for i, eye_type in enumerate(output_frames_keys):
651
+ height, width, c = output_frames[eye_type][0].shape
652
+ frame = np.zeros((height, width, c), dtype=np.uint8)
653
+ text = f"{predicted_diameters[eye_type][0]:.2f}"
654
+ frame = overlay_text_on_frame(frame, text)
655
+
656
+ if pupil_selection == "both":
657
+ video_cols[i].image(output_frames[eye_type][-1], use_column_width=True)
658
+ video_cols[i].image(frame, use_column_width=True)
659
+ else:
660
+ img_base64 = pil_image_to_base64(Image.fromarray(output_frames[eye_type][-1]))
661
+ image_html = f'<div style="width: 50%; margin-top: 1.2%; margin-bottom: 1.2%"><img src="data:image/png;base64,{img_base64}" style="width: 100%;"></div>'
662
+ video_cols[i].markdown(image_html, unsafe_allow_html=True)
663
+ img_base64 = pil_image_to_base64(Image.fromarray(frame))
664
+ image_html = f'<div style="width: 50%; margin-top: 1.2%"><img src="data:image/png;base64,{img_base64}" style="width: 100%;"></div>'
665
+ video_cols[i].markdown(image_html, unsafe_allow_html=True)
666
+
667
+ return None
668
+
669
+
670
+ def plot_ears(eyes_ratios, eyes_df):
671
+ eyes_df["EAR"] = eyes_ratios
672
+ df = pd.DataFrame(eyes_ratios, columns=["EAR"])
673
+ df["Frame"] = range(1, len(eyes_ratios) + 1) # Create a frame column starting from 1
674
+
675
+ # Create an Altair chart for eyes_ratios
676
+ line_chart = (
677
+ alt.Chart(df)
678
+ .mark_line(color=colors[-1]) # Set color of the line
679
+ .encode(
680
+ x=alt.X("Frame:Q", title="Frame Number"),
681
+ y=alt.Y("EAR:Q", title="Eyes Aspect Ratio"),
682
+ tooltip=["Frame", "EAR"],
683
+ )
684
+ # .properties(title="Eyes Aspect Ratios (EARs)")
685
+ # .configure_axis(grid=True)
686
+ )
687
+ points_chart = line_chart.mark_point(color=colors[-1], filled=True)
688
+
689
+ # Create a horizontal rule at y=0.22
690
+ line1 = alt.Chart(pd.DataFrame({"y": [0.22]})).mark_rule(color="red").encode(y="y:Q")
691
+
692
+ line2 = alt.Chart(pd.DataFrame({"y": [0.25]})).mark_rule(color="green").encode(y="y:Q")
693
+
694
+ # Add text annotations for the lines
695
+ text1 = (
696
+ alt.Chart(pd.DataFrame({"y": [0.22], "label": ["Definite Blinks (<=0.22)"]}))
697
+ .mark_text(align="left", dx=100, dy=9, color="red", size=16)
698
+ .encode(y="y:Q", text="label:N")
699
+ )
700
+
701
+ text2 = (
702
+ alt.Chart(pd.DataFrame({"y": [0.25], "label": ["No Blinks (>=0.25)"]}))
703
+ .mark_text(align="left", dx=-150, dy=-9, color="green", size=16)
704
+ .encode(y="y:Q", text="label:N")
705
+ )
706
+
707
+ # Add gray area text for the region between red and green lines
708
+ gray_area_text = (
709
+ alt.Chart(pd.DataFrame({"y": [0.235], "label": ["Gray Area"]}))
710
+ .mark_text(align="left", dx=0, dy=0, color="gray", size=16)
711
+ .encode(y="y:Q", text="label:N")
712
+ )
713
+
714
+ # Combine all elements: line chart, points, rules, and text annotations
715
+ final_chart = (
716
+ line_chart.properties(title="Eyes Aspect Ratios (EARs)")
717
+ + points_chart
718
+ + line1
719
+ + line2
720
+ + text1
721
+ + text2
722
+ + gray_area_text
723
+ ).interactive()
724
+
725
+ # Configure axis properties at the chart level
726
+ final_chart = final_chart.configure_axis(grid=True)
727
+
728
+ # Display the Altair chart
729
+ # st.subheader("Eyes Aspect Ratios (EARs)")
730
+ st.altair_chart(final_chart, use_container_width=True)
731
+ return eyes_df
732
+
733
+
734
+ def plot_individual_charts(predicted_diameters, cols):
735
+ # Iterate through categories and assign charts to columns
736
+ for i, (category, values) in enumerate(predicted_diameters.items()):
737
+ with cols[i]: # Directly use the column index
738
+ # st.subheader(category) # Add a subheader for the category
739
+ if "left" in category:
740
+ selected_color = colors[0]
741
+ elif "right" in category:
742
+ selected_color = colors[1]
743
+ else:
744
+ selected_color = colors[i]
745
+
746
+ # Convert values to numeric, replacing non-numeric values with None
747
+ values = [convert_diameter(value) for value in values]
748
+
749
+ if "left" in category:
750
+ category_name = "Left Pupil Diameter"
751
+ else:
752
+ category_name = "Right Pupil Diameter"
753
+
754
+ # Create a DataFrame from the values for Altair
755
+ df = pd.DataFrame(
756
+ {
757
+ "Frame": range(1, len(values) + 1),
758
+ category_name: values,
759
+ }
760
+ )
761
+
762
+ # Get the min and max values for y-axis limits, ignoring None
763
+ min_value = min(filter(lambda x: x is not None, values), default=None)
764
+ max_value = max(filter(lambda x: x is not None, values), default=None)
765
+
766
+ # Create an Altair chart with y-axis limits
767
+ line_chart = (
768
+ alt.Chart(df)
769
+ .mark_line(color=selected_color)
770
+ .encode(
771
+ x=alt.X("Frame:Q", title="Frame Number"),
772
+ y=alt.Y(
773
+ f"{category_name}:Q",
774
+ title="Diameter",
775
+ scale=alt.Scale(domain=[min_value, max_value]),
776
+ ),
777
+ tooltip=[
778
+ "Frame",
779
+ alt.Tooltip(f"{category_name}:Q", title="Diameter"),
780
+ ],
781
+ )
782
+ # .properties(title=f"{category} - Predicted Diameters")
783
+ # .configure_axis(grid=True)
784
+ )
785
+ points_chart = line_chart.mark_point(color=selected_color, filled=True)
786
+
787
+ final_chart = (
788
+ line_chart.properties(
789
+ title=f"{'Left Pupil' if 'left' in category else 'Right Pupil'} - Predicted Diameters"
790
+ )
791
+ + points_chart
792
+ ).interactive()
793
+
794
+ final_chart = final_chart.configure_axis(grid=True)
795
+
796
+ # Display the Altair chart
797
+ st.altair_chart(final_chart, use_container_width=True)
798
+ return df
799
+
800
+
801
+ def plot_combined_charts(predicted_diameters):
802
+ all_min_values = []
803
+ all_max_values = []
804
+
805
+ # Create an empty DataFrame to store combined data for plotting
806
+ combined_df = pd.DataFrame()
807
+
808
+ # Iterate through categories and collect data
809
+ for category, values in predicted_diameters.items():
810
+ # Convert values to numeric, replacing non-numeric values with None
811
+ values = [convert_diameter(value) for value in values]
812
+
813
+ # Get the min and max values for y-axis limits, ignoring None
814
+ min_value = min(filter(lambda x: x is not None, values), default=None)
815
+ max_value = max(filter(lambda x: x is not None, values), default=None)
816
+
817
+ all_min_values.append(min_value)
818
+ all_max_values.append(max_value)
819
+
820
+ category = "left_pupil" if "left" in category else "right_pupil"
821
+
822
+ # Create a DataFrame from the values
823
+ df = pd.DataFrame(
824
+ {
825
+ "Diameter": values,
826
+ "Frame": range(1, len(values) + 1), # Create a frame column starting from 1
827
+ "Category": category, # Add a column to specify the category
828
+ }
829
+ )
830
+
831
+ # Append to combined DataFrame
832
+ combined_df = pd.concat([combined_df, df], ignore_index=True)
833
+
834
+ combined_chart = (
835
+ alt.Chart(combined_df)
836
+ .mark_line()
837
+ .encode(
838
+ x=alt.X("Frame:Q", title="Frame Number"),
839
+ y=alt.Y(
840
+ "Diameter:Q",
841
+ title="Diameter",
842
+ scale=alt.Scale(domain=[min(all_min_values), max(all_max_values)]),
843
+ ),
844
+ color=alt.Color("Category:N", scale=alt.Scale(range=colors), title="Pupil Type"),
845
+ tooltip=["Frame", "Diameter:Q", "Category:N"],
846
+ )
847
+ )
848
+ points_chart = combined_chart.mark_point(filled=True)
849
+
850
+ final_chart = (combined_chart.properties(title="Predicted Diameters") + points_chart).interactive()
851
+
852
+ final_chart = final_chart.configure_axis(grid=True)
853
+
854
+ # Display the combined chart
855
+ st.altair_chart(final_chart, use_container_width=True)
856
+
857
+ # --------------------------------------------
858
+ # Convert to a DataFrame
859
+ left_pupil_values = [convert_diameter(value) for value in predicted_diameters["left_eye"]]
860
+ right_pupil_values = [convert_diameter(value) for value in predicted_diameters["right_eye"]]
861
+
862
+ df = pd.DataFrame(
863
+ {
864
+ "Frame": range(1, len(left_pupil_values) + 1),
865
+ "Left Pupil Diameter": left_pupil_values,
866
+ "Right Pupil Diameter": right_pupil_values,
867
+ }
868
+ )
869
+
870
+ # Calculate the difference between left and right pupil diameters
871
+ df["Difference Value"] = df["Left Pupil Diameter"] - df["Right Pupil Diameter"]
872
+
873
+ # Determine the status of the difference
874
+ df["Difference Status"] = df.apply(
875
+ lambda row: "L>R" if row["Left Pupil Diameter"] > row["Right Pupil Diameter"] else "L<R",
876
+ axis=1,
877
+ )
878
+
879
+ return df
880
+
881
+
882
+ def process_video_and_visualize_data(cols, video_frames, tv_model, pupil_selection, blink_detection, video_path):
883
+ output_video_path = f"{root_path}/tmp.webm"
884
+ input_frames, output_frames, predicted_diameters, face_frames, eyes_ratios = process_video(
885
+ cols,
886
+ video_frames,
887
+ tv_model,
888
+ pupil_selection,
889
+ output_video_path,
890
+ cam_method=CAM_METHODS[-1],
891
+ blink_detection=blink_detection,
892
+ )
893
+ os.remove(video_path)
894
+
895
+ num_columns = len(predicted_diameters)
896
+ cols = st.columns(num_columns)
897
+
898
+ if num_columns == 2:
899
+ df = plot_combined_charts(predicted_diameters)
900
+ else:
901
+ df = plot_individual_charts(predicted_diameters, cols)
902
+
903
+ if eyes_ratios is not None and len(eyes_ratios) > 0:
904
+ df = plot_ears(eyes_ratios, df)
905
+
906
+ st.dataframe(df, hide_index=True, use_container_width=True)
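The thresholds drawn above encode a simple rule: EAR <= 0.22 is treated as a definite blink, EAR >= 0.25 as no blink, and the band in between is the gray area that the extractor double-checks with a ViT classifier. A minimal sketch of that rule (the helper name classify_ear is illustrative, not part of the app):

def classify_ear(ear: float) -> str:
    # Illustrative only: mirrors the thresholds plotted in plot_ears
    if ear <= 0.22:
        return "definite blink"
    if ear >= 0.25:
        return "no blink"
    return "gray area"  # in the app this band is confirmed by the blink-detection model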
config.yml ADDED
@@ -0,0 +1,51 @@
1
+ seed: 42
2
+
3
+ feature_extraction_configs:
4
+ blink_detection: true
5
+ upscale: 1
6
+ extraction_library: "mediapipe"
7
+ show_features: ['faces', 'eyes', 'blinks']
8
+
9
+ model_configs:
10
+ models_path: "pre_trained_models"
11
+ registered_model_names: ["ResNet18", "ResNet50"]
12
+ labels: ["left_eye", "right_eye"]
13
+ targets: ["left_pupil", "right_pupil"]
14
+ num_classes: 1
15
+
16
+ xai_configs:
17
+ attribution_methods: [
18
+ "IntegratedGradients",
19
+ "Saliency",
20
+ "InputXGradient",
21
+ "GuidedBackprop",
22
+ "Deconvolution",
23
+ # "GuidedGradCam",
24
+ # "LayerGradCam",
25
+ # "LayerGradientXActivation",
26
+ ]
27
+ cam_methods: [
28
+ "CAM",
29
+ "GradCAM",
30
+ "GradCAMpp",
31
+ "SmoothGradCAMpp",
32
+ "ScoreCAM",
33
+ "SSCAM",
34
+ "ISCAM",
35
+ "XGradCAM",
36
+ "LayerCAM",
37
+ ]
38
+
39
+ use_sr: false
40
+
41
+ upscale_configs:
42
+ upscale: [1, 2, 3, 4]
43
+ upscale_method_configs:
44
+ size: [16, 32]
45
+ antialias: true
46
+ interpolation: ["bicubic"]
47
+
48
+ sr_methods: ["GFPGAN", "RealESRGAN", "SRResNet", "CodeFormer", "HAT"]
49
+ sr_method_configs:
50
+ bg_upsampler_name: "realesrgan"
51
+ prefered_net_in_upsampler: "RRDBNet"
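A minimal sketch of reading this file with PyYAML (assuming it is loaded from the repository root as config.yml; the variable names are illustrative):

import yaml

with open("config.yml", "r") as f:
    cfg = yaml.safe_load(f)

feature_cfg = cfg["feature_extraction_configs"]  # blink_detection, upscale, extraction_library, ...
model_cfg = cfg["model_configs"]                 # models_path, registered_model_names, labels, ...
print(feature_cfg["extraction_library"])         # "mediapipe"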
feature_extraction/extractor_mediapipe.py ADDED
@@ -0,0 +1,340 @@
1
+ import cv2
2
+ import torch
3
+ import warnings
4
+ import numpy as np
5
+ from PIL import Image
6
+ from math import sqrt
7
+ import mediapipe as mp
8
+ from transformers import pipeline
9
+
10
+ warnings.filterwarnings("ignore")
11
+
12
+
13
+ class ExtractorMediaPipe:
14
+
15
+ def __init__(self, upscale=1):
16
+
17
+ self.upscale = int(upscale)
18
+ self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
19
+
20
+ # ========== Face Extraction ==========
21
+ self.face_detector = mp.solutions.face_detection.FaceDetection(model_selection=0, min_detection_confidence=0.5)
22
+ self.face_mesh = mp.solutions.face_mesh.FaceMesh(
23
+ max_num_faces=1,
24
+ static_image_mode=True,
25
+ refine_landmarks=True,
26
+ min_detection_confidence=0.5,
27
+ min_tracking_confidence=0.5,
28
+ )
29
+
30
+ # ========== Eyes Extraction ==========
31
+ self.RIGHT_EYE = [
32
+ 362,
33
+ 382,
34
+ 381,
35
+ 380,
36
+ 374,
37
+ 373,
38
+ 390,
39
+ 249,
40
+ 263,
41
+ 466,
42
+ 388,
43
+ 387,
44
+ 386,
45
+ 385,
46
+ 384,
47
+ 398,
48
+ ]
49
+ self.LEFT_EYE = [
50
+ 33,
51
+ 7,
52
+ 163,
53
+ 144,
54
+ 145,
55
+ 153,
56
+ 154,
57
+ 155,
58
+ 133,
59
+ 173,
60
+ 157,
61
+ 158,
62
+ 159,
63
+ 160,
64
+ 161,
65
+ 246,
66
+ ]
67
+ # https://huggingface.co/dima806/closed_eyes_image_detection
68
+ # https://www.kaggle.com/code/dima806/closed-eye-image-detection-vit
69
+ self.pipe = pipeline(
70
+ "image-classification",
71
+ model="dima806/closed_eyes_image_detection",
72
+ device=self.device,
73
+ )
74
+ self.blink_lower_thresh = 0.22
75
+ self.blink_upper_thresh = 0.25
76
+ self.blink_confidence = 0.50
77
+
78
+ # ========== Iris Extraction ==========
79
+ self.RIGHT_IRIS = [474, 475, 476, 477]
80
+ self.LEFT_IRIS = [469, 470, 471, 472]
81
+
82
+ def extract_face(self, image):
83
+
84
+ tmp_image = image.copy()
85
+ results = self.face_detector.process(tmp_image)
86
+
87
+ if not results.detections:
88
+ # print("No face detected")
89
+ return None
90
+ else:
91
+ bboxC = results.detections[0].location_data.relative_bounding_box
92
+ ih, iw, _ = image.shape
93
+
94
+ # Get bounding box coordinates
95
+ x, y, w, h = (
96
+ int(bboxC.xmin * iw),
97
+ int(bboxC.ymin * ih),
98
+ int(bboxC.width * iw),
99
+ int(bboxC.height * ih),
100
+ )
101
+
102
+ # Calculate the center of the bounding box
103
+ center_x = x + w // 2
104
+ center_y = y + h // 2
105
+
106
+ # Calculate new bounds ensuring they fit within the image dimensions
107
+ half_size = 128 * self.upscale
108
+ x1 = max(center_x - half_size, 0)
109
+ y1 = max(center_y - half_size, 0)
110
+ x2 = min(center_x + half_size, iw)
111
+ y2 = min(center_y + half_size, ih)
112
+
113
+ # Adjust x1, x2, y1, and y2 to ensure the cropped region is exactly (256 * self.upscale) x (256 * self.upscale)
114
+ if x2 - x1 < (256 * self.upscale):
115
+ if x1 == 0:
116
+ x2 = min((256 * self.upscale), iw)
117
+ elif x2 == iw:
118
+ x1 = max(iw - (256 * self.upscale), 0)
119
+
120
+ if y2 - y1 < (256 * self.upscale):
121
+ if y1 == 0:
122
+ y2 = min((256 * self.upscale), ih)
123
+ elif y2 == ih:
124
+ y1 = max(ih - (256 * self.upscale), 0)
125
+
126
+ cropped_face = image[y1:y2, x1:x2]
127
+
128
+ # bicubic upsampling
129
+ # if self.upscale != 1:
130
+ # cropped_face = cv2.resize(
131
+ # cropped_face,
132
+ # (256 * self.upscale, 256 * self.upscale),
133
+ # interpolation=cv2.INTER_CUBIC,
134
+ # )
135
+
136
+ return cropped_face
137
+
138
+ @staticmethod
139
+ def landmarksDetection(image, results, draw=False):
140
+ image_height, image_width = image.shape[:2]
141
+ mesh_coordinates = [
142
+ (int(point.x * image_width), int(point.y * image_height))
143
+ for point in results.multi_face_landmarks[0].landmark
144
+ ]
145
+ if draw:
146
+ [cv2.circle(image, i, 2, (0, 255, 0), -1) for i in mesh_coordinates]
147
+ return mesh_coordinates
148
+
149
+ @staticmethod
150
+ def euclideanDistance(point, point1):
151
+ x, y = point
152
+ x1, y1 = point1
153
+ distance = sqrt((x1 - x) ** 2 + (y1 - y) ** 2)
154
+ return distance
155
+
156
+ def blinkRatio(self, landmarks, right_indices, left_indices):
157
+
158
+ right_eye_landmark1 = landmarks[right_indices[0]]
159
+ right_eye_landmark2 = landmarks[right_indices[8]]
160
+
161
+ right_eye_landmark3 = landmarks[right_indices[12]]
162
+ right_eye_landmark4 = landmarks[right_indices[4]]
163
+
164
+ left_eye_landmark1 = landmarks[left_indices[0]]
165
+ left_eye_landmark2 = landmarks[left_indices[8]]
166
+
167
+ left_eye_landmark3 = landmarks[left_indices[12]]
168
+ left_eye_landmark4 = landmarks[left_indices[4]]
169
+
170
+ right_eye_horizontal_distance = self.euclideanDistance(right_eye_landmark1, right_eye_landmark2)
171
+ right_eye_vertical_distance = self.euclideanDistance(right_eye_landmark3, right_eye_landmark4)
172
+
173
+ left_eye_vertical_distance = self.euclideanDistance(left_eye_landmark3, left_eye_landmark4)
174
+ left_eye_horizontal_distance = self.euclideanDistance(left_eye_landmark1, left_eye_landmark2)
175
+
176
+ right_eye_ratio = right_eye_vertical_distance / right_eye_horizontal_distance
177
+ left_eye_ratio = left_eye_vertical_distance / left_eye_horizontal_distance
178
+
179
+ eyes_ratio = (right_eye_ratio + left_eye_ratio) / 2
180
+
181
+ return eyes_ratio
182
+
183
+ def extract_eyes_regions(self, image, landmarks, eye_indices):
184
+ h, w, _ = image.shape
185
+ points = [(int(landmarks[idx].x * w), int(landmarks[idx].y * h)) for idx in eye_indices]
186
+
187
+ x_min = min([p[0] for p in points])
188
+ x_max = max([p[0] for p in points])
189
+ y_min = min([p[1] for p in points])
190
+ y_max = max([p[1] for p in points])
191
+
192
+ center_x = (x_min + x_max) // 2
193
+ center_y = (y_min + y_max) // 2
194
+
195
+ target_width = 32 * self.upscale
196
+ target_height = 16 * self.upscale
197
+
198
+ x1 = max(center_x - target_width // 2, 0)
199
+ y1 = max(center_y - target_height // 2, 0)
200
+ x2 = x1 + target_width
201
+ y2 = y1 + target_height
202
+
203
+ if x2 > w:
204
+ x1 = w - target_width
205
+ x2 = w
206
+ if y2 > h:
207
+ y1 = h - target_height
208
+ y2 = h
209
+
210
+ return image[y1:y2, x1:x2]
211
+
212
+ def blink_detection_model(self, left_eye, right_eye):
213
+
214
+ left_eye = cv2.cvtColor(left_eye, cv2.COLOR_RGB2GRAY)
215
+ left_eye = Image.fromarray(left_eye)
216
+ preds_left = self.pipe(left_eye)
217
+ if preds_left[0]["label"] == "closeEye":
218
+ closed_left = preds_left[0]["score"] >= self.blink_confidence
219
+ else:
220
+ closed_left = preds_left[1]["score"] >= self.blink_confidence
221
+
222
+ right_eye = cv2.cvtColor(right_eye, cv2.COLOR_RGB2GRAY)
223
+ right_eye = Image.fromarray(right_eye)
224
+ preds_right = self.pipe(right_eye)
225
+ if preds_right[0]["label"] == "closeEye":
226
+ closed_right = preds_right[0]["score"] >= self.blink_confidence
227
+ else:
228
+ closed_right = preds_right[1]["score"] >= self.blink_confidence
229
+
230
+ # print("preds_left = ", preds_left)
231
+ # print("preds_right = ", preds_right)
232
+
233
+ return closed_left or closed_right
234
+
235
+ def extract_eyes(self, image, blink_detection=False):
236
+
237
+ tmp_face = image.copy()
238
+ results = self.face_mesh.process(tmp_face)
239
+
240
+ if results.multi_face_landmarks is None:
241
+ return None
242
+
243
+ face_landmarks = results.multi_face_landmarks[0].landmark
244
+
245
+ left_eye = self.extract_eyes_regions(image, face_landmarks, self.LEFT_EYE)
246
+ right_eye = self.extract_eyes_regions(image, face_landmarks, self.RIGHT_EYE)
247
+ blinked = False
248
+ eyes_ratio = None
249
+
250
+ if blink_detection:
251
+ mesh_coordinates = self.landmarksDetection(image, results, False)
252
+ eyes_ratio = self.blinkRatio(mesh_coordinates, self.RIGHT_EYE, self.LEFT_EYE)
253
+ if eyes_ratio > self.blink_lower_thresh and eyes_ratio <= self.blink_upper_thresh:
254
+ # print(
255
+ # "I think person blinked. eyes_ratio = ",
256
+ # eyes_ratio,
257
+ # "Confirming with ViT model...",
258
+ # )
259
+ blinked = self.blink_detection_model(left_eye=left_eye, right_eye=right_eye)
260
+ # if blinked:
261
+ # print("Yes, person blinked. Confirmed by model")
262
+ # else:
263
+ # print("No, person didn't blinked. False Alarm")
264
+ elif eyes_ratio <= self.blink_lower_thresh:
265
+ blinked = True
266
+ # print("Surely person blinked. eyes_ratio = ", eyes_ratio)
267
+ else:
268
+ blinked = False
269
+
270
+ return {"left_eye": left_eye, "right_eye": right_eye, "blinked": blinked, "eyes_ratio": eyes_ratio}
271
+
272
+ @staticmethod
273
+ def segment_iris(iris_img):
274
+
275
+ # Convert RGB image to grayscale
276
+ iris_img_gray = cv2.cvtColor(iris_img, cv2.COLOR_RGB2GRAY)
277
+
278
+ # Apply Gaussian blur for denoising
279
+ iris_img_blur = cv2.GaussianBlur(iris_img_gray, (5, 5), 0)
280
+
281
+ # Perform adaptive thresholding
282
+ _, iris_img_mask = cv2.threshold(iris_img_blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
283
+
284
+ # Invert the mask
285
+ segmented_mask = cv2.bitwise_not(iris_img_mask)
286
+ segmented_mask = cv2.cvtColor(segmented_mask, cv2.COLOR_GRAY2RGB)
287
+ segmented_iris = cv2.bitwise_and(iris_img, segmented_mask)
288
+
289
+ return {
290
+ "segmented_iris": segmented_iris,
291
+ "segmented_mask": segmented_mask,
292
+ }
293
+
294
+ def extract_iris(self, image):
295
+
296
+ ih, iw, _ = image.shape
297
+ tmp_face = image.copy()
298
+ results = self.face_mesh.process(tmp_face)
299
+
300
+ if results.multi_face_landmarks is None:
301
+ return None
302
+
303
+ mesh_coordinates = self.landmarksDetection(image, results, False)
304
+ mesh_points = np.array(mesh_coordinates)
305
+
306
+ (l_cx, l_cy), l_radius = cv2.minEnclosingCircle(mesh_points[self.LEFT_IRIS])
307
+ (r_cx, r_cy), r_radius = cv2.minEnclosingCircle(mesh_points[self.RIGHT_IRIS])
308
+
309
+ # Crop the left iris to be exactly 16*upscaled x 16*upscaled
310
+ l_x1 = max(int(l_cx) - (8 * self.upscale), 0)
311
+ l_y1 = max(int(l_cy) - (8 * self.upscale), 0)
312
+ l_x2 = min(int(l_cx) + (8 * self.upscale), iw)
313
+ l_y2 = min(int(l_cy) + (8 * self.upscale), ih)
314
+
315
+ cropped_left_iris = image[l_y1:l_y2, l_x1:l_x2]
316
+
317
+ left_iris_segmented_data = self.segment_iris(cv2.cvtColor(cropped_left_iris, cv2.COLOR_BGR2RGB))
318
+
319
+ # Crop the right iris to be exactly 16*upscaled x 16*upscaled
320
+ r_x1 = max(int(r_cx) - (8 * self.upscale), 0)
321
+ r_y1 = max(int(r_cy) - (8 * self.upscale), 0)
322
+ r_x2 = min(int(r_cx) + (8 * self.upscale), iw)
323
+ r_y2 = min(int(r_cy) + (8 * self.upscale), ih)
324
+
325
+ cropped_right_iris = image[r_y1:r_y2, r_x1:r_x2]
326
+
327
+ right_iris_segmented_data = self.segment_iris(cv2.cvtColor(cropped_right_iris, cv2.COLOR_BGR2RGB))
328
+
329
+ return {
330
+ "left_iris": {
331
+ "img": cropped_left_iris,
332
+ "segmented_iris": left_iris_segmented_data["segmented_iris"],
333
+ "segmented_mask": left_iris_segmented_data["segmented_mask"],
334
+ },
335
+ "right_iris": {
336
+ "img": cropped_right_iris,
337
+ "segmented_iris": right_iris_segmented_data["segmented_iris"],
338
+ "segmented_mask": right_iris_segmented_data["segmented_mask"],
339
+ },
340
+ }
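A rough usage sketch for ExtractorMediaPipe on a single frame (face.jpg is a placeholder path; mediapipe, torch and transformers must be installed so the constructor can load the blink classifier):

import cv2
from feature_extraction.extractor_mediapipe import ExtractorMediaPipe

extractor = ExtractorMediaPipe(upscale=1)

bgr = cv2.imread("face.jpg")                  # placeholder path
rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)    # the extractor expects RGB frames

face = extractor.extract_face(rgb)            # 256x256 crop at upscale=1, or None
if face is not None:
    eyes = extractor.extract_eyes(rgb, blink_detection=True)
    if eyes is not None and not eyes["blinked"]:
        iris = extractor.extract_iris(rgb)    # left/right iris crops plus segmented masks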
feature_extraction/features_extractor.py ADDED
@@ -0,0 +1,48 @@
1
+ import os
2
+ import sys
3
+ import warnings
4
+ import os.path as osp
5
+
6
+ ROOT_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
7
+ root_path = osp.abspath(osp.join(__file__, osp.pardir, osp.pardir))
8
+ sys.path.append(root_path)
9
+
10
+ from feature_extraction.extractor_mediapipe import ExtractorMediaPipe
11
+
12
+ warnings.filterwarnings("ignore")
13
+
14
+
15
+ class FeaturesExtractor:
16
+
17
+ def __init__(self, extraction_library="mediapipe", blink_detection=False, upscale=1):
18
+ self.upscale = upscale
19
+ self.blink_detection = blink_detection
20
+ self.extraction_library = extraction_library
21
+ self.feature_extractor = ExtractorMediaPipe(self.upscale)
22
+
23
+ def __call__(self, image):
24
+ results = {}
25
+ face = self.feature_extractor.extract_face(image)
26
+ if face is None:
27
+ # print("No face found. Skipped feature extraction!")
28
+ return None
29
+ else:
30
+ results["img"] = image
31
+ results["face"] = face
32
+ eyes_data = self.feature_extractor.extract_eyes(image, self.blink_detection)
33
+ if eyes_data is None:
34
+ # print("No eyes found. Skipped feature extraction!")
35
+ return results
36
+ else:
37
+ results["eyes"] = eyes_data
38
+ if eyes_data["blinked"]:
39
+ # print("Found blinked eyes!")
40
+ return results
41
+ else:
42
+ iris_data = self.feature_extractor.extract_iris(image)
43
+ if iris_data is None:
44
+ # print("No iris found. Skipped feature extraction!")
45
+ return results
46
+ else:
47
+ results["iris"] = iris_data
48
+ return results
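The same flow, driven through the FeaturesExtractor wrapper (a sketch; frame is assumed to be an RGB numpy array):

from feature_extraction.features_extractor import FeaturesExtractor

features_extractor = FeaturesExtractor(extraction_library="mediapipe", blink_detection=True, upscale=1)
result = features_extractor(frame)            # frame: RGB numpy array (assumed)
if result is not None and "iris" in result:
    left_iris_img = result["iris"]["left_iris"]["img"]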
gradio_app.py ADDED
@@ -0,0 +1,388 @@
1
+ import sys
2
+ import os
3
+ import os.path as osp
4
+ import gradio as gr
5
+ import numpy as np
6
+ import tempfile
7
+ from PIL import Image, ImageOps
8
+ import cv2
9
+ import matplotlib.pyplot as plt
10
+ import io
11
+ import base64
12
+
13
+ root_path = osp.abspath(osp.join(__file__, osp.pardir))
14
+ sys.path.append(root_path)
15
+
16
+ from registry_utils import import_registered_modules
17
+ from gradio_utils import (
18
+ is_image,
19
+ is_video,
20
+ extract_frames,
21
+ resize_frame,
22
+ CAM_METHODS,
23
+ process_frames_gradio,
24
+ )
25
+
26
+ import_registered_modules()
27
+
28
+
29
+ def process_image_gradio(image, pupil_selection, tv_model, blink_detection):
30
+ """
31
+ Process a single image and return results for Gradio interface.
32
+
33
+ Args:
34
+ image: PIL Image or numpy array
35
+ pupil_selection: str - "left_pupil", "right_pupil", or "both"
36
+ tv_model: str - "ResNet18" or "ResNet50"
37
+ blink_detection: bool - whether to detect blinks
38
+
39
+ Returns:
40
+ tuple: (input_image, cam_overlay, diameter_text, results_plot)
41
+ """
42
+ try:
43
+ # Convert to PIL Image if needed
44
+ if isinstance(image, np.ndarray):
45
+ image = Image.fromarray(image)
46
+
47
+ # Handle EXIF rotation
48
+ image = ImageOps.exif_transpose(image)
49
+
50
+ # Resize image
51
+ image = resize_frame(image, max_width=640, max_height=480)
52
+
53
+ # Process the image using Gradio-compatible function
54
+ input_frames, output_frames, predicted_diameters = process_frames_gradio(
55
+ input_imgs=[image],
56
+ tv_model=tv_model,
57
+ pupil_selection=pupil_selection,
58
+ blink_detection=blink_detection,
59
+ )
60
+
61
+ # Check if processing failed (empty results)
62
+ if not input_frames or not output_frames or not predicted_diameters:
63
+ error_msg = "Could not detect face/eyes in the image. Please try with a clearer image showing eyes."
64
+ error_img = Image.new('RGB', (400, 200), 'white')
65
+ return error_img, error_msg
66
+
67
+ # Create visualization
68
+ results = []
69
+ diameter_results = []
70
+
71
+ for eye_type in input_frames.keys():
72
+ input_img = input_frames[eye_type][-1]
73
+ output_img = output_frames[eye_type][-1]
74
+ diameter = predicted_diameters[eye_type][0]
75
+
76
+ # Create side-by-side comparison
77
+ fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))
78
+
79
+ ax1.imshow(input_img)
80
+ ax1.set_title(f"Input - {eye_type.replace('_', ' ').title()}")
81
+ ax1.axis('off')
82
+
83
+ ax2.imshow(output_img)
84
+ ax2.set_title(f"CAM Overlay - {eye_type.replace('_', ' ').title()}")
85
+ ax2.axis('off')
86
+
87
+ plt.tight_layout()
88
+
89
+ # Convert plot to image
90
+ buf = io.BytesIO()
91
+ plt.savefig(buf, format='png', dpi=150, bbox_inches='tight')
92
+ buf.seek(0)
93
+ plot_img = Image.open(buf)
94
+ plt.close()
95
+
96
+ results.append(plot_img)
97
+
98
+ # Format diameter result
99
+ if isinstance(diameter, str):
100
+ diameter_results.append(f"{eye_type.replace('_', ' ').title()}: {diameter}")
101
+ else:
102
+ diameter_results.append(f"{eye_type.replace('_', ' ').title()}: {diameter:.2f} mm")
103
+
104
+ # Combine results if multiple eyes
105
+ if len(results) == 1:
106
+ final_image = results[0]
107
+ else:
108
+ # Combine multiple eye results
109
+ total_width = sum(img.width for img in results)
110
+ max_height = max(img.height for img in results)
111
+ final_image = Image.new('RGB', (total_width, max_height), 'white')
112
+ x_offset = 0
113
+ for img in results:
114
+ final_image.paste(img, (x_offset, 0))
115
+ x_offset += img.width
116
+
117
+ diameter_text = "\n".join(diameter_results)
118
+
119
+ return final_image, diameter_text
120
+
121
+ except Exception as e:
122
+ error_msg = f"Error processing image: {str(e)}"
123
+ # Create error image
124
+ error_img = Image.new('RGB', (400, 200), 'white')
125
+ return error_img, error_msg
126
+
127
+
128
+ def process_video_gradio(video_file, pupil_selection, tv_model, blink_detection):
129
+ """
130
+ Process a video file and return results for Gradio interface.
131
+
132
+ Args:
133
+ video_file: file path or file object
134
+ pupil_selection: str - "left_pupil", "right_pupil", or "both"
135
+ tv_model: str - "ResNet18" or "ResNet50"
136
+ blink_detection: bool - whether to detect blinks
137
+
138
+ Returns:
139
+ tuple: (results_plot, diameter_data, summary_text)
140
+ """
141
+ try:
142
+ # Handle video file
143
+ if hasattr(video_file, 'name'):
144
+ video_path = video_file.name
145
+ else:
146
+ video_path = video_file
147
+
148
+ # Extract frames
149
+ video_frames = extract_frames(video_path)
150
+
151
+ if not video_frames:
152
+ return None, "No frames extracted from video", "Error: Could not process video"
153
+
154
+ # Resize frames
155
+ resized_frames = []
156
+ for frame in video_frames:
157
+ if isinstance(frame, np.ndarray):
158
+ frame = Image.fromarray(frame)
159
+ input_img = resize_frame(frame, max_width=640, max_height=480)
160
+ resized_frames.append(input_img)
161
+
162
+ # Process video frames using Gradio-compatible function
163
+ input_frames, output_frames, predicted_diameters = process_frames_gradio(
164
+ input_imgs=resized_frames,
165
+ tv_model=tv_model,
166
+ pupil_selection=pupil_selection,
167
+ blink_detection=blink_detection,
168
+ )
169
+
170
+ # Check if processing failed (empty results)
171
+ if not input_frames or not output_frames or not predicted_diameters:
172
+ error_msg = "Could not process video. MediaPipe may have issues in this environment."
173
+ error_img = Image.new('RGB', (400, 200), 'white')
174
+ return error_img, "", error_msg
175
+
176
+ # Create results visualization
177
+ fig, axes = plt.subplots(len(predicted_diameters), 1, figsize=(12, 6 * len(predicted_diameters)))
178
+ if len(predicted_diameters) == 1:
179
+ axes = [axes]
180
+
181
+ summary_stats = []
182
+
183
+ for idx, (eye_type, diameters) in enumerate(predicted_diameters.items()):
184
+ # Filter out non-numeric values (like "blink")
185
+ numeric_diameters = [d for d in diameters if isinstance(d, (int, float))]
186
+ frame_numbers = list(range(len(diameters)))
187
+
188
+ # Plot diameter over time
189
+ axes[idx].plot(frame_numbers, diameters, marker='o', markersize=2)
190
+ axes[idx].set_title(f"Pupil Diameter Over Time - {eye_type.replace('_', ' ').title()}")
191
+ axes[idx].set_xlabel("Frame Number")
192
+ axes[idx].set_ylabel("Diameter (mm)")
193
+ axes[idx].grid(True, alpha=0.3)
194
+
195
+ # Calculate statistics
196
+ if numeric_diameters:
197
+ mean_diameter = np.mean(numeric_diameters)
198
+ std_diameter = np.std(numeric_diameters)
199
+ min_diameter = np.min(numeric_diameters)
200
+ max_diameter = np.max(numeric_diameters)
201
+
202
+ summary_stats.append(f"{eye_type.replace('_', ' ').title()}:")
203
+ summary_stats.append(f" Mean: {mean_diameter:.2f} mm")
204
+ summary_stats.append(f" Std: {std_diameter:.2f} mm")
205
+ summary_stats.append(f" Min: {min_diameter:.2f} mm")
206
+ summary_stats.append(f" Max: {max_diameter:.2f} mm")
207
+ summary_stats.append("")
208
+
209
+ plt.tight_layout()
210
+
211
+ # Convert plot to image
212
+ buf = io.BytesIO()
213
+ plt.savefig(buf, format='png', dpi=150, bbox_inches='tight')
214
+ buf.seek(0)
215
+ plot_img = Image.open(buf)
216
+ plt.close()
217
+
218
+ # Create summary text
219
+ summary_text = f"Processed {len(video_frames)} frames\n\n" + "\n".join(summary_stats)
220
+
221
+ # Create CSV data for download
222
+ csv_data = "Frame,Eye_Type,Diameter_mm\n"
223
+ for eye_type, diameters in predicted_diameters.items():
224
+ for frame_idx, diameter in enumerate(diameters):
225
+ csv_data += f"{frame_idx},{eye_type},{diameter}\n"
226
+
227
+ # Clean up temporary files if they exist
228
+ # (output_video_path not used in this implementation)
229
+
230
+ return plot_img, csv_data, summary_text
231
+
232
+ except Exception as e:
233
+ error_msg = f"Error processing video: {str(e)}"
234
+ error_img = Image.new('RGB', (400, 200), 'white')
235
+ return error_img, "", error_msg
236
+
237
+
238
+ def process_media_unified(media_input, pupil_selection, tv_model, blink_detection):
239
+ """
240
+ Unified processing function that handles both images and videos.
241
+
242
+ Args:
243
+ media_input: Either an image (PIL) or video file path
244
+ pupil_selection: str - "left_pupil", "right_pupil", or "both"
245
+ tv_model: str - "ResNet18" or "ResNet50"
246
+ blink_detection: bool - whether to detect blinks
247
+
248
+ Returns:
249
+ tuple: (result_image, result_text)
250
+ """
251
+ try:
252
+ # Check if input is an image or video
253
+ if hasattr(media_input, 'name'):
254
+ # It's a file object (video)
255
+ file_path = media_input.name
256
+ if is_video(file_path):
257
+ plot_img, csv_data, summary_text = process_video_gradio(media_input, pupil_selection, tv_model, blink_detection)
258
+ combined_output = f"{summary_text}\n\n--- CSV Data ---\n{csv_data}"
259
+ return plot_img, combined_output
260
+ elif is_image(file_path):
261
+ # Convert file to PIL Image
262
+ from PIL import Image
263
+ image = Image.open(file_path)
264
+ return process_image_gradio(image, pupil_selection, tv_model, blink_detection)
265
+ else:
266
+ # It's a PIL Image
267
+ return process_image_gradio(media_input, pupil_selection, tv_model, blink_detection)
268
+
269
+ except Exception as e:
270
+ error_msg = f"Error processing media: {str(e)}"
271
+ from PIL import Image
272
+ error_img = Image.new('RGB', (400, 200), 'white')
273
+ return error_img, error_msg
274
+
275
+
276
+ def create_gradio_interface():
277
+ """Create and configure the Gradio interface with proper API support."""
278
+
279
+ # Create a unified interface that can handle both images and videos
280
+ with gr.Blocks(title="👁️ PupilSense 👁️🕵️‍♂️") as demo:
281
+ gr.Markdown("# 👁️ PupilSense - Pupil Diameter Analysis")
282
+ gr.Markdown("Upload an image or video to estimate pupil diameter using deep learning models.")
283
+
284
+ with gr.Tab("Image Processing"):
285
+ with gr.Row():
286
+ with gr.Column():
287
+ image_input = gr.Image(type="pil", label="Upload Image")
288
+ image_pupil_selection = gr.Dropdown(
289
+ ["left_pupil", "right_pupil", "both"],
290
+ value="both",
291
+ label="Pupil Selection"
292
+ )
293
+ image_model = gr.Dropdown(
294
+ ["ResNet18", "ResNet50"],
295
+ value="ResNet18",
296
+ label="Model"
297
+ )
298
+ image_blink_detection = gr.Checkbox(value=True, label="Detect Blinks")
299
+ image_submit = gr.Button("Process Image", variant="primary")
300
+
301
+ with gr.Column():
302
+ image_output = gr.Image(label="Results")
303
+ image_text_output = gr.Textbox(label="Pupil Diameter Results", lines=5)
304
+
305
+ image_submit.click(
306
+ fn=process_image_simple,
307
+ inputs=[image_input, image_pupil_selection, image_model, image_blink_detection],
308
+ outputs=[image_output, image_text_output]
309
+ )
310
+
311
+ with gr.Tab("Video Processing"):
312
+ with gr.Row():
313
+ with gr.Column():
314
+ video_input = gr.Video(label="Upload Video")
315
+ video_pupil_selection = gr.Dropdown(
316
+ ["left_pupil", "right_pupil", "both"],
317
+ value="both",
318
+ label="Pupil Selection"
319
+ )
320
+ video_model = gr.Dropdown(
321
+ ["ResNet18", "ResNet50"],
322
+ value="ResNet18",
323
+ label="Model"
324
+ )
325
+ video_blink_detection = gr.Checkbox(value=True, label="Detect Blinks")
326
+ video_submit = gr.Button("Process Video", variant="primary")
327
+
328
+ with gr.Column():
329
+ video_output = gr.Image(label="Diameter Analysis")
330
+ video_text_output = gr.Textbox(label="Summary Statistics", lines=10)
331
+
332
+ video_submit.click(
333
+ fn=process_video_simple,
334
+ inputs=[video_input, video_pupil_selection, video_model, video_blink_detection],
335
+ outputs=[video_output, video_text_output]
336
+ )
337
+
338
+ # Add a unified API endpoint that can handle both images and videos
339
+ with gr.Tab("API Testing"):
340
+ gr.Markdown("### API Endpoint for External Access")
341
+ gr.Markdown("This endpoint can process both images and videos programmatically.")
342
+
343
+ with gr.Row():
344
+ with gr.Column():
345
+ api_media_input = gr.File(label="Upload Image or Video File")
346
+ api_pupil_selection = gr.Dropdown(
347
+ ["left_pupil", "right_pupil", "both"],
348
+ value="both",
349
+ label="Pupil Selection"
350
+ )
351
+ api_model = gr.Dropdown(
352
+ ["ResNet18", "ResNet50"],
353
+ value="ResNet18",
354
+ label="Model"
355
+ )
356
+ api_blink_detection = gr.Checkbox(value=True, label="Detect Blinks")
357
+ api_submit = gr.Button("Process Media", variant="primary")
358
+
359
+ with gr.Column():
360
+ api_output = gr.Image(label="Results")
361
+ api_text_output = gr.Textbox(label="Analysis Results", lines=10)
362
+
363
+ api_submit.click(
364
+ fn=process_media_unified,
365
+ inputs=[api_media_input, api_pupil_selection, api_model, api_blink_detection],
366
+ outputs=[api_output, api_text_output]
367
+ )
368
+
369
+ return demo
370
+
371
+
372
+ def process_image_simple(image, pupil_selection, tv_model, blink_detection):
373
+ """Simplified image processing function for gr.Interface."""
374
+ result_image, result_text = process_image_gradio(image, pupil_selection, tv_model, blink_detection)
375
+ return result_image, result_text
376
+
377
+
378
+ def process_video_simple(video_file, pupil_selection, tv_model, blink_detection):
379
+ """Simplified video processing function for gr.Interface."""
380
+ plot_img, csv_data, summary_text = process_video_gradio(video_file, pupil_selection, tv_model, blink_detection)
381
+ # Combine summary and CSV data for single output
382
+ combined_output = f"{summary_text}\n\n--- CSV Data ---\n{csv_data}"
383
+ return plot_img, combined_output
384
+
385
+
386
+ if __name__ == "__main__":
387
+ demo = create_gradio_interface()
388
+ demo.launch()
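For a quick smoke test without launching the UI, the Gradio handlers can be called directly (a sketch; eye.jpg is a placeholder path and the pre-trained weights must be present under pre_trained_models/):

from PIL import Image
from gradio_app import process_image_simple

img = Image.open("eye.jpg")                   # placeholder path
result_image, result_text = process_image_simple(img, "both", "ResNet18", True)
print(result_text)                            # e.g. "Left Eye: 4.12 mm" / "Right Eye: 4.05 mm" (illustrative values)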
gradio_utils.py ADDED
@@ -0,0 +1,300 @@
1
+ import base64
2
+ from io import BytesIO
3
+ import io
4
+ import os
5
+ import sys
6
+ import cv2
7
+ from matplotlib import pyplot as plt
8
+ import numpy as np
9
+ import torch
10
+ import tempfile
11
+ from PIL import Image
12
+ from torchvision.transforms.functional import to_pil_image
13
+ from torchvision import transforms
14
+ from PIL import ImageOps
15
+ import os.path as osp
16
+
17
+ from torchcam.methods import CAM
18
+ from torchcam import methods as torchcam_methods
19
+ from torchcam.utils import overlay_mask
20
+
21
+ root_path = osp.abspath(osp.join(__file__, osp.pardir))
22
+ sys.path.append(root_path)
23
+
24
+ from preprocessing.dataset_creation import EyeDentityDatasetCreation
25
+ from utils import get_model
26
+
27
+ CAM_METHODS = ["CAM"]
28
+
29
+
30
+ @torch.no_grad()
31
+ def load_model(model_configs, device="cpu"):
32
+ """Loads the pre-trained model."""
33
+ model_path = os.path.join(root_path, model_configs["model_path"])
34
+ model_dict = torch.load(model_path, map_location=device)
35
+ model = get_model(model_configs=model_configs)
36
+ model.load_state_dict(model_dict)
37
+ model = model.to(device).eval()
38
+ return model
39
+
40
+
41
+ def extract_frames(video_path):
42
+ """Extracts frames from a video file."""
43
+ vidcap = cv2.VideoCapture(video_path)
44
+ frames = []
45
+ success, image = vidcap.read()
46
+ while success:
47
+ image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
48
+ frames.append(image_rgb)
49
+ success, image = vidcap.read()
50
+ vidcap.release()
51
+ return frames
52
+
53
+
54
+ def resize_frame(frame, max_width=640, max_height=480):
55
+ """Resizes a frame while maintaining aspect ratio."""
56
+ if isinstance(frame, np.ndarray):
57
+ frame = Image.fromarray(frame)
58
+
59
+ # Calculate the scaling factor
60
+ width, height = frame.size
61
+ scale_w = max_width / width
62
+ scale_h = max_height / height
63
+ scale = min(scale_w, scale_h)
64
+
65
+ # Resize the frame
66
+ new_width = int(width * scale)
67
+ new_height = int(height * scale)
68
+ return frame.resize((new_width, new_height), Image.Resampling.LANCZOS)
69
+
70
+
71
+ def is_image(file_extension):
72
+ """Check if file extension is an image format."""
73
+ return file_extension.lower() in ["png", "jpg", "jpeg", "bmp", "tiff", "webp"]
74
+
75
+
76
+ def is_video(file_extension):
77
+ """Check if file extension is a video format."""
78
+ return file_extension.lower() in ["mp4", "avi", "mov", "mkv", "webm", "flv", "wmv"]
79
+
80
+
81
+ def get_configs(blink_detection=False):
82
+ """Get configuration for feature extraction."""
83
+ upscale = "-"
84
+ upscale_method_or_model = "-"
85
+ if upscale == "-":
86
+ sr_configs = None
87
+ else:
88
+ sr_configs = {
89
+ "method": upscale_method_or_model,
90
+ "params": {"upscale": upscale},
91
+ }
92
+ config_file = {
93
+ "sr_configs": sr_configs,
94
+ "feature_extraction_configs": {
95
+ "blink_detection": blink_detection,
96
+ "upscale": upscale,
97
+ "extraction_library": "mediapipe",
98
+ },
99
+ }
100
+ return config_file
101
+
102
+
103
+ def setup_gradio(pupil_selection, tv_model):
104
+ """Setup models and data structures for Gradio processing."""
105
+ left_pupil_model = None
106
+ left_pupil_cam_extractor = None
107
+ right_pupil_model = None
108
+ right_pupil_cam_extractor = None
109
+ output_frames = {}
110
+ input_frames = {}
111
+ predicted_diameters = {}
112
+
113
+ if pupil_selection == "both":
114
+ selected_eyes = ["left_eye", "right_eye"]
115
+ elif pupil_selection == "left_pupil":
116
+ selected_eyes = ["left_eye"]
117
+ elif pupil_selection == "right_pupil":
118
+ selected_eyes = ["right_eye"]
119
+
120
+ for eye_type in selected_eyes:
121
+ model_configs = {
122
+ "model_path": root_path + f"/pre_trained_models/{tv_model}/{eye_type}.pt",
123
+ "registered_model_name": tv_model,
124
+ "num_classes": 1,
125
+ }
126
+ if eye_type == "left_eye":
127
+ left_pupil_model = load_model(model_configs)
128
+ left_pupil_cam_extractor = None
129
+ else:
130
+ right_pupil_model = load_model(model_configs)
131
+ right_pupil_cam_extractor = None
132
+
133
+ output_frames[eye_type] = []
134
+ input_frames[eye_type] = []
135
+ predicted_diameters[eye_type] = []
136
+
137
+ return (
138
+ selected_eyes,
139
+ input_frames,
140
+ output_frames,
141
+ predicted_diameters,
142
+ left_pupil_model,
143
+ left_pupil_cam_extractor,
144
+ right_pupil_model,
145
+ right_pupil_cam_extractor,
146
+ )
147
+
148
+
149
+ def process_frames_gradio(input_imgs, tv_model, pupil_selection, blink_detection=False):
150
+ """
151
+ Process frames without Streamlit dependencies.
152
+ """
153
+ try:
154
+ config_file = get_configs(blink_detection)
155
+
156
+ (
157
+ selected_eyes,
158
+ input_frames,
159
+ output_frames,
160
+ predicted_diameters,
161
+ left_pupil_model,
162
+ left_pupil_cam_extractor,
163
+ right_pupil_model,
164
+ right_pupil_cam_extractor,
165
+ ) = setup_gradio(pupil_selection, tv_model)
166
+
167
+ ds_creation = EyeDentityDatasetCreation(
168
+ feature_extraction_configs=config_file["feature_extraction_configs"],
169
+ sr_configs=config_file["sr_configs"],
170
+ )
171
+ except Exception as e:
172
+ print(f"Error in setup: {e}")
173
+ # Return empty results if setup fails
174
+ return {}, {}, {}
175
+
176
+ preprocess_steps = [
177
+ transforms.Resize(
178
+ [32, 64],
179
+ interpolation=transforms.InterpolationMode.BICUBIC,
180
+ antialias=True,
181
+ ),
182
+ transforms.ToTensor(),
183
+ ]
184
+ preprocess_function = transforms.Compose(preprocess_steps)
185
+
186
+ for idx, input_img in enumerate(input_imgs):
187
+ try:
188
+ img = np.array(input_img)
189
+ ds_results = ds_creation(img)
190
+ except Exception as e:
191
+ print(f"Error in MediaPipe processing for frame {idx}: {e}")
192
+ ds_results = None
193
+
194
+ left_eye = None
195
+ right_eye = None
196
+ blinked = False
197
+
198
+ if ds_results is not None and "face" in ds_results:
199
+ has_face = True
200
+ else:
201
+ has_face = False
202
+
203
+ if has_face and ds_results is not None:
204
+ if blink_detection and "blinks" in ds_results:
205
+ blinked = ds_results["blinks"]["blinked"]
206
+
207
+ if not blinked and "eyes" in ds_results:
208
+ if "left_eye" in ds_results["eyes"] and ds_results["eyes"]["left_eye"] is not None:
209
+ left_eye_img = to_pil_image(ds_results["eyes"]["left_eye"])
210
+ input_img_tensor = preprocess_function(left_eye_img)
211
+ input_img_tensor = input_img_tensor.unsqueeze(0)
212
+ if pupil_selection in ["left_pupil", "both"]:
213
+ left_eye = input_img_tensor
214
+
215
+ if "right_eye" in ds_results["eyes"] and ds_results["eyes"]["right_eye"] is not None:
216
+ right_eye_img = to_pil_image(ds_results["eyes"]["right_eye"])
217
+ input_img_tensor = preprocess_function(right_eye_img)
218
+ input_img_tensor = input_img_tensor.unsqueeze(0)
219
+ if pupil_selection in ["right_pupil", "both"]:
220
+ right_eye = input_img_tensor
221
+
222
+ for eye_type in selected_eyes:
223
+ if blinked:
224
+ if left_eye is not None and eye_type == "left_eye":
225
+ _, height, width = left_eye.squeeze(0).shape
226
+ input_image_pil = to_pil_image(left_eye.squeeze(0))
227
+ elif right_eye is not None and eye_type == "right_eye":
228
+ _, height, width = right_eye.squeeze(0).shape
229
+ input_image_pil = to_pil_image(right_eye.squeeze(0))
230
+ else:
231
+ # Create a default black image if no eye detected
232
+ input_image_pil = Image.new('RGB', (64, 32), 'black')
233
+ height, width = 32, 64
234
+
235
+ input_img_np = np.array(input_image_pil)
236
+ zeros_img = to_pil_image(np.zeros((height, width, 3), dtype=np.uint8))
237
+ output_img_np = np.array(zeros_img)
238
+ predicted_diameter = "blink"
239
+ else:
240
+ if left_eye is not None and eye_type == "left_eye":
241
+ if left_pupil_cam_extractor is None:
242
+ if tv_model == "ResNet18":
243
+ target_layer = left_pupil_model.resnet.layer4[-1].conv2
244
+ elif tv_model == "ResNet50":
245
+ target_layer = left_pupil_model.resnet.layer4[-1].conv3
246
+ else:
247
+ raise Exception(f"No target layer available for selected model: {tv_model}")
248
+ left_pupil_cam_extractor = torchcam_methods.__dict__["CAM"](
249
+ left_pupil_model,
250
+ target_layer=target_layer,
251
+ fc_layer=left_pupil_model.resnet.fc,
252
+ input_shape=left_eye.shape,
253
+ )
254
+ output = left_pupil_model(left_eye)
255
+ predicted_diameter = output[0].item()
256
+ act_maps = left_pupil_cam_extractor(0, output)
257
+ activation_map = act_maps[0] if len(act_maps) == 1 else left_pupil_cam_extractor.fuse_cams(act_maps)
258
+ input_image_pil = to_pil_image(left_eye.squeeze(0))
259
+ elif right_eye is not None and eye_type == "right_eye":
260
+ if right_pupil_cam_extractor is None:
261
+ if tv_model == "ResNet18":
262
+ target_layer = right_pupil_model.resnet.layer4[-1].conv2
263
+ elif tv_model == "ResNet50":
264
+ target_layer = right_pupil_model.resnet.layer4[-1].conv3
265
+ else:
266
+ raise Exception(f"No target layer available for selected model: {tv_model}")
267
+ right_pupil_cam_extractor = torchcam_methods.__dict__["CAM"](
268
+ right_pupil_model,
269
+ target_layer=target_layer,
270
+ fc_layer=right_pupil_model.resnet.fc,
271
+ input_shape=right_eye.shape,
272
+ )
273
+ output = right_pupil_model(right_eye)
274
+ predicted_diameter = output[0].item()
275
+ act_maps = right_pupil_cam_extractor(0, output)
276
+ activation_map = (
277
+ act_maps[0] if len(act_maps) == 1 else right_pupil_cam_extractor.fuse_cams(act_maps)
278
+ )
279
+ input_image_pil = to_pil_image(right_eye.squeeze(0))
280
+ else:
281
+ # No eye detected, create default values
282
+ input_image_pil = Image.new('RGB', (64, 32), 'black')
283
+ predicted_diameter = "no_eye_detected"
284
+ output_img_np = np.array(input_image_pil)
285
+ input_frames[eye_type].append(np.array(input_image_pil))
286
+ output_frames[eye_type].append(output_img_np)
287
+ predicted_diameters[eye_type].append(predicted_diameter)
288
+ continue
289
+
290
+ # Create CAM overlay
291
+ activation_map_pil = to_pil_image(activation_map, mode="F")
292
+ result = overlay_mask(input_image_pil, activation_map_pil, alpha=0.5)
293
+ input_img_np = np.array(input_image_pil)
294
+ output_img_np = np.array(result)
295
+
296
+ input_frames[eye_type].append(input_img_np)
297
+ output_frames[eye_type].append(output_img_np)
298
+ predicted_diameters[eye_type].append(predicted_diameter)
299
+
300
+ return input_frames, output_frames, predicted_diameters
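A sketch of running this helper over one of the bundled sample videos (the weights under pre_trained_models/ must be available, and the registration modules have to be imported first so the model registry is populated):

from registry_utils import import_registered_modules
from gradio_utils import extract_frames, resize_frame, process_frames_gradio

import_registered_modules()

frames = [resize_frame(f) for f in extract_frames("sample_videos/Focus Pocus.webm")]
inputs, overlays, diameters = process_frames_gradio(
    input_imgs=frames,
    tv_model="ResNet18",
    pupil_selection="both",
    blink_detection=True,
)

left = [d for d in diameters.get("left_eye", []) if isinstance(d, (int, float))]
if left:
    print(f"mean left pupil diameter: {sum(left) / len(left):.2f} mm")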
pre_trained_models/ResNet18/left_eye.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:98fb2c7880165c59ff975e02cd9e614fcf3a5859455f8d85695f57497dd894e6
3
+ size 46843194
pre_trained_models/ResNet18/right_eye.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:68e2928f13900580bcb9b7c1a1f6d4bba863cfcfee2def944b49ef0c09337668
3
+ size 46843194
pre_trained_models/ResNet50/left_eye.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5bd4bac728b71dae9e759b86188206a4f38fbc83b9507dd08f2a6abe1568d995
3
+ size 102554624
pre_trained_models/ResNet50/right_eye.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b5179f569ea1886c9ad63ca9d047fdf721a9b59a63313cd9da3f2e3fae25de73
3
+ size 102554624
preprocessing/dataset_creation.py ADDED
@@ -0,0 +1,26 @@
1
+ import sys
2
+ import cv2
3
+ import os.path as osp
4
+
5
+ root_path = osp.abspath(osp.join(__file__, osp.pardir, osp.pardir))
6
+ sys.path.append(root_path)
7
+
8
+ from feature_extraction.features_extractor import FeaturesExtractor
9
+
10
+
11
+ class EyeDentityDatasetCreation:
12
+
13
+ def __init__(self, feature_extraction_configs, sr_configs=None):
14
+ self.extraction_library = feature_extraction_configs["extraction_library"]
15
+ self.upscale = 1
16
+
17
+ self.blink_detection = feature_extraction_configs["blink_detection"]
18
+ self.features_extractor = FeaturesExtractor(
19
+ extraction_library=self.extraction_library,
20
+ blink_detection=self.blink_detection,
21
+ upscale=self.upscale,
22
+ )
23
+
24
+ def __call__(self, img):
25
+ result_dict = self.features_extractor(img)
26
+ return result_dict
preprocessing/dataset_creation_utils.py ADDED
@@ -0,0 +1,14 @@
1
+ import os
2
+ import torch
3
+ import random
4
+ import numpy as np
5
+
6
+
7
+ def seed_everything(seed=42):
8
+ random.seed(seed)
9
+ os.environ["PYTHONHASHSEED"] = str(seed)
10
+ np.random.seed(seed)
11
+ torch.manual_seed(seed)
12
+ torch.cuda.manual_seed(seed)
13
+ torch.backends.cudnn.benchmark = True
14
+ torch.backends.cudnn.deterministic = True
registrations/models.py ADDED
@@ -0,0 +1,56 @@
1
+ import sys
2
+ import torch.nn as nn
3
+ import os.path as osp
4
+ from torchvision import models
5
+ import torch.nn.functional as F
6
+ from registry import MODEL_REGISTRY
7
+
8
+ root_path = osp.abspath(osp.join(__file__, osp.pardir, osp.pardir))
9
+ sys.path.append(root_path)
10
+
11
+ # ============================= ResNets =============================
12
+
13
+
14
+ @MODEL_REGISTRY.register()
15
+ class ResNet18(nn.Module):
16
+ def __init__(self, model_args):
17
+ super(ResNet18, self).__init__()
18
+ self.num_classes = model_args.get("num_classes", 1)
19
+ self.resnet = models.resnet18(weights=None)
20
+ self.regression_head = nn.Linear(1000, self.num_classes)
21
+
22
+ def forward(self, x, masks=None):
23
+ # Calculate the padding dynamically based on the input size
24
+ height, width = x.shape[2], x.shape[3]
25
+ pad_height = max(0, (224 - height) // 2)
26
+ pad_width = max(0, (224 - width) // 2)
27
+
28
+ # Apply padding
29
+ x = F.pad(x, (pad_width, pad_width, pad_height, pad_height), mode="constant", value=0)
30
+ x = self.resnet(x)
31
+ x = self.regression_head(x)
32
+ return x
33
+
34
+
35
+ @MODEL_REGISTRY.register()
36
+ class ResNet50(nn.Module):
37
+ def __init__(self, model_args):
38
+ super(ResNet50, self).__init__()
39
+ self.num_classes = model_args.get("num_classes", 1)
40
+ self.resnet = models.resnet50(weights=None)
41
+ self.regression_head = nn.Linear(1000, self.num_classes)
42
+
43
+ def forward(self, x, masks=None):
44
+ # Calculate the padding dynamically based on the input size
45
+ height, width = x.shape[2], x.shape[3]
46
+ pad_height = max(0, (224 - height) // 2)
47
+ pad_width = max(0, (224 - width) // 2)
48
+
49
+ # Apply padding
50
+ x = F.pad(x, (pad_width, pad_width, pad_height, pad_height), mode="constant", value=0)
51
+ x = self.resnet(x)
52
+ x = self.regression_head(x)
53
+ return x
54
+
55
+
56
+ # print("Registered models in MODEL_REGISTRY:", MODEL_REGISTRY.keys())
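A small shape sanity check (a sketch): both models pad the 32x64 eye crops used by the apps up to 224x224 before the ResNet backbone, so one diameter value per image comes out of the regression head.

import torch
from registrations.models import ResNet18

model = ResNet18({"num_classes": 1}).eval()
dummy = torch.randn(1, 3, 32, 64)             # eye-crop resolution used in the apps
with torch.no_grad():
    out = model(dummy)
print(out.shape)                              # torch.Size([1, 1])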
registry.py ADDED
@@ -0,0 +1,82 @@
1
+ # Modified from: https://github.com/facebookresearch/fvcore/blob/master/fvcore/common/registry.py # noqa: E501
2
+
3
+
4
+ class Registry:
5
+ """
6
+ The registry that provides name -> object mapping, to support third-party
7
+ users' custom modules.
8
+
9
+ To create a registry (e.g. a backbone registry):
10
+
11
+ .. code-block:: python
12
+
13
+ BACKBONE_REGISTRY = Registry('BACKBONE')
14
+
15
+ To register an object:
16
+
17
+ .. code-block:: python
18
+
19
+ @BACKBONE_REGISTRY.register()
20
+ class MyBackbone():
21
+ ...
22
+
23
+ Or:
24
+
25
+ .. code-block:: python
26
+
27
+ BACKBONE_REGISTRY.register(MyBackbone)
28
+ """
29
+
30
+ def __init__(self, name):
31
+ """
32
+ Args:
33
+ name (str): the name of this registry
34
+ """
35
+ self._name = name
36
+ self._obj_map = {}
37
+
38
+ def _do_register(self, name, obj):
39
+ assert name not in self._obj_map, (
40
+ f"An object named '{name}' was already registered "
41
+ f"in '{self._name}' registry!"
42
+ )
43
+ self._obj_map[name] = obj
44
+
45
+ def register(self, obj=None):
46
+ """
47
+ Register the given object under the name `obj.__name__`.
48
+ Can be used as either a decorator or not.
49
+ See docstring of this class for usage.
50
+ """
51
+ if obj is None:
52
+ # used as a decorator
53
+ def deco(func_or_class):
54
+ name = func_or_class.__name__
55
+ self._do_register(name, func_or_class)
56
+ return func_or_class
57
+
58
+ return deco
59
+
60
+ # used as a function call
61
+ name = obj.__name__
62
+ self._do_register(name, obj)
63
+
64
+ def get(self, name):
65
+ ret = self._obj_map.get(name)
66
+ if ret is None:
67
+ raise KeyError(
68
+ f"No object named '{name}' found in '{self._name}' registry!"
69
+ )
70
+ return ret
71
+
72
+ def __contains__(self, name):
73
+ return name in self._obj_map
74
+
75
+ def __iter__(self):
76
+ return iter(self._obj_map.items())
77
+
78
+ def keys(self):
79
+ return self._obj_map.keys()
80
+
81
+
82
+ MODEL_REGISTRY = Registry("model")
registry_utils.py ADDED
@@ -0,0 +1,79 @@
1
+ import os
2
+ import importlib
3
+ from os import path as osp
4
+
5
+
6
+ def scandir(dir_path, suffix=None, recursive=False, full_path=False):
7
+ """Scan a directory to find the interested files.
8
+
9
+ Args:
10
+ dir_path (str): Path of the directory.
11
+ suffix (str | tuple(str), optional): File suffix that we are
12
+ interested in. Default: None.
13
+ recursive (bool, optional): If set to True, recursively scan the
14
+ directory. Default: False.
15
+ full_path (bool, optional): If set to True, include the dir_path.
16
+ Default: False.
17
+
18
+ Returns:
19
+ A generator for all the interested files with relative paths.
20
+ """
21
+
22
+ if (suffix is not None) and not isinstance(suffix, (str, tuple)):
23
+ raise TypeError('"suffix" must be a string or tuple of strings')
24
+
25
+ root = dir_path
26
+
27
+ def _scandir(dir_path, suffix, recursive):
28
+ for entry in os.scandir(dir_path):
29
+ if not entry.name.startswith(".") and entry.is_file():
30
+ if full_path:
31
+ return_path = entry.path
32
+ else:
33
+ return_path = osp.relpath(entry.path, root)
34
+
35
+ if suffix is None:
36
+ yield return_path
37
+ elif return_path.endswith(suffix):
38
+ yield return_path
39
+ else:
40
+ if recursive:
41
+ yield from _scandir(entry.path, suffix=suffix, recursive=recursive)
42
+ else:
43
+ continue
44
+
45
+ return _scandir(dir_path, suffix=suffix, recursive=recursive)
46
+
47
+
48
+ def import_registered_modules(registration_folder="registrations"):
49
+ """
50
+ Import all registered modules from the specified folder.
51
+
52
+ This function automatically scans all the files under the specified folder and imports all the required modules for registry.
53
+
54
+ Parameters:
55
+ registration_folder (str, optional): Path to the folder containing registration modules. Default is "registrations".
56
+
57
+ Returns:
58
+ list: List of imported modules.
59
+ """
60
+
61
+ # print("\n")
62
+
63
+ registration_modules_folder = (
64
+ osp.dirname(osp.abspath(__file__)) + f"/{registration_folder}"
65
+ )
66
+ # print("registration_modules_folder = ", registration_modules_folder)
67
+
68
+ registration_modules_file_names = [
69
+ osp.splitext(osp.basename(v))[0]
70
+ for v in scandir(dir_path=registration_modules_folder)
71
+ ]
72
+ # print("registration_modules_file_names = ", registration_modules_file_names)
73
+
74
+ imported_modules = [
75
+ importlib.import_module(f"{registration_folder}.{file_name}")
76
+ for file_name in registration_modules_file_names
77
+ ]
78
+ # print("imported_modules = ", imported_modules)
79
+ # print("\n")
+ return imported_modules
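Once the registration modules have been imported, the registry can be inspected (a sketch):

from registry import MODEL_REGISTRY
from registry_utils import import_registered_modules

import_registered_modules()                   # imports registrations/models.py, which registers the models
print(list(MODEL_REGISTRY.keys()))            # ['ResNet18', 'ResNet50']
print(MODEL_REGISTRY.get("ResNet18"))         # <class 'registrations.models.ResNet18'>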
requirements.txt ADDED
@@ -0,0 +1,28 @@
1
+ # https://huggingface.co/docs/hub/en/spaces-dependencies
2
+ tqdm
3
+ PyYAML
4
+ numpy
5
+ pandas
6
+ matplotlib
7
+ seaborn
8
+ mlflow
9
+ pillow
10
+ scikit_learn
11
+ torch
12
+ # captum
13
+ evaluate
14
+ # basicsr
15
+ facexlib
16
+ # realesrgan
17
+ opencv_python
18
+ cmake
19
+ # dlib
20
+ einops
21
+ transformers
22
+ # gfpgan
23
+ gradio==4.36.1
24
+ mediapipe
25
+ imutils
26
+ scipy
27
+ torchvision
28
+ torchcam
sample_videos/All Smiles Ahead.webm ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bcb016898fac06a517f067ccbe1e6a32366e984aa9b07a7920a8bc9fdd780d17
3
+ size 951586
sample_videos/And it was all Yellow.webm ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dfd4d681a4a289aa24fc2193fbcbf855e69c6ce19b9619d8d88b7d782c6047dc
3
+ size 956742
sample_videos/Blink It Like Brian.webm ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:590a74b0f13415bb7817c7e28f3f3348bb38aa1f517b8e311f8041c978b4c38b
3
+ size 961928
sample_videos/Focus Pocus.webm ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bf88c7cff8ef627c03c589ca5feb9c83e95db7e7b55294cbcbafefdfe31cdcf6
3
+ size 971924
sample_videos/Funny Talks.webm ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bb22539b023294865bc0df2c6c6622ed94d3d5d27df6d0a22ae8fb193c2d6910
3
+ size 963970
sample_videos/I like to move it move it.webm ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7a4d66abf1707607f826b4b65e20f920d90281e6ab0df5757b57da7216f424b3
3
+ size 958302
sample_videos/Infinite Blue.webm ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f6047ecda02535212c55714b727c32777a4891be91f642847c586bb24bd3d00d
3
+ size 960117
sample_videos/Red Ross.webm ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1078b3fda618c4f5adefe05d389c8360d2a95af1080be7f699a0d72ca454bb3f
3
+ size 960661
sample_videos/Smile, You’re on Camera!.webm ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a8757fa6781889323f9d577862dcc98f05abd295be6de8e8843a3eb1cd406fdd
3
+ size 965710
utils.py ADDED
@@ -0,0 +1,11 @@
1
+ from registry import MODEL_REGISTRY
2
+
3
+
4
+ def get_model(model_configs):
5
+ registered_model = MODEL_REGISTRY.get(model_configs["registered_model_name"])
6
+ model_configs.pop("registered_model_name")
7
+ if len(model_configs) > 0:
8
+ model = registered_model(model_configs)
9
+ else:
10
+ model = registered_model()
11
+ return model
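A sketch of restoring a released checkpoint through get_model, mirroring gradio_utils.load_model (requires the LFS weight files to be pulled; note that get_model pops registered_model_name and forwards the remaining dict to the model constructor):

import torch
from registry_utils import import_registered_modules
from utils import get_model

import_registered_modules()

model = get_model({"registered_model_name": "ResNet18", "num_classes": 1})
state_dict = torch.load("pre_trained_models/ResNet18/left_eye.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()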