Julien Blanchon committed
Commit d62394f · 0 Parent(s)

Deploy optimized Image-GS with dynamic dependencies


- Pre-built gsplat wheel stored in blanchon/image-gs-models-utils
- Models automatically downloaded from HF models repository
- Dynamic installation of the gsplat wheel at runtime (sketched below)
- Optimized Docker build without CUDA compilation
- Clean repository without binary files
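
As a sketch of what the dynamic installation could look like at runtime, the snippet below downloads the pre-built gsplat wheel from `blanchon/image-gs-models-utils` and pip-installs it before the app imports `gsplat`. The wheel filename here is hypothetical; the actual name depends on the build stored in that repo.

```python
import subprocess
import sys

from huggingface_hub import hf_hub_download

# Hypothetical filename; the real one depends on the stored build
GSPLAT_WHEEL = "gsplat-0.1.0-cp310-cp310-linux_x86_64.whl"


def install_gsplat_wheel() -> None:
    """Fetch the pre-built gsplat wheel from the HF repo and pip-install it."""
    wheel_path = hf_hub_download(
        repo_id="blanchon/image-gs-models-utils",
        filename=GSPLAT_WHEEL,
        repo_type="model",
    )
    subprocess.check_call([sys.executable, "-m", "pip", "install", wheel_path])
```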

.dockerignore ADDED
@@ -0,0 +1,46 @@
# Git
.git
.gitignore

# Python
__pycache__
*.pyc
*.pyo
*.pyd
.Python
*.so
.pytest_cache
.coverage

# Virtual environments
.venv
.env
venv/
env/

# IDE
.vscode
.idea
*.swp
*.swo

# OS
.DS_Store
Thumbs.db

# Project specific
results/
temp_*
*.log

# Documentation
docs/
*.md
!README.md

# Assets (if large)
assets/images/
assets/fonts/

# GSplat documentation (not needed for runtime)
gsplat/src/gsplat/cuda/csrc/third_party/glm/doc/
Dockerfile ADDED
@@ -0,0 +1,62 @@
# Use NVIDIA CUDA image that matches PyTorch's CUDA 12.4 compilation
FROM nvidia/cuda:12.4.0-devel-ubuntu22.04

# Install Python 3.10 and dependencies with cache mounts
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && apt-get install -y \
    python3.10 \
    python3.10-venv \
    python3.10-dev \
    python3-pip \
    git \
    build-essential \
    curl \
    ninja-build \
    wget

# Create symlinks for python
RUN ln -sf /usr/bin/python3.10 /usr/bin/python3 && \
    ln -sf /usr/bin/python3.10 /usr/bin/python

# Set CUDA environment variables for runtime
ENV CUDA_HOME=/usr/local/cuda \
    PATH=/usr/local/cuda/bin:$PATH \
    LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

# Install uv globally
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv

# Set up user with ID 1000 (required for HF Spaces)
RUN useradd -m -u 1000 user

# Switch to user and set working directory
USER user
WORKDIR /home/user/app

# Set environment variables
ENV HOME=/home/user \
    PATH=/home/user/.local/bin:$PATH \
    PYTHONUNBUFFERED=1 \
    GRADIO_SERVER_NAME=0.0.0.0 \
    GRADIO_SERVER_PORT=7860 \
    UV_CACHE_DIR=/home/user/.cache/uv

# Copy dependency files first for better caching
COPY --chown=user pyproject.toml uv.lock ./

# Copy the pre-built wheels directory
COPY --chown=user wheels/ ./wheels/

# Install dependencies with uv (using the pre-built wheel, much faster)
RUN --mount=type=cache,target=/tmp/uv-cache,sharing=locked,uid=1000,gid=1000 \
    UV_CACHE_DIR=/tmp/uv-cache uv sync --frozen --no-dev

# Copy the rest of the application
COPY --chown=user . .

# Expose port 7860 (default for HF Spaces)
EXPOSE 7860

# Launch the Gradio app
CMD ["uv", "run", "python", "gradio_app.py"]
README.md ADDED
@@ -0,0 +1,231 @@
---
title: Image GS
emoji: 💻
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
---

<div align="center">

<h1>Image-GS: Content-Adaptive Image Representation via 2D Gaussians</h1>

[**Yunxiang Zhang**](https://yunxiangzhang.github.io/)<sup>1\*</sup>,
[**Bingxuan Li**](https://bingxuan-li.github.io/)<sup>1\*</sup>,
[**Alexandr Kuznetsov**](https://alexku.me/)<sup>3&dagger;</sup>,
[**Akshay Jindal**](https://www.akshayjindal.com/)<sup>2</sup>,
[**Stavros Diolatzis**](https://www.sdiolatz.info/)<sup>2</sup>,
[**Kenneth Chen**](https://kenchen10.github.io/)<sup>1</sup>,
[**Anton Sochenov**](https://www.intel.com/content/www/us/en/developer/articles/community/gpu-researchers-anton-sochenov.html)<sup>2</sup>,
[**Anton Kaplanyan**](http://kaplanyan.com/)<sup>2</sup>,
[**Qi Sun**](https://qisun.me/)<sup>1</sup>

\* Equal contribution &emsp; &dagger; Work done while at Intel

<sup>1</sup>
<a href="https://www.immersivecomputinglab.org/research/"><img width="30%" src="assets/images/NYU-logo.png" style="vertical-align: top;" alt="NYU logo"></a>
&emsp;
<sup>2</sup>
<a href="https://www.intel.com/content/www/us/en/developer/topic-technology/graphics-research/overview.html"><img width="22%" src="assets/images/Intel-logo.png" style="vertical-align: top;" alt="Intel logo"></a>
&emsp;
<sup>3</sup>
<a href="https://www.amd.com/en.html"><img width="33%" src="assets/images/AMD-logo.png" style="vertical-align: top;" alt="AMD logo"></a>

<a href="https://arxiv.org/abs/2407.01866"><img src="https://img.shields.io/badge/arXiv-2407.01866-red" alt="arXiv"></a>
<a href="https://www.immersivecomputinglab.org/publication/image-gs-content-adaptive-image-representation-via-2d-gaussians/"><img src="https://img.shields.io/badge/project page-ImageGS-blue" alt="project page"></a>
<a href="https://github.com/NYU-ICL/image-gs"><img src="https://visitor-badge.laobi.icu/badge?page_id=NYU-ICL.image-gs&left_color=green&right_color=red" alt="visitors"></a>

</div>

<div style="width: 90%; margin: 0 auto;">
Neural image representations have emerged as a promising approach for encoding and rendering visual data. Combined with learning-based workflows, they demonstrate impressive trade-offs between visual fidelity and memory footprint. Existing methods in this domain, however, often rely on fixed data structures that suboptimally allocate memory or compute-intensive implicit models, hindering their practicality for real-time graphics applications.

Inspired by recent advancements in radiance field rendering, we introduce Image-GS, a content-adaptive image representation based on 2D Gaussians. Leveraging a custom differentiable renderer, Image-GS reconstructs images by adaptively allocating and progressively optimizing a group of anisotropic, colored 2D Gaussians. It achieves a favorable balance between visual fidelity and memory efficiency across a variety of stylized images frequently seen in graphics workflows, especially for those showing non-uniformly distributed features and in low-bitrate regimes. Moreover, it supports hardware-friendly rapid random access for real-time usage, requiring only 0.3K MACs to decode a pixel. Through error-guided progressive optimization, Image-GS naturally constructs a smooth level-of-detail hierarchy. We demonstrate its versatility with several applications, including texture compression, semantics-aware compression, and joint image compression and restoration.

<img src="assets/images/teaser.jpg" width="100%" />
<sub>
Figure 1: Image-GS reconstructs an image by adaptively allocating and progressively optimizing a set of colored 2D Gaussians. It achieves favorable rate-distortion trade-offs, hardware-friendly random access, and flexible quality control through a smooth level-of-detail stack. (a) visualizes the optimized spatial distribution of Gaussians (20% randomly sampled for clarity). (b) Image-GS's explicit content-adaptive design effectively captures non-uniformly distributed image features and better preserves fine details under constrained memory budgets. In the inset error maps, brighter colors indicate larger errors.
</sub>
</div>

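For intuition about the representation, here is a conceptual sketch (not the project's renderer, which is a custom CUDA rasterizer in the bundled gsplat fork) of how one pixel can be decoded from a set of anisotropic, colored 2D Gaussians, using the top-K weight normalization that `cfgs/default.yaml` exposes via `topk` and `disable_topk_norm`:

```python
import numpy as np


def decode_pixel(p, xy, scale, theta, feat, k=10, eps=1e-7):
    """Decode one pixel from N anisotropic 2D Gaussians (conceptual sketch).

    p: (2,) pixel position; xy: (N, 2) centers; scale: (N, 2) per-axis scales;
    theta: (N,) rotation angles; feat: (N, C) color features.
    """
    c, s = np.cos(theta), np.sin(theta)
    d = p - xy  # (N, 2) pixel-to-center offsets
    # Rotate offsets into each Gaussian's local frame and normalize by its scale
    local = np.stack(
        [c * d[:, 0] + s * d[:, 1], -s * d[:, 0] + c * d[:, 1]], axis=1
    ) / scale
    w = np.exp(-0.5 * (local**2).sum(axis=1))  # per-Gaussian weights
    top = np.argsort(w)[-k:]  # keep only the top-K contributors
    return (w[top, None] * feat[top]).sum(axis=0) / (w[top].sum() + eps)
```

Because only the K largest contributions are blended per pixel, decoding stays cheap and random access remains hardware-friendly, consistent with the MACs-per-pixel figure quoted above.
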
## Setup
1. Create a dedicated Python environment and install the dependencies
```bash
git clone https://github.com/NYU-ICL/image-gs.git
cd image-gs
conda env create -f environment.yml
conda activate image-gs
pip install git+https://github.com/rahul-goel/fused-ssim/ --no-build-isolation
cd gsplat
pip install -e ".[dev]"
cd ..
```
2. Download the image and texture datasets from [OneDrive](https://1drv.ms/u/c/3a8968df8a027819/EeshjZJlMtdCmvvmESiN2pABM71EDaoLYmEwuOvecg0tAA?e=GybqBv) and organize the folder structure as follows
```
image-gs
└── media
    ├── images
    └── textures
```
3. (Optional) To run saliency-guided Gaussian position initialization, download the pre-trained [EML-Net](https://github.com/SenJia/EML-NET-Saliency) models ([res_imagenet.pth](https://drive.google.com/open?id=1-a494canr9qWKLdm-DUDMgbGwtlAJz71), [res_places.pth](https://drive.google.com/open?id=18nRz0JSRICLqnLQtAvq01azZAsH0SEzS), [res_decoder.pth](https://drive.google.com/open?id=1vwrkz3eX-AMtXQE08oivGMwS4lKB74sH)) and place them under the `models/emlnet/` folder
```
image-gs
└── models
    └── emlnet
        ├── res_decoder.pth
        ├── res_imagenet.pth
        └── res_places.pth
```

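Note: in the Space deployment built from this commit, these checkpoints are fetched automatically at startup by `ensure_models_available()` in `gradio_app.py`, which pulls them from the `blanchon/image-gs-models-utils` repo. The equivalent manual download for a single checkpoint looks like this:

```python
from huggingface_hub import hf_hub_download

# Mirrors what ensure_models_available() does for each checkpoint
path = hf_hub_download(
    repo_id="blanchon/image-gs-models-utils",
    filename="emlnet/res_decoder.pth",
    repo_type="model",
)
```
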
## Quick Start

#### Image Compression
- Optimize an Image-GS representation for an input image `anime-1_2k.png` using `10000` Gaussians with half-precision parameters
```bash
python main.py --input_path="images/anime-1_2k.png" --exp_name="test/anime-1_2k" --num_gaussians=10000 --quantize
```
- Render the corresponding optimized Image-GS representation at a new resolution with height `4000` (aspect ratio is maintained)
```bash
python main.py --input_path="images/anime-1_2k.png" --exp_name="test/anime-1_2k" --num_gaussians=10000 --quantize --eval --render_height=4000
```

#### Texture Stack Compression
- Optimize an Image-GS representation for an input texture stack `alarm-clock_2k` using `30000` Gaussians with half-precision parameters
```bash
python main.py --input_path="textures/alarm-clock_2k" --exp_name="test/alarm-clock_2k" --num_gaussians=30000 --quantize
```
- Render the corresponding optimized Image-GS representation at a new resolution with height `3000` (aspect ratio is maintained)
```bash
python main.py --input_path="textures/alarm-clock_2k" --exp_name="test/alarm-clock_2k" --num_gaussians=30000 --quantize --eval --render_height=3000
```

#### Control bit precision of Gaussian parameters
- Optimize an Image-GS representation for an input image `anime-1_2k.png` using `10000` Gaussians with 12-bit-precision parameters (see the size estimate after this example)
```bash
python main.py --input_path="images/anime-1_2k.png" --exp_name="test/anime-1_2k" --num_gaussians=10000 --quantize --pos_bits=12 --scale_bits=12 --rot_bits=12 --feat_bits=12
```
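
To estimate the footprint these bit widths imply, the training code (`_log_compression_rate` in `gradio_models.py`) charges each Gaussian two position values, two scale values, one rotation angle, and one feature value per channel. A quick back-of-the-envelope check for the 12-bit example above, assuming a 3-channel RGB image:

```python
num_gaussians = 10000
pos_bits = scale_bits = rot_bits = feat_bits = 12
feat_dim = 3  # assuming an RGB input

# Same formula as _log_compression_rate in gradio_models.py
bits = (2 * pos_bits + 2 * scale_bits + rot_bits + feat_dim * feat_bits) * num_gaussians
print(f"{bits / 8 / 1e3:.1f} KB")  # 120.0 KB
```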

#### Switch to saliency-guided Gaussian position initialization
- Optimize an Image-GS representation for an input image `anime-1_2k.png` using `10000` Gaussians with half-precision parameters and saliency-guided initialization
```bash
python main.py --input_path="images/anime-1_2k.png" --exp_name="test/anime-1_2k" --num_gaussians=10000 --quantize --init_mode="saliency"
```

## Gradio Web Interface

We provide a user-friendly web interface built with Gradio for easy experimentation and training visualization.

### Setup for Web Interface

1. Install Gradio (in addition to the main dependencies):
```bash
pip install "gradio>=4.0.0"
```

2. Launch the web interface:
```bash
python gradio_app.py
```

3. Open your browser and navigate to `http://localhost:7860`

### Features

The Gradio interface provides:

- **Interactive Parameter Configuration**: adjust all training parameters through an intuitive UI
- **Image Upload**: drag and drop any image to train on
- **Real-Time Training Progress**: stream training logs and intermediate results
- **Live Visualization**: watch Gaussian placement and rendering progress during training
- **Result Gallery**: view final renders, gradient maps, and saliency maps
- **Easy Experimentation**: no need to remember command-line arguments

### Interface Sections

1. **Configuration Panel**:
   - Basic parameters (number of Gaussians, training steps)
   - Quantization settings for memory efficiency
   - Initialization modes (gradient, saliency, random)
   - Advanced optimization parameters (learning rates, loss weights)

2. **Training Progress**:
   - Real-time streaming logs
   - Current render and Gaussian visualization updates
   - Training status and control buttons

3. **Results Display**:
   - Final optimized image
   - Gradient and saliency maps used for initialization
   - Download capabilities for all results

### Usage Tips

- Start with the default parameters for your first run
- Use **saliency initialization** for better results on complex images
- Enable **Gaussian visualization** to see how the representation evolves
- Adjust **save image steps** to control visualization frequency (lower = more updates, but slower)
- For quick tests, reduce **max steps** to 500-1000

## Command Line Arguments
Please refer to `cfgs/default.yaml` for the full list of arguments and their default values.

**Post-optimization rendering**
- `--eval` render the optimized Image-GS representation.
- `--render_height` image height for rendering (aspect ratio is maintained).

**Bit precision control**: 32 bits (float32) per dimension by default
- `--quantize` enable bit precision control of Gaussian parameters.
- `--pos_bits` bit precision of each coordinate dimension.
- `--scale_bits` bit precision of each scale dimension.
- `--rot_bits` bit precision of the Gaussian orientation angle.
- `--feat_bits` bit precision of each feature dimension.

**Logging**
- `--exp_name` path to the logging directory.
- `--vis_gaussians` visualize Gaussians during optimization.
- `--save_image_steps` frequency of rendering intermediate results during optimization.
- `--save_ckpt_steps` frequency of checkpointing during optimization.

**Input image**
- `--input_path` path to an image file or a directory containing a texture stack.
- `--downsample` load a downsampled version of the input image or texture stack as the optimization target, to evaluate image upsampling performance.
- `--downsample_ratio` downsampling ratio.
- `--gamma` optimize in a gamma-corrected space; modify with caution.

**Gaussian**
- `--num_gaussians` number of Gaussians (for compression rate control).
- `--init_scale` initial Gaussian scale in number of pixels.
- `--disable_topk_norm` disable top-K normalization.
- `--disable_inverse_scale` disable inverse Gaussian scale optimization.
- `--init_mode` Gaussian position initialization mode; valid values are "gradient", "saliency", and "random".
- `--init_random_ratio` ratio of Gaussians with randomly initialized positions.

**Optimization**
- `--disable_tiles` disable tile-based rendering (warning: optimization and rendering without tiles are significantly slower).
- `--max_steps` maximum number of optimization steps.
- `--pos_lr` Gaussian position learning rate.
- `--scale_lr` Gaussian scale learning rate.
- `--rot_lr` Gaussian orientation angle learning rate.
- `--feat_lr` Gaussian feature learning rate.
- `--disable_lr_schedule` disable learning rate decay and early stopping.
- `--disable_prog_optim` disable error-guided progressive optimization.

+ ## Acknowledgements
216
+ We would like to thank the [gsplat](https://github.com/nerfstudio-project/gsplat) team, and the authors of [3DGS](https://github.com/graphdeco-inria/gaussian-splatting), [fused-ssim](https://github.com/rahul-goel/fused-ssim), and [EML-Net](https://github.com/SenJia/EML-NET-Saliency) for their great work, based on which Image-GS was developed.
217
+
218
+ ## License
219
+ This project is licensed under the terms of the MIT license.
220
+
221
+ ## Citation
222
+ If you find this project helpful to your research, please consider citing [BibTeX](assets/docs/image-gs.bib):
223
+ ```bibtex
224
+ @inproceedings{zhang2025image,
225
+ title={Image-gs: Content-adaptive image representation via 2d gaussians},
226
+ author={Zhang, Yunxiang and Li, Bingxuan and Kuznetsov, Alexandr and Jindal, Akshay and Diolatzis, Stavros and Chen, Kenneth and Sochenov, Anton and Kaplanyan, Anton and Sun, Qi},
227
+ booktitle={Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers},
228
+ pages={1--11},
229
+ year={2025}
230
+ }
231
+ ```
cfgs/default.yaml ADDED
@@ -0,0 +1,57 @@
seed: 123
device: "cuda:0"
# Evaluation
eval: False  # Render the optimized Image-GS representation
render_height: 2048  # Image height for rendering (aspect ratio is maintained)
# Bit precision
quantize: False  # Enable bit precision control of Gaussian parameters
pos_bits: 16  # Bit precision of individual coordinate dimension
scale_bits: 16  # Bit precision of individual scale dimension
rot_bits: 16  # Bit precision of Gaussian orientation angle
feat_bits: 16  # Bit precision of individual feature dimension
# Logging
log_root: "results"
exp_name: "test/anime-1_2k"  # Path to the logging directory
log_level: "INFO"
vis_gaussians: False  # Visualize Gaussians during optimization
save_image_steps: 100000  # Frequency of rendering intermediate results during optimization
save_ckpt_steps: 100000  # Frequency of checkpointing during optimization
eval_steps: 100
# Target images
gamma: 1.0  # Optimize in a gamma-corrected space, modify with caution
data_root: "media"
input_path: "images/anime-1_2k.png"  # Path to an image file or a directory containing a texture stack
downsample: False  # Load a downsampled version of the input as the optimization target to evaluate image upsampling performance
downsample_ratio: 2.0
# Gaussians
num_gaussians: 10000  # Number of Gaussians (for compression rate control)
init_scale: 5.0  # Initial Gaussian scale in number of pixels
topk: 10  # Warning: must match the hardcoded value in the CUDA kernel, modify with caution
disable_topk_norm: False  # Disable top-K normalization
disable_inverse_scale: False  # Disable inverse Gaussian scale optimization
ckpt_file: ""
disable_color_init: False
init_mode: "gradient"  # Gaussian position initialization mode, valid values include "gradient", "saliency", and "random"
init_random_ratio: 0.3  # Ratio of Gaussians with randomly initialized position
smap_filter_size: 20  # Gaussian filter size for smoothing saliency maps
# Loss functions
l1_loss_ratio: 1.0
l2_loss_ratio: 0.0
ssim_loss_ratio: 0.1
# Optimization
disable_tiles: False  # Disable tile-based rendering (warning: optimization and rendering without tiles will be significantly slower)
max_steps: 10000  # Maximum number of optimization steps
pos_lr: 5.0e-4
scale_lr: 2.0e-3
rot_lr: 2.0e-3
feat_lr: 5.0e-3
disable_lr_schedule: False  # Disable learning rate schedule and early stopping
decay_ratio: 10.0
check_decay_steps: 1000
max_decay_times: 1
decay_threshold: 1.0e-3
disable_prog_optim: False  # Disable error-guided progressive optimization
initial_ratio: 0.5
add_steps: 500
add_times: 4
post_min_steps: 3000
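
These values are loaded as argparse defaults, so any of them can be overridden from the command line. A minimal sketch of the pattern `gradio_app.py` uses with `load_cfg`:

```python
import argparse

from utils.misc_utils import load_cfg

# Populate a parser with defaults from the YAML; CLI flags then override them
parser = argparse.ArgumentParser()
parser = load_cfg(cfg_path="cfgs/default.yaml", parser=parser)
args = parser.parse_args()  # e.g. main.py --num_gaussians=30000 --quantize
print(args.num_gaussians, args.max_steps)
```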
gradio_app.py ADDED
@@ -0,0 +1,809 @@
import os
import sys
import time
import threading
import argparse
import tempfile
import shutil
from typing import Generator, Optional, Tuple
import logging

try:
    import gradio as gr
except ImportError:
    print("❌ Gradio not found. Please install it with: pip install gradio>=4.0.0")
    sys.exit(1)

try:
    from huggingface_hub import hf_hub_download, snapshot_download
except ImportError:
    print(
        "❌ huggingface_hub not found. Please install it with: pip install huggingface_hub"
    )
    sys.exit(1)

import torch
from PIL import Image

# Add the project root to the path so we can import the modules
sys.path.append(os.path.dirname(os.path.abspath(__file__)))

from gradio_models import GradioGaussianSplatting2D, StreamingResults
from utils.misc_utils import load_cfg
from main import get_log_dir


class TrainingState:
    """Manages the state of training sessions"""

    def __init__(self):
        self.is_training = False
        self.training_thread = None
        self.model = None
        self.temp_dir = None
        self.results = StreamingResults()

    def reset(self):
        self.is_training = False
        if self.temp_dir and os.path.exists(self.temp_dir):
            shutil.rmtree(self.temp_dir)
        self.temp_dir = None
        self.results = StreamingResults()


# Global training state
training_state = TrainingState()


def ensure_models_available():
    """Download models from HuggingFace if they're not available locally"""
    models_dir = "models"

    # Check if models directory exists and has the required files
    required_files = [
        "models/emlnet/res_decoder.pth",
        "models/emlnet/res_imagenet.pth",
        "models/emlnet/res_places.pth",
        "models/torch/checkpoints/alexnet-owt-7be5be79.pth",
    ]

    # Check if all required files exist
    all_files_exist = all(os.path.exists(file_path) for file_path in required_files)

    if not all_files_exist:
        print("📥 Downloading model files from HuggingFace...")
        try:
            # Create models directory if it doesn't exist
            os.makedirs(models_dir, exist_ok=True)

            # Download individual model files to ensure they end up in the right place
            model_files_remote = [
                "emlnet/res_decoder.pth",
                "emlnet/res_imagenet.pth",
                "emlnet/res_places.pth",
                "torch/checkpoints/alexnet-owt-7be5be79.pth",
            ]

            model_files_local = [
                "models/emlnet/res_decoder.pth",
                "models/emlnet/res_imagenet.pth",
                "models/emlnet/res_places.pth",
                "models/torch/checkpoints/alexnet-owt-7be5be79.pth",
            ]

            for remote_file, local_file in zip(model_files_remote, model_files_local):
                if not os.path.exists(local_file):
                    # Create directory structure
                    os.makedirs(os.path.dirname(local_file), exist_ok=True)

                    # Download the specific file
                    print(f"📥 Downloading {remote_file} -> {local_file}...")
                    downloaded_path = hf_hub_download(
                        repo_id="blanchon/image-gs-models-utils",
                        filename=remote_file,
                        repo_type="model",
                    )

                    # Copy to the expected local path
                    shutil.copy2(downloaded_path, local_file)

            print("✅ Model files downloaded successfully!")
        except Exception as e:
            print(f"❌ Failed to download model files: {e}")
            print("⚠️ The app may not work properly without these model files.")
    else:
        print("✅ Model files are already available locally.")


def create_args_from_config(
    image_path: str,
    exp_name: str,
    num_gaussians: int,
    quantize: bool,
    pos_bits: int,
    scale_bits: int,
    rot_bits: int,
    feat_bits: int,
    init_mode: str,
    init_random_ratio: float,
    max_steps: int,
    vis_gaussians: bool,
    save_image_steps: int,
    l1_loss_ratio: float,
    l2_loss_ratio: float,
    ssim_loss_ratio: float,
    pos_lr: float,
    scale_lr: float,
    rot_lr: float,
    feat_lr: float,
    disable_lr_schedule: bool,
    disable_prog_optim: bool,
) -> argparse.Namespace:
    """Create arguments object from Gradio inputs"""

    # Load default config
    parser = argparse.ArgumentParser()
    parser = load_cfg(cfg_path="cfgs/default.yaml", parser=parser)
    args = parser.parse_args([])  # Parse empty args to get defaults

    # Override with user inputs
    args.input_path = image_path
    args.exp_name = exp_name
    args.num_gaussians = num_gaussians
    args.quantize = quantize
    args.pos_bits = pos_bits
    args.scale_bits = scale_bits
    args.rot_bits = rot_bits
    args.feat_bits = feat_bits
    args.init_mode = init_mode
    args.init_random_ratio = init_random_ratio
    args.max_steps = max_steps
    args.vis_gaussians = vis_gaussians
    args.save_image_steps = save_image_steps
    args.l1_loss_ratio = l1_loss_ratio
    args.l2_loss_ratio = l2_loss_ratio
    args.ssim_loss_ratio = ssim_loss_ratio
    args.pos_lr = pos_lr
    args.scale_lr = scale_lr
    args.rot_lr = rot_lr
    args.feat_lr = feat_lr
    args.disable_lr_schedule = disable_lr_schedule
    args.disable_prog_optim = disable_prog_optim
    args.eval = False

    # Set up logging directory
    args.log_dir = get_log_dir(args)

    return args


def train_model(args: argparse.Namespace) -> None:
    """Training function that runs in a separate thread"""
    try:
        # Create and train model with streaming results
        training_state.model = GradioGaussianSplatting2D(args, training_state.results)

        # Start training
        training_state.model.optimize()

    except Exception as e:
        training_state.results.training_logs.append(f"ERROR: {str(e)}")
        logging.error(f"Training failed: {str(e)}")
    finally:
        training_state.is_training = False


def start_training_and_stream(
    image_file,
    exp_name: str,
    num_gaussians: int,
    quantize: bool,
    pos_bits: int,
    scale_bits: int,
    rot_bits: int,
    feat_bits: int,
    init_mode: str,
    init_random_ratio: float,
    max_steps: int,
    vis_gaussians: bool,
    save_image_steps: int,
    l1_loss_ratio: float,
    l2_loss_ratio: float,
    ssim_loss_ratio: float,
    pos_lr: float,
    scale_lr: float,
    rot_lr: float,
    feat_lr: float,
    disable_lr_schedule: bool,
    disable_prog_optim: bool,
) -> Generator[
    Tuple[
        str,
        str,
        Optional[Image.Image],  # initialization_map
        Optional[Image.Image],  # current_render
        Optional[Image.Image],  # current_gaussian_id
        bool,  # start_btn_interactive
        bool,  # stop_btn_interactive
    ],
    None,
    None,
]:
    """Start training and stream progress with images"""

    if training_state.is_training:
        yield (
            "Training is already in progress!",
            "",
            None,
            None,
            None,
            False,  # start_btn disabled
            True,  # stop_btn enabled
        )
        return

    if image_file is None:
        yield (
            "Please upload an image first!",
            "",
            None,
            None,
            None,
            True,  # start_btn enabled
            False,  # stop_btn disabled
        )
        return

    try:
        # Reset training state
        training_state.reset()

        # Create temporary directory for the uploaded image
        training_state.temp_dir = tempfile.mkdtemp()

        # Save uploaded image
        image_path = os.path.join(training_state.temp_dir, "input_image.png")
        image_file.save(image_path)

        # Create args
        args = create_args_from_config(
            image_path=image_path,
            exp_name=exp_name,
            num_gaussians=num_gaussians,
            quantize=quantize,
            pos_bits=pos_bits,
            scale_bits=scale_bits,
            rot_bits=rot_bits,
            feat_bits=feat_bits,
            init_mode=init_mode,
            init_random_ratio=init_random_ratio,
            max_steps=max_steps,
            vis_gaussians=vis_gaussians,
            save_image_steps=save_image_steps,
            l1_loss_ratio=l1_loss_ratio,
            l2_loss_ratio=l2_loss_ratio,
            ssim_loss_ratio=ssim_loss_ratio,
            pos_lr=pos_lr,
            scale_lr=scale_lr,
            rot_lr=rot_lr,
            feat_lr=feat_lr,
            disable_lr_schedule=disable_lr_schedule,
            disable_prog_optim=disable_prog_optim,
        )

        # Update data_root to use temp directory
        args.data_root = training_state.temp_dir
        args.input_path = "input_image.png"

        # Start training in separate thread
        training_state.is_training = True
        training_state.training_thread = threading.Thread(
            target=train_model, args=(args,)
        )
        training_state.training_thread.start()

        # Initial yield
        yield (
            "Training started! Check the progress below.",
            "Initializing training...",
            None,  # initialization_map
            None,  # current_render
            None,  # current_gaussian_id
            False,  # start_btn disabled
            True,  # stop_btn enabled
        )

        # Stream training progress
        while training_state.is_training or not training_state.results.is_complete:
            # Check if stop was requested
            if (
                not training_state.is_training
                and training_state.training_thread
                and training_state.training_thread.is_alive()
            ):
                # Force stop the training thread if needed
                training_state.results.training_logs.append(
                    "🛑 Training stopped by user request"
                )
                break

            # Get training logs
            if training_state.results.training_logs:
                logs_text = "\n".join(training_state.results.training_logs)

                # Add current metrics if available
                if training_state.results.step > 0:
                    # Break if step is greater than total steps
                    if training_state.results.step > training_state.results.total_steps:
                        break

                    metrics = training_state.results.metrics
                    status_line = (
                        f"\nCurrent: Step {training_state.results.step}/{training_state.results.total_steps} | "
                        f"PSNR: {metrics['psnr']:.2f} | SSIM: {metrics['ssim']:.4f} | "
                        f"Loss: {metrics['loss']:.4f}"
                    )
                    logs_text += status_line

                    # Add image status info for debugging
                    if training_state.results.current_render is not None:
                        logs_text += f"\n📸 Current render: {training_state.results.current_render.size}"
                    else:
                        logs_text += "\n📸 Current render: None"

                    if training_state.results.current_gaussian_id is not None:
                        logs_text += f"\n🆔 Gaussian ID: {training_state.results.current_gaussian_id.size}"
                    else:
                        logs_text += "\n🆔 Gaussian ID: None"

                    logs_text += (
                        f"\n💾 Stored steps: {len(training_state.results.step_renders)}"
                    )
            else:
                logs_text = "Waiting for training to start..."

            # Get current images
            initialization_map = training_state.results.initialization_map
            current_render = training_state.results.current_render
            current_gaussian_id = training_state.results.current_gaussian_id

            # Simple status based on training state
            current_step = training_state.results.step
            if training_state.results.is_complete:
                status = "✅ Training completed successfully!"
                start_btn_interactive = True
                stop_btn_interactive = False
            elif not training_state.is_training:
                status = "⏹️ Training stopped."
                start_btn_interactive = True
                stop_btn_interactive = False
            else:
                status = f"🔄 Training in progress... Step {current_step}/{training_state.results.total_steps}"
                start_btn_interactive = False
                stop_btn_interactive = True

            # Always yield, even if images haven't changed
            yield (
                status,
                logs_text,
                initialization_map,
                current_render,
                current_gaussian_id,
                start_btn_interactive,
                stop_btn_interactive,
            )

            # Stop if training is complete
            if training_state.results.is_complete or not training_state.is_training:
                break
            if current_step > training_state.results.total_steps:
                break

            time.sleep(0.5)  # Update more frequently for better responsiveness

    except Exception as e:
        training_state.reset()
        yield (
            f"Failed to start training: {str(e)}",
            "",
            None,
            None,
            None,
            True,  # start_btn enabled
            False,  # stop_btn disabled
        )


def stop_training() -> str:
    """Stop the current training"""
    if not training_state.is_training:
        return "No training in progress."

    training_state.is_training = False
    training_state.results.training_logs.append(
        "🛑 STOP: Training stop requested by user..."
    )

    # Set a flag in the model to stop training
    if training_state.model:
        training_state.model.stop_requested = True

    return "Training stop requested. Will complete current step and stop."


def get_final_results() -> Tuple[Optional[Image.Image], Optional[str]]:
    """Get final training results"""
    final_render = training_state.results.final_render
    checkpoint_path = training_state.results.final_checkpoint_path
    return final_render, checkpoint_path


def browse_step_results(
    step: int,
) -> Tuple[Optional[Image.Image], Optional[Image.Image]]:
    """Browse results from a specific training step"""
    if not training_state.results.is_complete:
        return None, None

    # Find the closest available step
    available_steps = list(training_state.results.step_renders.keys())
    if not available_steps:
        return None, None

    closest_step = min(available_steps, key=lambda x: abs(x - step))

    render_img = training_state.results.step_renders.get(closest_step)
    gaussian_id_img = training_state.results.step_gaussian_ids.get(closest_step)

    return render_img, gaussian_id_img


def update_step_slider_after_training() -> gr.Slider:
    """Update step slider range and enable it after training completes"""
    if not training_state.results.is_complete:
        return gr.Slider(
            minimum=0,
            maximum=10000,
            value=0,
            step=100,
            label="Browse Training Steps",
            info="Training not complete yet",
            interactive=False,
        )

    available_steps = list(training_state.results.step_renders.keys())
    if not available_steps:
        return gr.Slider(
            minimum=0,
            maximum=10000,
            value=0,
            step=100,
            label="Browse Training Steps",
            info="No training steps available",
            interactive=False,
        )

    max_step = max(available_steps)
    min_step = min(available_steps)
    # Use the step size from save_image_steps if available, otherwise use the difference between steps
    if len(available_steps) > 1:
        step_size = available_steps[1] - available_steps[0]
    else:
        step_size = 100

    return gr.Slider(
        minimum=min_step,
        maximum=max_step,
        value=max_step,
        step=step_size,
        label="Browse Training Steps",
        info=f"Browse results from steps {min_step}-{max_step} (interactive)",
        interactive=True,
    )


def create_interface():
    """Create the Gradio interface"""

    with gr.Blocks(
        title="Image-GS: 2D Gaussian Splatting", theme=gr.themes.Soft()
    ) as demo:
        gr.Markdown("""
        # Image-GS: Content-Adaptive Image Representation via 2D Gaussians

        Upload an image and configure parameters to train a 2D Gaussian Splatting representation.
        """)

        with gr.Row():
            with gr.Column(scale=1):
                gr.Markdown("## Configuration")

                # Image upload
                image_input = gr.Image(
                    label="Input Image",
                    type="pil",
                    height=300,
                    sources=["upload"],
                    show_label=True,
                )

                # Basic parameters
                with gr.Group():
                    gr.Markdown("### Basic Parameters")
                    exp_name = gr.Textbox(
                        label="Experiment Name",
                        value="gradio_experiment",
                        info="Name for this training run",
                    )
                    num_gaussians = gr.Slider(
                        minimum=1000,
                        maximum=50000,
                        value=10000,
                        step=1000,
                        label="Number of Gaussians",
                        info="Number of Gaussians (for compression rate control). More = higher quality but slower training",
                    )
                    max_steps = gr.Slider(
                        minimum=100,
                        maximum=20000,
                        value=10000,
                        step=100,
                        label="Maximum Training Steps",
                        info="Maximum number of optimization steps. Default: 10000",
                    )

                # Quantization parameters
                with gr.Group():
                    gr.Markdown("### Quantization")
                    quantize = gr.Checkbox(
                        label="Enable Quantization",
                        value=False,
                        info="Enable bit precision control of Gaussian parameters. Reduces memory usage.",
                    )
                    with gr.Row():
                        pos_bits = gr.Slider(
                            4,
                            32,
                            16,
                            step=1,
                            label="Position Bits",
                            info="Bit precision of individual coordinate dimension",
                        )
                        scale_bits = gr.Slider(
                            4,
                            32,
                            16,
                            step=1,
                            label="Scale Bits",
                            info="Bit precision of individual scale dimension",
                        )
                    with gr.Row():
                        rot_bits = gr.Slider(
                            4,
                            32,
                            16,
                            step=1,
                            label="Rotation Bits",
                            info="Bit precision of Gaussian orientation angle",
                        )
                        feat_bits = gr.Slider(
                            4,
                            32,
                            16,
                            step=1,
                            label="Feature Bits",
                            info="Bit precision of individual feature dimension",
                        )

                # Initialization parameters
                with gr.Group():
                    gr.Markdown("### Initialization")
                    init_mode = gr.Radio(
                        choices=["gradient", "saliency", "random"],
                        value="saliency",
                        label="Initialization Mode",
                        info="Gaussian position initialization mode. Gradient uses image gradients, saliency uses attention maps.",
                    )
                    init_random_ratio = gr.Slider(
                        minimum=0.0,
                        maximum=1.0,
                        value=0.3,
                        step=0.1,
                        label="Random Ratio",
                        info="Ratio of Gaussians with randomly initialized position (default: 0.3)",
                    )

                # Advanced parameters (collapsible)
                with gr.Accordion("Advanced Parameters", open=False):
                    # Loss parameters
                    gr.Markdown("#### Loss Weights")
                    with gr.Row():
                        l1_loss_ratio = gr.Slider(
                            0.0, 2.0, 1.0, step=0.1, label="L1 Loss"
                        )
                        l2_loss_ratio = gr.Slider(
                            0.0, 2.0, 0.0, step=0.1, label="L2 Loss"
                        )
                        ssim_loss_ratio = gr.Slider(
                            0.0, 1.0, 0.1, step=0.01, label="SSIM Loss"
                        )

                    # Learning rates
                    gr.Markdown("#### Learning Rates")
                    with gr.Row():
                        pos_lr = gr.Number(value=5e-4, label="Position LR", precision=6)
                        scale_lr = gr.Number(value=2e-3, label="Scale LR", precision=6)
                    with gr.Row():
                        rot_lr = gr.Number(value=2e-3, label="Rotation LR", precision=6)
                        feat_lr = gr.Number(value=5e-3, label="Feature LR", precision=6)

                    # Optimization options
                    gr.Markdown("#### Optimization")
                    disable_lr_schedule = gr.Checkbox(
                        label="Disable LR Schedule",
                        value=False,
                        info="Keep learning rate constant",
                    )
                    disable_prog_optim = gr.Checkbox(
                        label="Disable Progressive Optimization",
                        value=False,
                        info="Don't add Gaussians during training",
                    )

                # Visualization parameters
                with gr.Group():
                    gr.Markdown("### Visualization")
                    vis_gaussians = gr.Checkbox(
                        label="Visualize Gaussians",
                        value=True,
                        info="Visualize Gaussians during optimization (default: True)",
                    )
                    save_image_steps = gr.Slider(
                        minimum=200,
                        maximum=10000,
                        value=200,
                        step=100,
                        label="Save Image Every N Steps",
                        info="Frequency of rendering intermediate results during optimization (default: 200)",
                    )

                # Control buttons
                with gr.Row():
                    start_btn = gr.Button(
                        "Start Training", variant="primary", size="lg"
                    )
                    stop_btn = gr.Button("Stop Training", variant="stop", size="lg")

                status_text = gr.Textbox(label="Status", interactive=False, lines=2)

            with gr.Column(scale=2):
                gr.Markdown("## Training Progress")

                # Progress logs (streaming)
                progress_logs = gr.Textbox(
                    label="Training Logs",
                    lines=10,
                    max_lines=15,
                    interactive=False,
                    autoscroll=True,
                )

                # Initial map (computed at start based on initialization mode)
                gr.Markdown("### Initialization Map")
                initialization_map = gr.Image(
                    label="Initialization Map",
                    type="pil",
                    height=200,
                )

                # Training images (streaming)
                gr.Markdown("### Current Training Results")
                with gr.Row():
                    current_render = gr.Image(
                        label="Current Render",
                        type="pil",
                        height=300,
                        show_label=True,
                        show_download_button=True,
                    )
                    current_gaussian_id = gr.Image(
                        label="Gaussian ID",
                        type="pil",
                        height=300,
                        show_label=True,
                        show_download_button=True,
                    )

                # Step slider for interactive browsing (will be updated dynamically)
                step_slider = gr.Slider(
                    minimum=0,
                    maximum=10000,
                    value=0,
                    step=100,
                    label="Browse Training Steps",
                    info="Slide to view results from different training steps (disabled during training)",
                    interactive=False,
                )

                gr.Markdown("## Final Results")
                with gr.Row():
                    final_render = gr.Image(
                        label="Final Render", type="pil", height=300
                    )
                    final_checkpoint = gr.File(label="Download Final Checkpoint (.pt)")

                # Results buttons
                with gr.Row():
                    results_btn = gr.Button("Load Final Results", size="lg")
                    enable_slider_btn = gr.Button(
                        "Enable Step Browsing", size="lg", variant="secondary"
                    )

        # Event handlers
        start_btn.click(
            fn=start_training_and_stream,
            inputs=[
                image_input,
                exp_name,
                num_gaussians,
                quantize,
                pos_bits,
                scale_bits,
                rot_bits,
                feat_bits,
                init_mode,
                init_random_ratio,
                max_steps,
                vis_gaussians,
                save_image_steps,
                l1_loss_ratio,
                l2_loss_ratio,
                ssim_loss_ratio,
                pos_lr,
                scale_lr,
                rot_lr,
                feat_lr,
                disable_lr_schedule,
                disable_prog_optim,
            ],
            outputs=[
                status_text,
                progress_logs,
                initialization_map,
                current_render,
                current_gaussian_id,
                start_btn,
                stop_btn,
            ],
        )

        stop_btn.click(fn=stop_training, outputs=status_text)

        results_btn.click(
            fn=get_final_results, outputs=[final_render, final_checkpoint]
        )

        enable_slider_btn.click(
            fn=update_step_slider_after_training, outputs=[step_slider]
        )

        step_slider.change(
            fn=browse_step_results,
            inputs=[step_slider],
            outputs=[current_render, current_gaussian_id],
        )

    return demo


if __name__ == "__main__":
    # Ensure model files are available (download from HF if needed)
    ensure_models_available()

    # Set torch hub directory
    torch.hub.set_dir("models/torch")

    # Create and launch the interface
    demo = create_interface()
    demo.launch(server_name="0.0.0.0", server_port=7860, share=False, debug=True)
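
The streaming behavior above relies on a standard Gradio pattern: when an event handler is a generator, every `yield` pushes a fresh update to all of its output components. A stripped-down sketch of that pattern, independent of Image-GS:

```python
import time

import gradio as gr


def stream_progress(steps: float):
    # Each yield updates the Textbox live, like the training loop above
    for i in range(1, int(steps) + 1):
        time.sleep(0.5)
        yield f"Step {i}/{int(steps)}"


with gr.Blocks() as demo:
    n = gr.Slider(1, 20, value=5, step=1, label="Steps")
    log = gr.Textbox(label="Progress")
    gr.Button("Run").click(fn=stream_progress, inputs=n, outputs=log)

demo.launch()
```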
gradio_models.py ADDED
@@ -0,0 +1,827 @@
import logging
import math
import os
import threading
from time import perf_counter

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from fused_ssim import fused_ssim
from torchvision.transforms.functional import gaussian_blur
from PIL import Image

from gsplat import (
    project_gaussians_2d_scale_rot,
    rasterize_gaussians_no_tiles,
    rasterize_gaussians_sum,
)
from utils.image_utils import (
    compute_image_gradients,
    get_grid,
    get_psnr,
    load_images,
    to_output_format,
)
from utils.misc_utils import set_random_seed
from utils.quantization_utils import ste_quantize
from utils.saliency_utils import get_smap


class StreamingResults:
    """Container for streaming training results"""

    def __init__(self):
        self.step = 0
        self.total_steps = 0
        self.current_render = None
        self.current_gaussian_id = None
        self.initialization_map = None  # Single map for current initialization mode
        self.final_render = None
        self.final_checkpoint_path = None
        self.training_logs = []
        self.metrics = {
            "psnr": 0.0,
            "ssim": 0.0,
            "loss": 0.0,
            "render_time": 0.0,
            "total_time": 0.0,
        }
        self.is_complete = False
        # Store all step results for interactive browsing
        self.step_renders = {}  # {step: PIL_Image}
        self.step_gaussian_ids = {}  # {step: PIL_Image}
        # For async visualization generation
        self.vis_lock = threading.Lock()


class GradioStreamingHandler(logging.Handler):
    """Custom logging handler that captures logs for Gradio streaming"""

    def __init__(self, results_container: StreamingResults):
        super().__init__()
        self.results = results_container

    def emit(self, record):
        log_entry = self.format(record)
        self.results.training_logs.append(log_entry)
        # Keep only the last 100 log entries to avoid memory issues
        if len(self.results.training_logs) > 100:
            self.results.training_logs = self.results.training_logs[-100:]


class GradioGaussianSplatting2D(nn.Module):
    """Gradio-optimized version of GaussianSplatting2D with streaming capabilities"""

    def __init__(self, args, results_container: StreamingResults):
        super(GradioGaussianSplatting2D, self).__init__()
        self.results = results_container
        self.evaluate = args.eval
        set_random_seed(seed=args.seed)

        # Device setup
        if torch.cuda.is_available():
            torch.cuda.set_device(0)
            self.device = torch.device("cuda:0")
        else:
            self.device = torch.device("cpu")
        self.dtype = torch.float32

        # Initialize components
        self._init_logging(args)
        self._init_target(args)
        self._init_bit_precision(args)
        self._init_gaussians(args)
        self._init_loss(args)
        self._init_optimization(args)

        # Initialization
        if self.evaluate:
            self.ckpt_file = args.ckpt_file
            self._load_model()
        else:
            self._init_pos_scale_feat(args)

    def _init_logging(self, args):
        self.log_dir = getattr(args, "log_dir", "temp_gradio_logs")
        self.vis_gaussians = args.vis_gaussians
        self.save_image_steps = args.save_image_steps
        self.eval_steps = args.eval_steps

        # Set up streaming logger
        self.worklog = logging.getLogger("GradioImageGS")
        self.worklog.handlers.clear()  # Remove existing handlers

        # Add our streaming handler
        stream_handler = GradioStreamingHandler(self.results)
        stream_handler.setFormatter(
            logging.Formatter(fmt="[{asctime}] {message}", style="{")
        )
        self.worklog.addHandler(stream_handler)
        self.worklog.setLevel(logging.INFO)

        self.worklog.info(
            f"Start optimizing {args.num_gaussians:d} Gaussians for '{args.input_path}'"
        )

    def _init_target(self, args):
        self.gamma = args.gamma
        self.downsample = args.downsample
        if self.downsample:
            self.downsample_ratio = float(args.downsample_ratio)

        self.block_h, self.block_w = 16, 16
        self._load_target_images(path=os.path.join(args.data_root, args.input_path))

        if self.downsample:
            self.gt_images_upsampled = self.gt_images
            self.img_h_upsampled, self.img_w_upsampled = self.img_h, self.img_w
            self.tile_bounds_upsampled = self.tile_bounds
            self._load_target_images(
                path=os.path.join(args.data_root, args.input_path),
                downsample_ratio=self.downsample_ratio,
            )

        self.num_pixels = self.img_h * self.img_w

    def _load_target_images(self, path, downsample_ratio=None):
        self.gt_images, self.input_channels, self.image_fnames = load_images(
            load_path=path, downsample_ratio=downsample_ratio, gamma=self.gamma
        )
        self.gt_images = torch.from_numpy(self.gt_images).to(
            dtype=self.dtype, device=self.device
        )
        self.img_h, self.img_w = self.gt_images.shape[1:]
        self.tile_bounds = (
            (self.img_w + self.block_w - 1) // self.block_w,
            (self.img_h + self.block_h - 1) // self.block_h,
            1,
        )

    def _init_bit_precision(self, args):
        self.quantize = args.quantize
        self.pos_bits = args.pos_bits
        self.scale_bits = args.scale_bits
        self.rot_bits = args.rot_bits
        self.feat_bits = args.feat_bits

    def _init_gaussians(self, args):
        self.num_gaussians = args.num_gaussians
        self.total_num_gaussians = args.num_gaussians
        self.disable_prog_optim = args.disable_prog_optim

        if not self.disable_prog_optim and not self.evaluate:
            self.initial_ratio = args.initial_ratio
            self.add_times = args.add_times
            self.add_steps = args.add_steps
            self.num_gaussians = math.ceil(
                self.initial_ratio * self.total_num_gaussians
            )
            self.max_add_num = math.ceil(
                float(self.total_num_gaussians - self.num_gaussians) / self.add_times
            )
            min_steps = self.add_steps * self.add_times + args.post_min_steps
            if args.max_steps < min_steps:
                self.worklog.info(
                    f"Max steps ({args.max_steps:d}) is too small for progressive optimization. Resetting to {min_steps:d}"
                )
                args.max_steps = min_steps

        self.topk = args.topk
        self.eps = 1e-7 if args.disable_tiles else 1e-4
        self.init_scale = args.init_scale
        self.disable_topk_norm = args.disable_topk_norm
        self.disable_inverse_scale = args.disable_inverse_scale
        self.disable_color_init = args.disable_color_init

        # Initialize parameters
        self.xy = nn.Parameter(
            torch.rand(self.num_gaussians, 2, dtype=self.dtype, device=self.device),
            requires_grad=True,
        )
        self.scale = nn.Parameter(
            torch.ones(self.num_gaussians, 2, dtype=self.dtype, device=self.device),
            requires_grad=True,
        )
        self.rot = nn.Parameter(
            torch.zeros(self.num_gaussians, 1, dtype=self.dtype, device=self.device),
            requires_grad=True,
        )
        self.feat_dim = sum(self.input_channels)
        self.feat = nn.Parameter(
            torch.rand(
                self.num_gaussians, self.feat_dim, dtype=self.dtype, device=self.device
            ),
            requires_grad=True,
        )
        self.vis_feat = nn.Parameter(torch.rand_like(self.feat), requires_grad=False)

        self._log_compression_rate()

    def _log_compression_rate(self):
        bytes_uncompressed = float(self.gt_images.numel())
        bpp_uncompressed = float(8 * self.feat_dim)
        self.worklog.info(
            f"Uncompressed: {bytes_uncompressed / 1e3:.2f} KB | {bpp_uncompressed:.3f} bpp | 8.0 bppc"
        )
        bits_compressed = (
            2 * self.pos_bits
            + 2 * self.scale_bits
            + self.rot_bits
            + self.feat_dim * self.feat_bits
        ) * self.total_num_gaussians
        bytes_compressed = bits_compressed / 8.0
        bpp_compressed = float(bits_compressed) / self.num_pixels
        bppc_compressed = bpp_compressed / self.feat_dim
        self.num_bytes = bytes_compressed
        self.worklog.info(
            f"Compressed: {bytes_compressed / 1e3:.2f} KB | {bpp_compressed:.3f} bpp | {bppc_compressed:.3f} bppc"
        )
        self.worklog.info(
            f"Compression rate: {bpp_uncompressed / bpp_compressed:.2f}x | {100.0 * bpp_compressed / bpp_uncompressed:.2f}%"
        )

    def _init_loss(self, args):
        self.l1_loss_ratio = args.l1_loss_ratio
        self.l2_loss_ratio = args.l2_loss_ratio
        self.ssim_loss_ratio = args.ssim_loss_ratio

    def _init_optimization(self, args):
        self.disable_tiles = args.disable_tiles
        self.start_step = 1
        self.max_steps = args.max_steps
        self.results.total_steps = args.max_steps  # Set total steps for streaming progress
        self.pos_lr = args.pos_lr
        self.scale_lr = args.scale_lr
        self.rot_lr = args.rot_lr
        self.feat_lr = args.feat_lr

        self.optimizer = torch.optim.Adam(
            [
                {"params": self.xy, "lr": self.pos_lr},
                {"params": self.scale, "lr": self.scale_lr},
                {"params": self.rot, "lr": self.rot_lr},
                {"params": self.feat, "lr": self.feat_lr},
            ]
        )

        self.disable_lr_schedule = args.disable_lr_schedule
        if not self.disable_lr_schedule:
            self.decay_ratio = args.decay_ratio
            self.check_decay_steps = args.check_decay_steps
            self.max_decay_times = args.max_decay_times
            self.decay_threshold = args.decay_threshold

    def _init_pos_scale_feat(self, args):
        self.init_mode = args.init_mode
        self.init_random_ratio = args.init_random_ratio
        self.pixel_xy = (
            get_grid(h=self.img_h, w=self.img_w)
            .to(dtype=self.dtype, device=self.device)
            .reshape(-1, 2)
        )

        with torch.no_grad():
            # Position initialization
            if self.init_mode == "gradient":
                self._compute_gmap()
                self.xy.copy_(self._sample_pos(prob=self.image_gradients))
            elif self.init_mode == "saliency":
                self.smap_filter_size = args.smap_filter_size
                self._compute_smap()
                self.xy.copy_(self._sample_pos(prob=self.saliency))
            else:  # random mode
                selected = np.random.choice(
                    self.num_pixels, self.num_gaussians, replace=False, p=None
                )
                self.xy.copy_(self.pixel_xy.detach().clone()[selected])
                # For random mode, create a simple random noise pattern
                if self.init_mode == "random":
                    random_pattern = np.random.rand(self.img_h, self.img_w)
                    self.results.initialization_map = Image.fromarray(
                        (random_pattern * 255).astype(np.uint8)
                    )

            # Scale initialization
            self.scale.fill_(
                self.init_scale if self.disable_inverse_scale else 1.0 / self.init_scale
            )

            # Feature initialization
            if not self.disable_color_init:
                self.feat.copy_(
                    self._get_target_features(positions=self.xy).detach().clone()
                )

    def _sample_pos(self, prob):
        num_random = round(self.init_random_ratio * self.num_gaussians)
        selected_random = np.random.choice(
            self.num_pixels, num_random, replace=False, p=None
        )
        selected_other = np.random.choice(
            self.num_pixels, self.num_gaussians - num_random, replace=False, p=prob
        )
        return torch.cat(
            [
                self.pixel_xy.detach().clone()[selected_random],
                self.pixel_xy.detach().clone()[selected_other],
            ],
            dim=0,
        )

    def _compute_gmap(self):
        gy, gx = compute_image_gradients(
            np.power(self.gt_images.detach().cpu().clone().numpy(), 1.0 / self.gamma)
        )
        g_norm = np.hypot(gy, gx).astype(np.float32)
        g_norm = g_norm / g_norm.max()

        # Store gradient map for streaming (only if this is the selected initialization mode)
        if self.init_mode == "gradient":
            self.results.initialization_map = Image.fromarray(
                (g_norm * 255).astype(np.uint8)
            )

        g_norm = np.power(g_norm.reshape(-1), 2.0)
        self.image_gradients = g_norm / g_norm.sum()
        self.worklog.info("Image gradient map computed")

    def _compute_smap(self):
        smap = get_smap(
            torch.pow(self.gt_images.detach().clone(), 1.0 / self.gamma),
            "models",
            self.smap_filter_size,
        )

        # Store saliency map for streaming (only if this is the selected initialization mode)
        if self.init_mode == "saliency":
            self.results.initialization_map = Image.fromarray(
                (smap * 255).astype(np.uint8)
            )

        self.saliency = (smap / smap.sum()).reshape(-1)
        self.worklog.info("Saliency map computed")

    def _get_target_features(self, positions):
        with torch.no_grad():
            target_features = F.grid_sample(
                self.gt_images.unsqueeze(0),
                positions[None, None, ...] * 2.0 - 1.0,
373
+ align_corners=False,
374
+ )
375
+ target_features = target_features[0, :, 0, :].permute(1, 0)
376
+ return target_features
377
+
378
+ def forward(self, img_h, img_w, tile_bounds, upsample_ratio=None, benchmark=False):
379
+ scale = self._get_scale(upsample_ratio=upsample_ratio)
380
+ xy, rot, feat = self.xy, self.rot, self.feat
381
+
382
+ if self.quantize:
383
+ xy, scale, rot, feat = (
384
+ ste_quantize(xy, self.pos_bits),
385
+ ste_quantize(scale, self.scale_bits),
386
+ ste_quantize(rot, self.rot_bits),
387
+ ste_quantize(feat, self.feat_bits),
388
+ )
389
+
390
+ begin = perf_counter()
391
+ tmp = project_gaussians_2d_scale_rot(xy, scale, rot, img_h, img_w, tile_bounds)
392
+ xy, radii, conics, num_tiles_hit = tmp
393
+
394
+ if not self.disable_tiles:
395
+ enable_topk_norm = not self.disable_topk_norm
396
+ tmp = (
397
+ xy,
398
+ radii,
399
+ conics,
400
+ num_tiles_hit,
401
+ feat,
402
+ img_h,
403
+ img_w,
404
+ self.block_h,
405
+ self.block_w,
406
+ enable_topk_norm,
407
+ )
408
+ out_image = rasterize_gaussians_sum(*tmp)
409
+ else:
410
+ tmp = xy, conics, feat, img_h, img_w
411
+ out_image = rasterize_gaussians_no_tiles(*tmp)
412
+
413
+ render_time = perf_counter() - begin
414
+
415
+ if benchmark:
416
+ return render_time
417
+
418
+ out_image = (
419
+ out_image.view(-1, img_h, img_w, self.feat_dim)
420
+ .permute(0, 3, 1, 2)
421
+ .contiguous()
422
+ )
423
+ return out_image.squeeze(dim=0), render_time
424
+
425
+ def _get_scale(self, upsample_ratio=None):
426
+ scale = self.scale
427
+ if not self.disable_inverse_scale:
428
+ scale = 1.0 / scale
429
+ if upsample_ratio is not None:
430
+ scale = upsample_ratio * scale
431
+ return scale
432
+
433
+ def _tensor_to_pil_image(self, tensor_image):
434
+ """Convert tensor image to PIL Image for streaming"""
435
+ if tensor_image is None:
436
+ return None
437
+
438
+ # Convert to numpy and apply gamma correction
439
+ image_np = (
440
+ torch.pow(torch.clamp(tensor_image, 0.0, 1.0), 1.0 / self.gamma)
441
+ .detach()
442
+ .cpu()
443
+ .numpy()
444
+ )
445
+
446
+ # Convert to uint8 format
447
+ image_formatted = to_output_format(image_np, gamma=None)
448
+ return Image.fromarray(image_formatted)
449
+
450
+ def _create_gaussian_id_visualization(self):
451
+ """Create Gaussian ID visualization as PIL Image using rasterization with vis_feat"""
452
+ if not self.vis_gaussians:
453
+ return None
454
+
455
+ try:
456
+ # Use vis_feat for ID visualization (this creates unique colors per Gaussian)
457
+ feat = self.vis_feat * self.feat.norm(dim=-1, keepdim=True)
458
+
459
+ # Render with ID features
460
+ scale = self._get_scale()
461
+ xy, rot = self.xy, self.rot
462
+
463
+ if self.quantize:
464
+ xy, scale, rot, feat = (
465
+ ste_quantize(xy, self.pos_bits),
466
+ ste_quantize(scale, self.scale_bits),
467
+ ste_quantize(rot, self.rot_bits),
468
+ ste_quantize(feat, self.feat_bits),
469
+ )
470
+
471
+ tmp = project_gaussians_2d_scale_rot(
472
+ xy, scale, rot, self.img_h, self.img_w, self.tile_bounds
473
+ )
474
+ xy, radii, conics, num_tiles_hit = tmp
475
+
476
+ if not self.disable_tiles:
477
+ enable_topk_norm = not self.disable_topk_norm
478
+ tmp = (
479
+ xy,
480
+ radii,
481
+ conics,
482
+ num_tiles_hit,
483
+ feat,
484
+ self.img_h,
485
+ self.img_w,
486
+ self.block_h,
487
+ self.block_w,
488
+ enable_topk_norm,
489
+ )
490
+ out_image = rasterize_gaussians_sum(*tmp)
491
+ else:
492
+ tmp = xy, conics, feat, self.img_h, self.img_w
493
+ out_image = rasterize_gaussians_no_tiles(*tmp)
494
+
495
+ out_image = (
496
+ out_image.view(-1, self.img_h, self.img_w, self.feat_dim)
497
+ .permute(0, 3, 1, 2)
498
+ .contiguous()
499
+ ).squeeze(dim=0)
500
+
501
+ return self._tensor_to_pil_image(out_image)
502
+
503
+ except Exception as e:
504
+ self.worklog.error(f"Error creating Gaussian ID visualization: {e}")
505
+ return None
506
+
507
+ def optimize(self):
508
+ """Main optimization loop with streaming updates"""
509
+ self.psnr_curr, self.ssim_curr = 0.0, 0.0
510
+ self.best_psnr, self.best_ssim = 0.0, 0.0
511
+ self.decay_times, self.no_improvement_steps = 0, 0
512
+ self.render_time_accum, self.total_time_accum = 0.0, 0.0
513
+
514
+ # Initialize attributes needed for evaluation
515
+ self.l1_loss = None
516
+ self.l2_loss = None
517
+ self.ssim_loss = None
518
+ self.stop_requested = False
519
+
520
+ # Initial render and update
521
+ with torch.no_grad():
522
+ images, _ = self.forward(self.img_h, self.img_w, self.tile_bounds)
523
+ self.results.current_render = self._tensor_to_pil_image(images)
524
+ if self.vis_gaussians:
525
+ try:
526
+ self.results.current_gaussian_id = (
527
+ self._create_gaussian_id_visualization()
528
+ )
529
+ self.worklog.info(
530
+ f"Initial visualizations created - Render: {'✓' if self.results.current_render else '✗'}, ID: {'✓' if self.results.current_gaussian_id else '✗'}"
531
+ )
532
+ except Exception as e:
533
+ self.worklog.error(f"Error creating initial visualizations: {e}")
534
+ self.results.current_gaussian_id = None
535
+
536
+ # Store initial results (step 0)
537
+ self.results.step_renders[0] = self.results.current_render
538
+ if self.vis_gaussians:
539
+ self.results.step_gaussian_ids[0] = self.results.current_gaussian_id
540
+
541
+ for step in range(self.start_step, self.max_steps + 1):
542
+ self.step = step
543
+ self.results.step = step
544
+
545
+ self.optimizer.zero_grad()
546
+
547
+ # Forward pass
548
+ images, render_time = self.forward(self.img_h, self.img_w, self.tile_bounds)
549
+ self.render_time_accum += render_time
550
+
551
+ # Compute loss
552
+ begin = perf_counter()
553
+ self._get_total_loss(images)
554
+ self.total_loss.backward()
555
+ self.optimizer.step()
556
+ self.total_time_accum += perf_counter() - begin + render_time
557
+
558
+ # Update streaming results
559
+ with torch.no_grad():
560
+ if step % self.eval_steps == 0:
561
+ self._evaluate_and_update_stream(images)
562
+
563
+ # Update render image more frequently, but visualizations less frequently
564
+ render_update_freq = max(
565
+ 50, self.save_image_steps // 2
566
+ ) # Update the render preview no more often than every 50 steps
567
+ vis_update_freq = max(
568
+ 200, self.save_image_steps
569
+ ) # Update Gaussian ID visualizations no more often than every 200 steps
570
+
571
+ if step % render_update_freq == 0:
572
+ render_img = self._tensor_to_pil_image(images)
573
+ self.results.current_render = render_img
574
+
575
+ # Only update Gaussian ID visualization less frequently
576
+ if step % vis_update_freq == 0 and self.vis_gaussians:
577
+ # Generate Gaussian ID visualization asynchronously
578
+ def generate_gaussian_id_async():
579
+ try:
580
+ with self.results.vis_lock:
581
+ gaussian_id_vis = (
582
+ self._create_gaussian_id_visualization()
583
+ )
584
+ self.results.current_gaussian_id = gaussian_id_vis
585
+
586
+ except Exception as e:
587
+ self.worklog.error(
588
+ f"Error creating Gaussian ID visualization at step {step}: {e}"
589
+ )
590
+ with self.results.vis_lock:
591
+ self.results.current_gaussian_id = None
592
+
593
+ # Start async visualization generation
594
+ vis_thread = threading.Thread(target=generate_gaussian_id_async)
595
+ vis_thread.daemon = True
596
+ vis_thread.start()
597
+
598
+ # Store results for interactive browsing only at save_image_steps intervals
599
+ if step % self.save_image_steps == 0:
600
+ # Store the current render for browsing
601
+ if self.results.current_render:
602
+ self.results.step_renders[step] = self.results.current_render
603
+
604
+ # Store Gaussian ID visualization for browsing
605
+ if self.vis_gaussians and self.results.current_gaussian_id:
606
+ self.results.step_gaussian_ids[step] = (
607
+ self.results.current_gaussian_id
608
+ )
609
+
610
+ # Progressive optimization
611
+ if (
612
+ not self.disable_prog_optim
613
+ and step % self.add_steps == 0
614
+ and self.num_gaussians < self.total_num_gaussians
615
+ ):
616
+ self._add_gaussians(self.max_add_num)
617
+
618
+ # Learning rate schedule
619
+ terminate = False
620
+ if (
621
+ not self.disable_lr_schedule
622
+ and self.num_gaussians == self.total_num_gaussians
623
+ and step % self.eval_steps == 0
624
+ ):
625
+ terminate = self._lr_schedule()
626
+
627
+ if terminate or self.stop_requested:
628
+ if self.stop_requested:
629
+ self.worklog.info("Training stopped by user request")
630
+ break
631
+
632
+ # Final updates
633
+ with torch.no_grad():
634
+ images, _ = self.forward(self.img_h, self.img_w, self.tile_bounds)
635
+ self.results.final_render = self._tensor_to_pil_image(images)
636
+
637
+ # Save final checkpoint and store path
638
+ self._save_final_checkpoint()
639
+
640
+ self.results.is_complete = True
641
+ self.worklog.info("Optimization completed")
642
+
643
+ def _get_total_loss(self, images):
644
+ self.total_loss = 0
645
+
646
+ if self.l1_loss_ratio > 1e-7:
647
+ self.l1_loss = self.l1_loss_ratio * F.l1_loss(images, self.gt_images)
648
+ self.total_loss += self.l1_loss
649
+ else:
650
+ self.l1_loss = None
651
+
652
+ if self.l2_loss_ratio > 1e-7:
653
+ self.l2_loss = self.l2_loss_ratio * F.mse_loss(images, self.gt_images)
654
+ self.total_loss += self.l2_loss
655
+ else:
656
+ self.l2_loss = None
657
+
658
+ if self.ssim_loss_ratio > 1e-7:
659
+ self.ssim_loss = self.ssim_loss_ratio * (
660
+ 1 - fused_ssim(images.unsqueeze(0), self.gt_images.unsqueeze(0))
661
+ )
662
+ self.total_loss += self.ssim_loss
663
+ else:
664
+ self.ssim_loss = None
665
+
666
+ def _evaluate_and_update_stream(self, images):
667
+ """Evaluate current state and update streaming results"""
668
+ gamma_corrected_images = torch.pow(
669
+ torch.clamp(images, 0.0, 1.0), 1.0 / self.gamma
670
+ )
671
+ gamma_corrected_gt = torch.pow(self.gt_images, 1.0 / self.gamma)
672
+
673
+ psnr = get_psnr(gamma_corrected_images, gamma_corrected_gt).item()
674
+ ssim = fused_ssim(
675
+ gamma_corrected_images.unsqueeze(0), gamma_corrected_gt.unsqueeze(0)
676
+ ).item()
677
+
678
+ self.psnr_curr, self.ssim_curr = psnr, ssim
679
+
680
+ # Update metrics
681
+ self.results.metrics.update(
682
+ {
683
+ "psnr": psnr,
684
+ "ssim": ssim,
685
+ "loss": self.total_loss.item(),
686
+ "render_time": self.render_time_accum,
687
+ "total_time": self.total_time_accum,
688
+ }
689
+ )
690
+
691
+ # Log progress
692
+ loss_results = f"Loss: {self.total_loss.item():.4f}"
693
+ if self.l1_loss is not None:
694
+ loss_results += f", L1: {self.l1_loss.item():.4f}"
695
+ if self.l2_loss is not None:
696
+ loss_results += f", L2: {self.l2_loss.item():.4f}"
697
+ if self.ssim_loss is not None:
698
+ loss_results += f", SSIM: {self.ssim_loss.item():.4f}"
699
+
700
+ time_results = f"Total: {self.total_time_accum:.2f} s | Render: {self.render_time_accum:.2f} s"
701
+
702
+ self.worklog.info(
703
+ f"Step: {self.step:d} | {time_results} | {loss_results} | PSNR: {psnr:.2f} | SSIM: {ssim:.4f}"
704
+ )
705
+
706
+ def _save_final_checkpoint(self):
707
+ """Save final checkpoint and store the path"""
708
+ if self.quantize:
709
+ with torch.no_grad():
710
+ self.xy.copy_(ste_quantize(self.xy, self.pos_bits))
711
+ self.scale.copy_(ste_quantize(self.scale, self.scale_bits))
712
+ self.rot.copy_(ste_quantize(self.rot, self.rot_bits))
713
+ self.feat.copy_(ste_quantize(self.feat, self.feat_bits))
714
+
715
+ # Create checkpoint directory
716
+ ckpt_dir = os.path.join(self.log_dir, "checkpoints")
717
+ os.makedirs(ckpt_dir, exist_ok=True)
718
+
719
+ psnr = self.results.metrics.get("psnr", 0.0)
720
+ ssim = self.results.metrics.get("ssim", 0.0)
721
+
722
+ ckpt_data = {
723
+ "step": self.step,
724
+ "psnr": psnr,
725
+ "ssim": ssim,
726
+ "bytes": getattr(self, "num_bytes", 0),
727
+ "time": self.total_time_accum,
728
+ "state_dict": self.state_dict(),
729
+ "optim_state_dict": self.optimizer.state_dict(),
730
+ }
731
+
732
+ ckpt_path = os.path.join(ckpt_dir, f"ckpt_step-{self.step:d}.pt")
733
+ torch.save(ckpt_data, ckpt_path)
734
+ self.results.final_checkpoint_path = ckpt_path
735
+
736
+ self.worklog.info(f"Final checkpoint saved: {ckpt_path}")
737
+
738
+ def _lr_schedule(self):
739
+ """Learning rate scheduling logic"""
740
+ if (
741
+ self.psnr_curr <= self.best_psnr + 100 * self.decay_threshold
742
+ or self.ssim_curr <= self.best_ssim + self.decay_threshold
743
+ ):
744
+ self.no_improvement_steps += self.eval_steps
745
+ if self.no_improvement_steps >= self.check_decay_steps:
746
+ self.no_improvement_steps = 0
747
+ self.decay_times += 1
748
+ if self.decay_times > self.max_decay_times:
749
+ return True
750
+ for param_group in self.optimizer.param_groups:
751
+ param_group["lr"] /= self.decay_ratio
752
+ self.worklog.info(f"Learning rate decayed by {self.decay_ratio:.1f}")
753
+ return False
754
+ else:
755
+ self.best_psnr = self.psnr_curr
756
+ self.best_ssim = self.ssim_curr
757
+ self.no_improvement_steps = 0
758
+ return False
759
+
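# Illustrative trace of the plateau schedule above (hypothetical values):
# with eval_steps=100 and check_decay_steps=500, five consecutive evaluations
# where PSNR or SSIM fails to beat its best by the threshold margin trigger
# one decay; each trigger divides every param group's lr by decay_ratio.
#   lr, decay_ratio, max_decay_times = 2e-3, 2.0, 3
#   for _ in range(max_decay_times):
#       lr /= decay_ratio          # 1e-3, 5e-4, 2.5e-4
#   # a fourth trigger exceeds max_decay_times and ends the optimization loop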
760
+ def _add_gaussians(self, add_num):
761
+ """Add Gaussians during progressive optimization"""
762
+ add_num = min(
763
+ add_num, self.max_add_num, self.total_num_gaussians - self.num_gaussians
764
+ )
765
+ if add_num <= 0:
766
+ return
767
+
768
+ # Compute error map for new Gaussian placement
769
+ raw_images, _ = self.forward(self.img_h, self.img_w, self.tile_bounds)
770
+ images = torch.pow(torch.clamp(raw_images, 0.0, 1.0), 1.0 / self.gamma)
771
+ gt_images = torch.pow(self.gt_images, 1.0 / self.gamma)
772
+
773
+ kernel_size = round(np.sqrt(self.img_h * self.img_w) // 400)
774
+ if kernel_size >= 1:
775
+ kernel_size = max(3, kernel_size)
776
+ kernel_size = kernel_size + 1 if kernel_size % 2 == 0 else kernel_size
777
+ gt_images = gaussian_blur(img=gt_images, kernel_size=kernel_size)
778
+
779
+ diff_map = (gt_images - images).detach().clone()
780
+ error_map = torch.pow(torch.abs(diff_map).mean(dim=0).reshape(-1), 2.0)
781
+ sample_prob = (error_map / error_map.sum()).cpu().numpy()
782
+ selected = np.random.choice(
783
+ self.num_pixels, add_num, replace=False, p=sample_prob
784
+ )
785
+
786
+ # Create new Gaussians
787
+ new_xy = self.pixel_xy.detach().clone()[selected]
788
+ new_scale = torch.ones(add_num, 2, dtype=self.dtype, device=self.device)
789
+ init_scale = self.init_scale
790
+ new_scale.fill_(init_scale if self.disable_inverse_scale else 1.0 / init_scale)
791
+ new_rot = torch.zeros(add_num, 1, dtype=self.dtype, device=self.device)
792
+ new_feat = diff_map.permute(1, 2, 0).reshape(-1, self.feat_dim)[selected]
793
+ new_vis_feat = torch.rand_like(new_feat)
794
+
795
+ # Update parameters
796
+ old_xy = self.xy.detach().clone()
797
+ old_scale = self.scale.detach().clone()
798
+ old_rot = self.rot.detach().clone()
799
+ old_feat = self.feat.detach().clone()
800
+ old_vis_feat = self.vis_feat.detach().clone()
801
+
802
+ self.num_gaussians += add_num
803
+ all_xy = torch.cat([old_xy, new_xy], dim=0)
804
+ all_scale = torch.cat([old_scale, new_scale], dim=0)
805
+ all_rot = torch.cat([old_rot, new_rot], dim=0)
806
+ all_feat = torch.cat([old_feat, new_feat], dim=0)
807
+ all_vis_feat = torch.cat([old_vis_feat, new_vis_feat], dim=0)
808
+
809
+ self.xy = nn.Parameter(all_xy, requires_grad=True)
810
+ self.scale = nn.Parameter(all_scale, requires_grad=True)
811
+ self.rot = nn.Parameter(all_rot, requires_grad=True)
812
+ self.feat = nn.Parameter(all_feat, requires_grad=True)
813
+ self.vis_feat = nn.Parameter(all_vis_feat, requires_grad=False)
814
+
815
+ # Update optimizer
816
+ self.optimizer = torch.optim.Adam(
817
+ [
818
+ {"params": self.xy, "lr": self.pos_lr},
819
+ {"params": self.scale, "lr": self.scale_lr},
820
+ {"params": self.rot, "lr": self.rot_lr},
821
+ {"params": self.feat, "lr": self.feat_lr},
822
+ ]
823
+ )
824
+
825
+ self.worklog.info(
826
+ f"Step: {self.step:d} | Adding {add_num:d} Gaussians ({self.num_gaussians - add_num:d} -> {self.num_gaussians:d})"
827
+ )
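The optimize() loop above publishes progress through the shared results object (step, metrics, current_render, final_render, is_complete). A minimal polling sketch of how a UI thread might consume it; the helper name and the 0.5 s interval are assumptions, only the results fields come from the code above:

import time

def stream_progress(results, poll_s=0.5):
    """Yield (step, image, metrics) snapshots until the trainer reports completion."""
    while not results.is_complete:
        # metrics is mutated by the training thread; copy before handing it out
        yield results.step, results.current_render, dict(results.metrics)
        time.sleep(poll_s)
    yield results.step, results.final_render, dict(results.metrics)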
main.py ADDED
@@ -0,0 +1,57 @@
1
+ import argparse
2
+
3
+ import torch
4
+
5
+ from model import GaussianSplatting2D
6
+ from utils.misc_utils import load_cfg
7
+
8
+
9
+ def get_gaussian_cfg(args):
10
+ gaussian_cfg = f"num-{args.num_gaussians:d}"
11
+ if args.disable_inverse_scale:
12
+ gaussian_cfg += f"_scale-{args.init_scale:.1f}"
13
+ else:
14
+ gaussian_cfg += f"_inv-scale-{args.init_scale:.1f}"
15
+ if not args.quantize:
16
+ args.pos_bits, args.scale_bits, args.rot_bits, args.feat_bits = 32, 32, 32, 32
17
+ min_bits = min(args.pos_bits, args.scale_bits, args.rot_bits, args.feat_bits)
18
+ max_bits = max(args.pos_bits, args.scale_bits, args.rot_bits, args.feat_bits)
19
+ if min_bits < 4 or max_bits > 32:
20
+ raise ValueError(
21
+ f"Bit precision must be between 4 and 32 but got: {args.pos_bits:d}, {args.scale_bits:d}, {args.rot_bits:d}, {args.feat_bits:d}"
22
+ )
23
+ gaussian_cfg += f"_bits-{args.pos_bits:d}-{args.scale_bits:d}-{args.rot_bits:d}-{args.feat_bits:d}"
24
+ if not args.disable_topk_norm:
25
+ gaussian_cfg += f"_top-{args.topk:d}"
26
+ gaussian_cfg += f"_{args.init_mode[0]}-{args.init_random_ratio:.1f}"
27
+ return gaussian_cfg
28
+
29
+
30
+ def get_log_dir(args):
31
+ gaussian_cfg = get_gaussian_cfg(args)
32
+ loss_cfg = f"l1-{args.l1_loss_ratio:.1f}_l2-{args.l2_loss_ratio:.1f}_ssim-{args.ssim_loss_ratio:.1f}"
33
+ folder = f"{gaussian_cfg}_{loss_cfg}"
34
+ if args.downsample:
35
+ folder += f"_ds-{args.downsample_ratio:.1f}"
36
+ if not args.disable_lr_schedule:
37
+ folder += f"_decay-{args.max_decay_times:d}-{args.decay_ratio:.1f}"
38
+ if not args.disable_prog_optim:
39
+ folder += "_prog"
40
+ return f"{args.log_root}/{args.exp_name}/{folder}"
41
+
42
+
43
+ def main(args):
44
+ args.log_dir = get_log_dir(args)
45
+ ImageGS = GaussianSplatting2D(args)
46
+ if args.eval:
47
+ ImageGS.render(render_height=args.render_height)
48
+ else:
49
+ ImageGS.optimize()
50
+
51
+
52
+ if __name__ == "__main__":
53
+ torch.hub.set_dir("models/torch")
54
+ parser = argparse.ArgumentParser()
55
+ parser = load_cfg(cfg_path="cfgs/default.yaml", parser=parser)
56
+ arguments = parser.parse_args()
57
+ main(arguments)
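As a usage sketch, the run directory get_log_dir would produce for a hypothetical config (every value below is an assumption, mirroring the fields the function reads from cfgs/default.yaml):

from types import SimpleNamespace
# assumes get_log_dir from main.py is in scope (importing main pulls in torch/gsplat)

args = SimpleNamespace(
    num_gaussians=10000, disable_inverse_scale=False, init_scale=4.0,
    quantize=False, pos_bits=32, scale_bits=32, rot_bits=32, feat_bits=32,
    disable_topk_norm=False, topk=8, init_mode="gradient", init_random_ratio=0.3,
    l1_loss_ratio=1.0, l2_loss_ratio=0.0, ssim_loss_ratio=0.1,
    downsample=False, disable_lr_schedule=False, max_decay_times=3,
    decay_ratio=2.0, disable_prog_optim=False, log_root="results", exp_name="demo",
)
print(get_log_dir(args))
# results/demo/num-10000_inv-scale-4.0_bits-32-32-32-32_top-8_g-0.3_l1-1.0_l2-0.0_ssim-0.1_decay-3-2.0_prog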
model.py ADDED
@@ -0,0 +1,824 @@
1
+ import logging
2
+ import math
3
+ import os
4
+ import sys
5
+ from time import perf_counter
6
+
7
+ import numpy as np
8
+ import torch
9
+ import torch.nn as nn
10
+ import torch.nn.functional as F
11
+ from fused_ssim import fused_ssim
12
+ from lpips import LPIPS
13
+ from pytorch_msssim import MS_SSIM
14
+ from torchvision.transforms.functional import gaussian_blur
15
+
16
+ from gsplat import (
17
+ project_gaussians_2d_scale_rot,
18
+ rasterize_gaussians_no_tiles,
19
+ rasterize_gaussians_sum,
20
+ )
21
+ from utils.flip import LDRFLIPLoss
22
+ from utils.image_utils import (
23
+ compute_image_gradients,
24
+ get_grid,
25
+ get_psnr,
26
+ load_images,
27
+ save_image,
28
+ separate_image_channels,
29
+ visualize_added_gaussians,
30
+ visualize_gaussians,
31
+ )
32
+ from utils.misc_utils import clean_dir, get_latest_ckpt_step, save_cfg, set_random_seed
33
+ from utils.quantization_utils import ste_quantize
34
+ from utils.saliency_utils import get_smap
35
+
36
+
37
+ class GaussianSplatting2D(nn.Module):
38
+ def __init__(self, args):
39
+ super(GaussianSplatting2D, self).__init__()
40
+ self.evaluate = args.eval
41
+ set_random_seed(seed=args.seed)
42
+ # Ensure we're using the correct CUDA device
43
+ if torch.cuda.is_available():
44
+ torch.cuda.set_device(0) # Force device 0
45
+ self.device = torch.device("cuda:0")
46
+ else:
47
+ self.device = torch.device("cpu")
48
+ self.dtype = torch.float32
49
+ self._init_logging(args)
50
+ self._init_target(args)
51
+ self._init_bit_precision(args)
52
+ self._init_gaussians(args)
53
+ self._init_loss(args)
54
+ self._init_optimization(args)
55
+ # Initialization
56
+ if self.evaluate:
57
+ self.ckpt_file = args.ckpt_file
58
+ self._load_model()
59
+ else:
60
+ self._init_pos_scale_feat(args)
61
+
62
+ def _init_logging(self, args):
63
+ self.log_dir = args.log_dir
64
+ self.log_level = args.log_level
65
+ self.ckpt_dir = os.path.join(self.log_dir, "checkpoints")
66
+ self.train_dir = os.path.join(self.log_dir, "train")
67
+ self.eval_dir = os.path.join(self.log_dir, "eval")
68
+ self.vis_gaussians = args.vis_gaussians
69
+ self.save_image_steps = args.save_image_steps
70
+ self.save_ckpt_steps = args.save_ckpt_steps
71
+ self.eval_steps = args.eval_steps
72
+ if not self.evaluate:
73
+ clean_dir(path=self.log_dir)
74
+ os.makedirs(self.log_dir, exist_ok=False)
75
+ os.makedirs(self.ckpt_dir, exist_ok=False)
76
+ os.makedirs(self.train_dir, exist_ok=False)
77
+ else:
78
+ os.makedirs(self.eval_dir, exist_ok=True)
79
+ self._gen_logger(args)
80
+ if not self.evaluate:
81
+ save_cfg(path=f"{self.log_dir}/cfg_train.yaml", args=args)
82
+
83
+ def _gen_logger(self, args):
84
+ log_fname = "log_train"
85
+ if self.evaluate:
86
+ log_fname = "log_eval"
87
+ log_level = getattr(logging, self.log_level, logging.INFO)
88
+ logging.basicConfig(level=log_level)
89
+ self.worklog = logging.getLogger("Image-GS Logger")
90
+ self.worklog.propagate = False
91
+ datefmt = "%Y/%m/%d %H:%M:%S"
92
+ fileHandler = logging.FileHandler(
93
+ f"{self.log_dir}/{log_fname}.txt", mode="a", encoding="utf8"
94
+ )
95
+ fileHandler.setFormatter(
96
+ logging.Formatter(fmt="[{asctime}] {message}", datefmt=datefmt, style="{")
97
+ )
98
+ consoleHandler = logging.StreamHandler(sys.stdout)
99
+ consoleHandler.setFormatter(
100
+ logging.Formatter(
101
+ fmt="\x1b[32m[{asctime}] \x1b[0m{message}", datefmt=datefmt, style="{"
102
+ )
103
+ )
104
+ self.worklog.handlers = [fileHandler, consoleHandler]
105
+ action = "rendering" if self.evaluate else "optimizing"
106
+ self.worklog.info(
107
+ f"Start {action} {args.num_gaussians:d} Gaussians for '{args.input_path}'"
108
+ )
109
+ self.worklog.info("***********************************************")
110
+
111
+ def _init_target(self, args):
112
+ self.gamma = args.gamma
113
+ self.downsample = args.downsample
114
+ if self.downsample:
115
+ self.downsample_ratio = float(args.downsample_ratio)
116
+ self.block_h, self.block_w = (
117
+ 16,
118
+ 16,
119
+ ) # Warning: Must match hardcoded value in CUDA kernel, modify with caution
120
+ self._load_target_images(path=os.path.join(args.data_root, args.input_path))
121
+ if self.downsample:
122
+ self.gt_images_upsampled = self.gt_images
123
+ self.img_h_upsampled, self.img_w_upsampled = self.img_h, self.img_w
124
+ self.tile_bounds_upsampled = self.tile_bounds
125
+ self._load_target_images(
126
+ path=os.path.join(args.data_root, args.input_path),
127
+ downsample_ratio=self.downsample_ratio,
128
+ )
129
+ if not self.evaluate:
130
+ path = f"{self.log_dir}/gt_upsample-{self.downsample_ratio:.1f}_res-{self.img_h_upsampled:d}x{self.img_w_upsampled:d}"
131
+ self._separate_and_save_images(
132
+ images=self.gt_images_upsampled,
133
+ channels=self.input_channels,
134
+ path=path,
135
+ )
136
+ self.num_pixels = self.img_h * self.img_w
137
+ if not self.evaluate:
138
+ path = f"{self.log_dir}/gt_res-{self.img_h:d}x{self.img_w:d}"
139
+ self._separate_and_save_images(
140
+ images=self.gt_images, channels=self.input_channels, path=path
141
+ )
142
+
143
+ def _load_target_images(self, path, downsample_ratio=None):
144
+ self.gt_images, self.input_channels, self.image_fnames = load_images(
145
+ load_path=path, downsample_ratio=downsample_ratio, gamma=self.gamma
146
+ )
147
+ self.gt_images = torch.from_numpy(self.gt_images).to(
148
+ dtype=self.dtype, device=self.device
149
+ )
150
+ self.img_h, self.img_w = self.gt_images.shape[1:]
151
+ self.tile_bounds = (
152
+ (self.img_w + self.block_w - 1) // self.block_w,
153
+ (self.img_h + self.block_h - 1) // self.block_h,
154
+ 1,
155
+ )
156
+
157
+ def _separate_and_save_images(self, images, channels, path):
158
+ images_sep = separate_image_channels(images=images, input_channels=channels)
159
+ for idx, image in enumerate(images_sep, 1):
160
+ suffix = "" if len(images_sep) == 1 else f"_{idx:d}"
161
+ save_image(image, f"{path}{suffix}.png", gamma=self.gamma)
162
+
163
+ def _init_bit_precision(self, args):
164
+ self.quantize = args.quantize
165
+ self.pos_bits = args.pos_bits
166
+ self.scale_bits = args.scale_bits
167
+ self.rot_bits = args.rot_bits
168
+ self.feat_bits = args.feat_bits
169
+
170
+ def _init_gaussians(self, args):
171
+ self.num_gaussians = args.num_gaussians
172
+ self.total_num_gaussians = args.num_gaussians
173
+ self.disable_prog_optim = args.disable_prog_optim
174
+ if not self.disable_prog_optim and not self.evaluate:
175
+ self.initial_ratio = args.initial_ratio
176
+ self.add_times = args.add_times
177
+ self.add_steps = args.add_steps
178
+ self.num_gaussians = math.ceil(
179
+ self.initial_ratio * self.total_num_gaussians
180
+ )
181
+ self.max_add_num = math.ceil(
182
+ float(self.total_num_gaussians - self.num_gaussians) / self.add_times
183
+ )
184
+ min_steps = self.add_steps * self.add_times + args.post_min_steps
185
+ if args.max_steps < min_steps:
186
+ self.worklog.info(
187
+ f"Max steps ({args.max_steps:d}) is too small for progressive optimization. Resetting to {min_steps:d}"
188
+ )
189
+ args.max_steps = min_steps
190
+ self.topk = (
191
+ args.topk
192
+ ) # Warning: Must match hardcoded value in CUDA kernel, modify with caution
193
+ self.eps = (
194
+ 1e-7 if args.disable_tiles else 1e-4
195
+ ) # Warning: Must match hardcoded value in CUDA kernel, modify with caution
196
+ self.init_scale = args.init_scale
197
+ self.disable_topk_norm = args.disable_topk_norm
198
+ self.disable_inverse_scale = args.disable_inverse_scale
199
+ self.disable_color_init = args.disable_color_init
200
+ self.xy = nn.Parameter(
201
+ torch.rand(self.num_gaussians, 2, dtype=self.dtype, device=self.device),
202
+ requires_grad=True,
203
+ )
204
+ self.scale = nn.Parameter(
205
+ torch.ones(self.num_gaussians, 2, dtype=self.dtype, device=self.device),
206
+ requires_grad=True,
207
+ )
208
+ self.rot = nn.Parameter(
209
+ torch.zeros(self.num_gaussians, 1, dtype=self.dtype, device=self.device),
210
+ requires_grad=True,
211
+ )
212
+ self.feat_dim = sum(self.input_channels)
213
+ self.feat = nn.Parameter(
214
+ torch.rand(
215
+ self.num_gaussians, self.feat_dim, dtype=self.dtype, device=self.device
216
+ ),
217
+ requires_grad=True,
218
+ )
219
+ self.vis_feat = nn.Parameter(
220
+ torch.rand_like(self.feat), requires_grad=False
221
+ ) # Only used for Gaussian ID visualization
222
+ self._log_compression_rate()
223
+
224
+ def _log_compression_rate(self):
225
+ bytes_uncompressed = float(self.gt_images.numel())
226
+ bpp_uncompressed = float(8 * self.feat_dim)
227
+ self.worklog.info(
228
+ f"Uncompressed: {bytes_uncompressed / 1e3:.2f} KB | {bpp_uncompressed:.3f} bpp | 8.0 bppc"
229
+ )
230
+ bits_compressed = (
231
+ 2 * self.pos_bits
232
+ + 2 * self.scale_bits
233
+ + self.rot_bits
234
+ + self.feat_dim * self.feat_bits
235
+ ) * self.total_num_gaussians
236
+ bytes_compressed = bits_compressed / 8.0
237
+ bpp_compressed = float(bits_compressed) / self.num_pixels
238
+ bppc_compressed = bpp_compressed / self.feat_dim
239
+ self.num_bytes = bytes_compressed
240
+ self.worklog.info(
241
+ f"Compressed: {bytes_compressed / 1e3:.2f} KB | {bpp_compressed:.3f} bpp | {bppc_compressed:.3f} bppc"
242
+ )
243
+ self.worklog.info(
244
+ f"Compression rate: {bpp_uncompressed / bpp_compressed:.2f}x | {100.0 * bpp_compressed / bpp_uncompressed:.2f}%"
245
+ )
246
+ self.worklog.info("***********************************************")
247
+
248
+ def _init_loss(self, args):
249
+ self.l1_loss = None
250
+ self.l2_loss = None
251
+ self.ssim_loss = None
252
+ self.l1_loss_ratio = args.l1_loss_ratio
253
+ self.l2_loss_ratio = args.l2_loss_ratio
254
+ self.ssim_loss_ratio = args.ssim_loss_ratio
255
+
256
+ def _init_optimization(self, args):
257
+ self.disable_tiles = args.disable_tiles
258
+ self.start_step = 1
259
+ self.max_steps = args.max_steps
260
+ self.pos_lr = args.pos_lr
261
+ self.scale_lr = args.scale_lr
262
+ self.rot_lr = args.rot_lr
263
+ self.feat_lr = args.feat_lr
264
+ self.optimizer = torch.optim.Adam(
265
+ [
266
+ {"params": self.xy, "lr": self.pos_lr},
267
+ {"params": self.scale, "lr": self.scale_lr},
268
+ {"params": self.rot, "lr": self.rot_lr},
269
+ {"params": self.feat, "lr": self.feat_lr},
270
+ ]
271
+ )
272
+ self.disable_lr_schedule = args.disable_lr_schedule
273
+ if not self.disable_lr_schedule:
274
+ self.decay_ratio = args.decay_ratio
275
+ self.check_decay_steps = args.check_decay_steps
276
+ self.max_decay_times = args.max_decay_times
277
+ self.decay_threshold = args.decay_threshold
278
+
279
+ def _init_pos_scale_feat(self, args):
280
+ self.init_mode = args.init_mode
281
+ self.init_random_ratio = args.init_random_ratio
282
+ self.pixel_xy = (
283
+ get_grid(h=self.img_h, w=self.img_w)
284
+ .to(dtype=self.dtype, device=self.device)
285
+ .reshape(-1, 2)
286
+ )
287
+ with torch.no_grad():
288
+ # Position
289
+ if self.init_mode == "gradient":
290
+ self._compute_gmap()
291
+ self.xy.copy_(self._sample_pos(prob=self.image_gradients))
292
+ elif self.init_mode == "saliency":
293
+ self.smap_filter_size = args.smap_filter_size
294
+ self._compute_smap(path="models")
295
+ self.xy.copy_(self._sample_pos(prob=self.saliency))
296
+ else:
297
+ selected = np.random.choice(
298
+ self.num_pixels, self.num_gaussians, replace=False, p=None
299
+ )
300
+ self.xy.copy_(self.pixel_xy.detach().clone()[selected])
301
+ # Scale
302
+ self.scale.fill_(
303
+ self.init_scale if self.disable_inverse_scale else 1.0 / self.init_scale
304
+ )
305
+ # Feature
306
+ if not self.disable_color_init:
307
+ self.feat.copy_(
308
+ self._get_target_features(positions=self.xy).detach().clone()
309
+ )
310
+
311
+ def _sample_pos(self, prob):
312
+ num_random = round(self.init_random_ratio * self.num_gaussians)
313
+ selected_random = np.random.choice(
314
+ self.num_pixels, num_random, replace=False, p=None
315
+ )
316
+ selected_other = np.random.choice(
317
+ self.num_pixels, self.num_gaussians - num_random, replace=False, p=prob
318
+ )
319
+ return torch.cat(
320
+ [
321
+ self.pixel_xy.detach().clone()[selected_random],
322
+ self.pixel_xy.detach().clone()[selected_other],
323
+ ],
324
+ dim=0,
325
+ )
326
+
327
+ def _compute_gmap(self):
328
+ gy, gx = compute_image_gradients(
329
+ np.power(self.gt_images.detach().cpu().clone().numpy(), 1.0 / self.gamma)
330
+ )
331
+ g_norm = np.hypot(gy, gx).astype(np.float32)
332
+ g_norm = g_norm / g_norm.max()
333
+ save_image(g_norm, f"{self.log_dir}/gmap_res-{self.img_h:d}x{self.img_w:d}.png")
334
+ g_norm = np.power(g_norm.reshape(-1), 2.0)
335
+ self.image_gradients = g_norm / g_norm.sum()
336
+ self.worklog.info("Image gradient map successfully saved")
337
+ self.worklog.info("***********************************************")
338
+
339
+ def _compute_smap(self, path):
340
+ smap = get_smap(
341
+ torch.pow(self.gt_images.detach().clone(), 1.0 / self.gamma),
342
+ path,
343
+ self.smap_filter_size,
344
+ )
345
+ save_image(smap, f"{self.log_dir}/smap_res-{self.img_h:d}x{self.img_w:d}.png")
346
+ self.saliency = (smap / smap.sum()).reshape(-1)
347
+ self.worklog.info("Saliency map successfully saved")
348
+ self.worklog.info("***********************************************")
349
+
350
+ def _get_target_features(self, positions):
351
+ with torch.no_grad():
352
+ # gt_images [1, C, H, W]; positions [1, 1, P, 2]; top-left [-1, -1]; bottom-right [1, 1]
353
+ target_features = F.grid_sample(
354
+ self.gt_images.unsqueeze(0),
355
+ positions[None, None, ...] * 2.0 - 1.0,
356
+ align_corners=False,
357
+ )
358
+ target_features = target_features[0, :, 0, :].permute(1, 0) # [P, C]
359
+ return target_features
360
+
361
+ def _load_model(self):
362
+ if self.ckpt_file != "":
363
+ ckpt_path = os.path.join(self.ckpt_dir, self.ckpt_file)
364
+ else:
365
+ latest_step = get_latest_ckpt_step(self.ckpt_dir)
366
+ if latest_step == -1:
367
+ raise FileNotFoundError(f"No checkpoint found in '{self.ckpt_dir}'")
368
+ ckpt_path = os.path.join(self.ckpt_dir, f"ckpt_step-{latest_step:d}.pt")
369
+ checkpoint = torch.load(ckpt_path, weights_only=False)
370
+ self.load_state_dict(checkpoint["state_dict"])
371
+ self.optimizer.load_state_dict(checkpoint["optim_state_dict"])
372
+ self.start_step = checkpoint["step"] + 1
373
+ self.worklog.info(f"Checkpoint '{ckpt_path}' successfully loaded")
374
+ self.worklog.info("***********************************************")
375
+
376
+ def _save_model(self):
377
+ if self.quantize:
378
+ self._quantize()
379
+ psnr, ssim = self._evaluate(log=False, upsample=False)
380
+ self._evaluate_extra()
381
+ ckpt_data = {
382
+ "step": self.step,
383
+ "psnr": psnr,
384
+ "ssim": ssim,
385
+ "lpips": self.lpips_final,
386
+ "flip": self.flip_final,
387
+ "msssim": self.msssim_final,
388
+ "bytes": self.num_bytes,
389
+ "time": self.total_time_accum,
390
+ "state_dict": self.state_dict(),
391
+ "optim_state_dict": self.optimizer.state_dict(),
392
+ }
393
+ save_path = f"{self.ckpt_dir}/ckpt_step-{self.step:d}.pt"
394
+ torch.save(ckpt_data, save_path)
395
+ self.worklog.info(f"Checkpoint 'ckpt_step-{self.step:d}.pt' successfully saved")
396
+ self.worklog.info(
397
+ f"PSNR: {psnr:.2f} | SSIM: {ssim:.4f} | LPIPS: {self.lpips_final:.4f} | FLIP: {self.flip_final:.4f} | MS-SSIM: {self.msssim_final:.4f}"
398
+ )
399
+ self.worklog.info("***********************************************")
400
+
401
+ def _quantize(self):
402
+ with torch.no_grad():
403
+ self.xy.copy_(ste_quantize(self.xy, self.pos_bits))
404
+ self.scale.copy_(ste_quantize(self.scale, self.scale_bits))
405
+ self.rot.copy_(ste_quantize(self.rot, self.rot_bits))
406
+ self.feat.copy_(ste_quantize(self.feat, self.feat_bits))
407
+
408
+ def render(self, render_height=None):
409
+ img_h, img_w = self.img_h, self.img_w
410
+ if render_height is not None:
411
+ img_h, img_w = render_height, round((float(render_height) / img_h) * img_w)
412
+ tile_bounds = (
413
+ (img_w + self.block_w - 1) // self.block_w,
414
+ (img_h + self.block_h - 1) // self.block_h,
415
+ 1,
416
+ )
417
+ upsample_ratio = float(img_h) / self.img_h
418
+ with torch.no_grad():
419
+ num_prep_runs = 2
420
+ for _ in range(num_prep_runs):
421
+ self.forward(img_h, img_w, tile_bounds, upsample_ratio, benchmark=True)
422
+ images, render_time = self.forward(
423
+ img_h, img_w, tile_bounds, upsample_ratio
424
+ )
425
+ path = f"{self.eval_dir}/render_upsample-{upsample_ratio:.1f}_res-{img_h:d}x{img_w:d}"
426
+ self._separate_and_save_images(
427
+ images=images, channels=self.input_channels, path=path
428
+ )
429
+ self.worklog.info(f"Step: {self.start_step - 1:d} | Time: {render_time:.6f} s")
430
+ self.worklog.info(f"Rendering at resolution ({img_h:d}, {img_w:d}) completed")
431
+ self.worklog.info("***********************************************")
432
+
433
+ def benchmark_render_time(self, num_reps, render_height=None):
434
+ img_h, img_w = self.img_h, self.img_w
435
+ if render_height is not None:
436
+ img_h, img_w = render_height, round((float(render_height) / img_h) * img_w)
437
+ tile_bounds = (
438
+ (img_w + self.block_w - 1) // self.block_w,
439
+ (img_h + self.block_h - 1) // self.block_h,
440
+ 1,
441
+ )
442
+ upsample_ratio = float(img_h) / self.img_h
443
+ with torch.no_grad():
444
+ render_time_all = np.zeros(num_reps, dtype=np.float32)
445
+ num_prep_runs = 2
446
+ for _ in range(num_prep_runs):
447
+ self.forward(img_h, img_w, tile_bounds, upsample_ratio, benchmark=True)
448
+ for rid in range(num_reps):
449
+ render_time = self.forward(
450
+ img_h, img_w, tile_bounds, upsample_ratio, benchmark=True
451
+ )
452
+ render_time_all[rid] = render_time
453
+ return render_time_all
454
+
455
+ def forward(self, img_h, img_w, tile_bounds, upsample_ratio=None, benchmark=False):
456
+ scale = self._get_scale(upsample_ratio=upsample_ratio)
457
+ xy, rot, feat = self.xy, self.rot, self.feat
458
+ if self.quantize:
459
+ xy, scale, rot, feat = (
460
+ ste_quantize(xy, self.pos_bits),
461
+ ste_quantize(scale, self.scale_bits),
462
+ ste_quantize(rot, self.rot_bits),
463
+ ste_quantize(feat, self.feat_bits),
464
+ )
465
+ begin = perf_counter()
466
+ tmp = project_gaussians_2d_scale_rot(xy, scale, rot, img_h, img_w, tile_bounds)
467
+ xy, radii, conics, num_tiles_hit = tmp
468
+ if not self.disable_tiles:
469
+ enable_topk_norm = not self.disable_topk_norm
470
+ tmp = (
471
+ xy,
472
+ radii,
473
+ conics,
474
+ num_tiles_hit,
475
+ feat,
476
+ img_h,
477
+ img_w,
478
+ self.block_h,
479
+ self.block_w,
480
+ enable_topk_norm,
481
+ )
482
+ out_image = rasterize_gaussians_sum(*tmp)
483
+ else:
484
+ tmp = xy, conics, feat, img_h, img_w
485
+ out_image = rasterize_gaussians_no_tiles(*tmp)
486
+ render_time = perf_counter() - begin
487
+ if benchmark:
488
+ return render_time
489
+ out_image = (
490
+ out_image.view(-1, img_h, img_w, self.feat_dim)
491
+ .permute(0, 3, 1, 2)
492
+ .contiguous()
493
+ )
494
+ return out_image.squeeze(dim=0), render_time
495
+
496
+ def _get_scale(self, upsample_ratio=None):
497
+ scale = self.scale
498
+ if not self.disable_inverse_scale:
499
+ scale = 1.0 / scale
500
+ if upsample_ratio is not None:
501
+ scale = upsample_ratio * scale
502
+ return scale
503
+
504
+ def _visualize_gaussian_id(self, img_h, img_w, tile_bounds, upsample_ratio=None):
505
+ scale = self._get_scale(upsample_ratio=upsample_ratio)
506
+ xy, rot, feat = self.xy, self.rot, self.feat
507
+ if self.quantize:
508
+ xy, scale, rot, feat = (
509
+ ste_quantize(xy, self.pos_bits),
510
+ ste_quantize(scale, self.scale_bits),
511
+ ste_quantize(rot, self.rot_bits),
512
+ ste_quantize(feat, self.feat_bits),
513
+ )
514
+ feat = self.vis_feat * feat.norm(dim=-1, keepdim=True)
515
+ tmp = project_gaussians_2d_scale_rot(xy, scale, rot, img_h, img_w, tile_bounds)
516
+ xy, radii, conics, num_tiles_hit = tmp
517
+ if not self.disable_tiles:
518
+ enable_topk_norm = not self.disable_topk_norm
519
+ tmp = (
520
+ xy,
521
+ radii,
522
+ conics,
523
+ num_tiles_hit,
524
+ feat,
525
+ img_h,
526
+ img_w,
527
+ self.block_h,
528
+ self.block_w,
529
+ enable_topk_norm,
530
+ )
531
+ out_image = rasterize_gaussians_sum(*tmp)
532
+ else:
533
+ tmp = xy, conics, feat, img_h, img_w
534
+ out_image = rasterize_gaussians_no_tiles(*tmp)
535
+ out_image = (
536
+ out_image.view(-1, img_h, img_w, self.feat_dim)
537
+ .permute(0, 3, 1, 2)
538
+ .contiguous()
539
+ )
540
+ return out_image.squeeze(dim=0)
541
+
542
+ def optimize(self):
543
+ self.psnr_curr, self.ssim_curr = 0.0, 0.0
544
+ self.best_psnr, self.best_ssim = 0.0, 0.0
545
+ self.decay_times, self.no_improvement_steps = 0, 0
546
+ self.render_time_accum, self.total_time_accum = 0.0, 0.0
547
+ self.lpips_final, self.flip_final, self.msssim_final = 1.0, 1.0, 0.0
548
+
549
+ self.step = 0
550
+ with torch.no_grad():
551
+ self._log_images(log_final=False, plot_gaussians=self.vis_gaussians)
552
+ for step in range(self.start_step, self.max_steps + 1):
553
+ self.step = step
554
+ self.optimizer.zero_grad()
555
+ # Rendering
556
+ images, render_time = self.forward(self.img_h, self.img_w, self.tile_bounds)
557
+ self.render_time_accum += render_time
558
+ # Optimization
559
+ begin = perf_counter()
560
+ self._get_total_loss(images)
561
+ self.total_loss.backward()
562
+ self.optimizer.step()
563
+ self.total_time_accum += perf_counter() - begin + render_time
564
+ # Logging
565
+ terminate = False
566
+ with torch.no_grad():
567
+ if self.step % self.eval_steps == 0:
568
+ self._evaluate(log=True, upsample=False)
569
+ if (
570
+ not self.disable_lr_schedule
571
+ and self.num_gaussians == self.total_num_gaussians
572
+ ):
573
+ terminate = self._lr_schedule()
574
+ if self.step % self.save_image_steps == 0:
575
+ self._log_images(log_final=False, plot_gaussians=self.vis_gaussians)
576
+ if (
577
+ self.step % self.save_ckpt_steps == 0
578
+ and self.num_gaussians == self.total_num_gaussians
579
+ ):
580
+ self._save_model()
581
+ if (
582
+ not self.disable_prog_optim
583
+ and self.step % self.add_steps == 0
584
+ and self.num_gaussians < self.total_num_gaussians
585
+ ):
586
+ self._add_gaussians(
587
+ self.max_add_num, plot_gaussians=self.vis_gaussians
588
+ )
589
+ if terminate:
590
+ break
591
+ with torch.no_grad():
592
+ self._log_images(log_final=True, plot_gaussians=self.vis_gaussians)
593
+ self._save_model()
594
+ self.worklog.info("Optimization completed")
595
+ self.worklog.info("***********************************************")
596
+ self.worklog.info(
597
+ f"Mean scale: {self._get_scale().mean().item():.4f} (pixel) | {self.scale.mean().item():.4f} (raw)"
598
+ )
599
+ self.worklog.info("***********************************************")
600
+ return self.psnr_curr, self.ssim_curr
601
+
602
+ def _get_total_loss(self, images):
603
+ self.total_loss = 0
604
+ if self.l1_loss_ratio > 1e-7:
605
+ self.l1_loss = self.l1_loss_ratio * F.l1_loss(images, self.gt_images)
606
+ self.total_loss += self.l1_loss
607
+ else:
608
+ self.l1_loss = None
609
+ if self.l2_loss_ratio > 1e-7:
610
+ self.l2_loss = self.l2_loss_ratio * F.mse_loss(images, self.gt_images)
611
+ self.total_loss += self.l2_loss
612
+ else:
613
+ self.l2_loss = None
614
+ if self.ssim_loss_ratio > 1e-7:
615
+ self.ssim_loss = self.ssim_loss_ratio * (
616
+ 1 - fused_ssim(images.unsqueeze(0), self.gt_images.unsqueeze(0))
617
+ )
618
+ self.total_loss += self.ssim_loss
619
+ else:
620
+ self.ssim_loss = None
621
+
622
+ def _evaluate(self, log=True, upsample=False):
623
+ if upsample: # Do not log performance metrics for upsampled images
624
+ log = False
625
+ images = torch.pow(
626
+ torch.clamp(self._render_images(upsample=upsample), 0.0, 1.0),
627
+ 1.0 / self.gamma,
628
+ )
629
+ gt_images = torch.pow(
630
+ self.gt_images_upsampled if upsample else self.gt_images, 1.0 / self.gamma
631
+ )
632
+ psnr = get_psnr(images, gt_images).item()
633
+ ssim = fused_ssim(images.unsqueeze(0), gt_images.unsqueeze(0)).item()
634
+ if log:
635
+ self.psnr_curr, self.ssim_curr = psnr, ssim
636
+ loss_results = f"Loss: {self.total_loss.item():.4f}"
637
+ loss_results += (
638
+ f", L1: {self.l1_loss.item():.4f}" if self.l1_loss is not None else ""
639
+ )
640
+ loss_results += (
641
+ f", L2: {self.l2_loss.item():.4f}" if self.l2_loss is not None else ""
642
+ )
643
+ loss_results += (
644
+ f", SSIM: {self.ssim_loss.item():.4f}"
645
+ if self.ssim_loss is not None
646
+ else ""
647
+ )
648
+ time_results = f"Total: {self.total_time_accum:.2f} s | Render: {self.render_time_accum:.2f} s"
649
+ self.worklog.info(
650
+ f"Step: {self.step:d} | {time_results} | {loss_results} | PSNR: {self.psnr_curr:.2f} | SSIM: {self.ssim_curr:.4f}"
651
+ )
652
+ return psnr, ssim
653
+
654
+ def _evaluate_extra(self):
655
+ images = torch.pow(
656
+ torch.clamp(self._render_images(upsample=False), 0.0, 1.0), 1.0 / self.gamma
657
+ )[None, ...]
658
+ gt_images = torch.pow(self.gt_images, 1.0 / self.gamma)[None, ...]
659
+ msssim_metric = (
660
+ MS_SSIM(data_range=1.0, size_average=True, channel=self.feat_dim)
661
+ .to(device=self.device)
662
+ .eval()
663
+ )
664
+ self.msssim_final = msssim_metric(images, gt_images).item()
665
+ lpips_metric = LPIPS(net="alex").to(device=self.device).eval()
666
+ flip_metric = LDRFLIPLoss().to(device=self.device).eval()
667
+ num_channels = 1 if self.feat_dim < 3 else 3
668
+ self.lpips_final = lpips_metric(
669
+ images[:, :num_channels], gt_images[:, :num_channels]
670
+ ).item()
671
+ if self.feat_dim >= 3:
672
+ self.flip_final = flip_metric(images[:, :3], gt_images[:, :3]).item()
673
+
674
+ def _log_images(self, log_final=False, plot_gaussians=False):
675
+ images = self._render_images(upsample=False)
676
+ if log_final:
677
+ path = f"{self.log_dir}/render_res-{self.img_h:d}x{self.img_w:d}"
678
+ self._separate_and_save_images(
679
+ images=images, channels=self.input_channels, path=path
680
+ )
681
+ psnr, ssim = self._evaluate(log=False, upsample=False)
682
+ path = f"{self.train_dir}/render_step-{self.step:d}_psnr-{psnr:.2f}_ssim-{ssim:.4f}_res-{self.img_h:d}x{self.img_w:d}"
683
+ self._separate_and_save_images(
684
+ images=images, channels=self.input_channels, path=path
685
+ )
686
+ if plot_gaussians:
687
+ path = f"{self.train_dir}/gaussian_step-{self.step:d}_psnr-{psnr:.2f}_ssim-{ssim:.4f}_res-{self.img_h:d}x{self.img_w:d}"
688
+ visualize_gaussians(
689
+ path,
690
+ self.xy,
691
+ self._get_scale(),
692
+ self.rot,
693
+ self.feat,
694
+ self.img_h,
695
+ self.img_w,
696
+ self.input_channels,
697
+ alpha=0.8,
698
+ gamma=self.gamma,
699
+ )
700
+ images = self._visualize_gaussian_id(
701
+ self.img_h, self.img_w, self.tile_bounds
702
+ )
703
+ path = f"{self.train_dir}/gaussian-id_step-{self.step:d}_psnr-{psnr:.2f}_ssim-{ssim:.4f}_res-{self.img_h:d}x{self.img_w:d}"
704
+ self._separate_and_save_images(
705
+ images=images, channels=self.input_channels, path=path
706
+ )
707
+ if self.downsample:
708
+ images = self._render_images(upsample=True)
709
+ psnr, ssim = self._evaluate(log=False, upsample=True)
710
+ img_h, img_w = self.img_h_upsampled, self.img_w_upsampled
711
+ path = f"{self.train_dir}/render_upsample-{self.downsample_ratio:.1f}_step-{self.step:d}_psnr-{psnr:.2f}_ssim-{ssim:.4f}_res-{img_h:d}x{img_w:d}"
712
+ self._separate_and_save_images(
713
+ images=images, channels=self.input_channels, path=path
714
+ )
715
+
716
+ def _render_images(self, upsample=False):
717
+ if upsample:
718
+ images, _ = self.forward(
719
+ self.img_h_upsampled,
720
+ self.img_w_upsampled,
721
+ self.tile_bounds_upsampled,
722
+ upsample_ratio=self.downsample_ratio,
723
+ )
724
+ else:
725
+ images, _ = self.forward(self.img_h, self.img_w, self.tile_bounds)
726
+ return images
727
+
728
+ def _lr_schedule(self):
729
+ if (
730
+ self.psnr_curr <= self.best_psnr + 100 * self.decay_threshold
731
+ or self.ssim_curr <= self.best_ssim + self.decay_threshold
732
+ ):
733
+ self.no_improvement_steps += self.eval_steps
734
+ if self.no_improvement_steps >= self.check_decay_steps:
735
+ self.no_improvement_steps = 0
736
+ self.decay_times += 1
737
+ if self.decay_times > self.max_decay_times:
738
+ return True
739
+ for param_group in self.optimizer.param_groups:
740
+ param_group["lr"] /= self.decay_ratio
741
+ self.worklog.info(f"Learning rate decayed by {self.decay_ratio:.1f}")
742
+ self.worklog.info("***********************************************")
743
+ return False
744
+ else:
745
+ self.best_psnr = self.psnr_curr
746
+ self.best_ssim = self.ssim_curr
747
+ self.no_improvement_steps = 0
748
+ return False
749
+
750
+ def _add_gaussians(self, add_num, plot_gaussians=False):
751
+ add_num = min(
752
+ add_num, self.max_add_num, self.total_num_gaussians - self.num_gaussians
753
+ )
754
+ if add_num <= 0:
755
+ return
756
+ raw_images = self._render_images(upsample=False)
757
+ images = torch.pow(torch.clamp(raw_images, 0.0, 1.0), 1.0 / self.gamma)
758
+ gt_images = torch.pow(self.gt_images, 1.0 / self.gamma)
759
+ kernel_size = round(np.sqrt(self.img_h * self.img_w) // 400)
760
+ if kernel_size >= 1:
761
+ kernel_size = max(3, kernel_size)
762
+ kernel_size = kernel_size + 1 if kernel_size % 2 == 0 else kernel_size
763
+ gt_images = gaussian_blur(img=gt_images, kernel_size=kernel_size)
764
+ diff_map = (gt_images - images).detach().clone()
765
+ error_map = torch.pow(torch.abs(diff_map).mean(dim=0).reshape(-1), 2.0)
766
+ sample_prob = (error_map / error_map.sum()).cpu().numpy()
767
+ selected = np.random.choice(
768
+ self.num_pixels, add_num, replace=False, p=sample_prob
769
+ )
770
+ # New Gaussians
771
+ new_xy = self.pixel_xy.detach().clone()[selected]
772
+ new_scale = torch.ones(add_num, 2, dtype=self.dtype, device=self.device)
773
+ init_scale = self.init_scale
774
+ new_scale.fill_(init_scale if self.disable_inverse_scale else 1.0 / init_scale)
775
+ new_rot = torch.zeros(add_num, 1, dtype=self.dtype, device=self.device)
776
+ new_feat = diff_map.permute(1, 2, 0).reshape(-1, self.feat_dim)[selected]
777
+ new_vis_feat = torch.rand_like(new_feat)
778
+ # Old Gaussians
779
+ old_xy = self.xy.detach().clone()
780
+ old_scale = self.scale.detach().clone()
781
+ old_rot = self.rot.detach().clone()
782
+ old_feat = self.feat.detach().clone()
783
+ old_vis_feat = self.vis_feat.detach().clone()
784
+ # Update trainable parameters
785
+ self.num_gaussians += add_num
786
+ all_xy = torch.cat([old_xy, new_xy], dim=0)
787
+ all_scale = torch.cat([old_scale, new_scale], dim=0)
788
+ all_rot = torch.cat([old_rot, new_rot], dim=0)
789
+ all_feat = torch.cat([old_feat, new_feat], dim=0)
790
+ all_vis_feat = torch.cat([old_vis_feat, new_vis_feat], dim=0)
791
+ self.xy = nn.Parameter(all_xy, requires_grad=True)
792
+ self.scale = nn.Parameter(all_scale, requires_grad=True)
793
+ self.rot = nn.Parameter(all_rot, requires_grad=True)
794
+ self.feat = nn.Parameter(all_feat, requires_grad=True)
795
+ self.vis_feat = nn.Parameter(all_vis_feat, requires_grad=False)
796
+ # Plot Gaussians
797
+ if plot_gaussians:
798
+ path = f"{self.train_dir}/add-gaussian_step-{self.step:d}_num-{self.num_gaussians:d}_res-{self.img_h:d}x{self.img_w:d}"
799
+ every_n = max(1, self.total_num_gaussians // 2000)
800
+ size = (self.img_h * self.img_w) / 1e4
801
+ visualize_added_gaussians(
802
+ path,
803
+ raw_images,
804
+ old_xy,
805
+ new_xy,
806
+ self.input_channels,
807
+ size=size,
808
+ every_n=every_n,
809
+ alpha=0.8,
810
+ gamma=self.gamma,
811
+ )
812
+ # Update optimizer
813
+ self.optimizer = torch.optim.Adam(
814
+ [
815
+ {"params": self.xy, "lr": self.pos_lr},
816
+ {"params": self.scale, "lr": self.scale_lr},
817
+ {"params": self.rot, "lr": self.rot_lr},
818
+ {"params": self.feat, "lr": self.feat_lr},
819
+ ]
820
+ )
821
+ self.worklog.info(
822
+ f"Step: {self.step:d} | Adding {add_num:d} Gaussians ({self.num_gaussians - add_num:d} -> {self.num_gaussians:d})"
823
+ )
824
+ self.worklog.info("***********************************************")
pyproject.toml ADDED
@@ -0,0 +1,46 @@
1
+ [project]
2
+ name = "image-gs"
3
+ version = "0.1.0"
4
+ description = "2D Gaussian splatting for image representation (Image-GS)"
5
+ readme = "README.md"
6
+ requires-python = ">=3.10"
7
+ dependencies = [
8
+ "lpips>=0.1.4",
9
+ "matplotlib>=3.10.6",
10
+ "numpy>=2.2.6",
11
+ "pytorch-msssim>=1.0.0",
12
+ "scikit-image>=0.25.2",
13
+ "scipy>=1.15.3",
14
+ "torch>=2.6.0",
15
+ "torchmetrics>=1.8.2",
16
+ "torchvision>=0.21.0",
17
+ "fused_ssim",
18
+ "pyyaml>=6.0.2",
19
+ "gsplat",
20
+ "gradio>=4.0.0",
21
+ "huggingface_hub>=0.24.0",
22
+ ]
23
+
24
+ # We use Python 3.10 and the CUDA 12.4 (cu124) wheels
25
+ [tool.uv.sources]
26
+ fused_ssim = { git = "https://github.com/rahul-goel/fused-ssim/" }
27
+ torch = [
28
+ { index = "pytorch-cu124", marker = "sys_platform == 'linux'" },
29
+ ]
30
+ torchvision = [
31
+ { index = "pytorch-cu124", marker = "sys_platform == 'linux'" },
32
+ ]
33
+
34
+
35
+ [tool.uv.extra-build-dependencies]
36
+ fused-ssim = ["torch", "numpy"]
37
+
38
+ [[tool.uv.index]]
39
+ name = "pytorch-cu124"
40
+ url = "https://download.pytorch.org/whl/cu124"
41
+ explicit = true
42
+
43
+ [dependency-groups]
44
+ dev = [
45
+ "huggingface-hub[cli]>=0.34.4",
46
+ ]
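Since the [tool.uv.sources] pins above route torch and torchvision through the cu124 index on Linux, a one-line sanity check (nothing repo-specific) verifies the CUDA build actually landed in the environment:

import torch

# Expect a "+cu124"-style build with torch.version.cuda == "12.4" when the
# pytorch-cu124 index was used; CPU-only wheels report None here.
print(torch.__version__, torch.version.cuda, torch.cuda.is_available())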
utils/__init__.py ADDED
File without changes
utils/flip.py ADDED
@@ -0,0 +1,811 @@
1
+ """FLIP metric functions"""
2
+ #################################################################################
3
+ # Copyright (c) 2020-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
4
+ #
5
+ # Redistribution and use in source and binary forms, with or without
6
+ # modification, are permitted provided that the following conditions are met:
7
+ #
8
+ # 1. Redistributions of source code must retain the above copyright notice, this
9
+ # list of conditions and the following disclaimer.
10
+ #
11
+ # 2. Redistributions in binary form must reproduce the above copyright notice,
12
+ # this list of conditions and the following disclaimer in the documentation
13
+ # and/or other materials provided with the distribution.
14
+ #
15
+ # 3. Neither the name of the copyright holder nor the names of its
16
+ # contributors may be used to endorse or promote products derived from
17
+ # this software without specific prior written permission.
18
+ #
19
+ # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
20
+ # AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
21
+ # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
22
+ # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
23
+ # FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
24
+ # DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
25
+ # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
26
+ # CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
27
+ # OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
28
+ # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
29
+ #
30
+ # SPDX-FileCopyrightText: Copyright (c) 2020-2024 NVIDIA CORPORATION & AFFILIATES
31
+ # SPDX-License-Identifier: BSD-3-Clause
32
+ #################################################################################
33
+
34
+ # Visualizing and Communicating Errors in Rendered Images
35
+ # Ray Tracing Gems II, 2021,
36
+ # by Pontus Andersson, Jim Nilsson, and Tomas Akenine-Moller.
37
+ # Pointer to the chapter: https://research.nvidia.com/publication/2021-08_Visualizing-and-Communicating.
38
+
39
+ # Visualizing Errors in Rendered High Dynamic Range Images
40
+ # Eurographics 2021,
41
+ # by Pontus Andersson, Jim Nilsson, Peter Shirley, and Tomas Akenine-Moller.
42
+ # Pointer to the paper: https://research.nvidia.com/publication/2021-05_HDR-FLIP.
43
+
44
+ # FLIP: A Difference Evaluator for Alternating Images
45
+ # High Performance Graphics 2020,
46
+ # by Pontus Andersson, Jim Nilsson, Tomas Akenine-Moller,
47
+ # Magnus Oskarsson, Kalle Astrom, and Mark D. Fairchild.
48
+ # Pointer to the paper: https://research.nvidia.com/publication/2020-07_FLIP.
49
+
50
+ # Code by Pontus Ebelin (formerly Andersson), Jim Nilsson, and Tomas Akenine-Moller.
51
+
52
+ import sys
53
+ import numpy as np
54
+ import torch
55
+ import torch.nn as nn
56
+
57
+
58
+ class HDRFLIPLoss(nn.Module):
59
+ """Class for computing HDR-FLIP"""
60
+
61
+ def __init__(self):
62
+ """Init"""
63
+ super().__init__()
64
+ self.qc = 0.7
65
+ self.qf = 0.5
66
+ self.pc = 0.4
67
+ self.pt = 0.95
68
+ self.tmax = 0.85
69
+ self.tmin = 0.85
70
+ self.eps = 1e-15
71
+
72
+ def forward(
73
+ self,
74
+ test,
75
+ reference,
76
+ pixels_per_degree=(0.7 * 3840 / 0.7) * np.pi / 180,
77
+ tone_mapper="aces",
78
+ start_exposure=None,
79
+ stop_exposure=None,
80
+ ):
81
+ """
82
+ Computes the HDR-FLIP error map between two HDR images,
83
+ assuming the images are observed at a certain number of
84
+ pixels per degree of visual angle
85
+
86
+ :param test: test tensor (with NxCxHxW layout with nonnegative values)
87
+ :param reference: reference tensor (with NxCxHxW layout with nonnegative values)
88
+ :param pixels_per_degree: float describing the number of pixels per degree of visual angle of the observer,
89
+ default corresponds to viewing the images on a 0.7 meters wide 4K monitor at 0.7 meters from the display
90
+ :param tone_mapper: (optional) string describing what tone mapper HDR-FLIP should assume
91
+ :param start_exposure: (optional) tensor (with Nx1x1x1 layout) with start exposures corresponding to each HDR reference/test pair
92
+ :param stop_exposure: (optional) tensor (with Nx1x1x1 layout) with stop exposures corresponding to each HDR reference/test pair
93
+ :return: float containing the mean FLIP error (in the range [0,1]) between the HDR reference and test images in the batch
94
+ """
95
+ # HDR-FLIP expects nonnegative and non-NaN values in the input
96
+ reference = torch.clamp(reference, 0, 65536.0)
97
+ test = torch.clamp(test, 0, 65536.0)
98
+
99
+ # Compute start and stop exposures, if they are not given
100
+ if start_exposure is None or stop_exposure is None:
101
+ c_start, c_stop = compute_start_stop_exposures(
102
+ reference, tone_mapper, self.tmax, self.tmin
103
+ )
104
+ if start_exposure is None:
105
+ start_exposure = c_start
106
+ if stop_exposure is None:
107
+ stop_exposure = c_stop
108
+
109
+ # Compute number of exposures
110
+ num_exposures = torch.max(
111
+ torch.tensor([2.0], requires_grad=False).cuda(),
112
+ torch.ceil(stop_exposure - start_exposure),
113
+ )
114
+ most_exposures = int(torch.amax(num_exposures, dim=0).item())
115
+
116
+ # Compute exposure step size
117
+ step_size = (stop_exposure - start_exposure) / torch.max(
118
+ num_exposures - 1, torch.tensor([1.0], requires_grad=False).cuda()
119
+ )
120
+
121
+ # Set the depth of the error tensor to the number of exposures given by the largest exposure range any reference image yielded.
122
+ # This allows us to do one loop for each image in our batch, while not affecting the HDR-FLIP error, as we fill up the error tensor with 0s.
123
+ # Note that the step size still depends on num_exposures and is therefore independent of most_exposures
124
+ dim = reference.size()
125
+ all_errors = torch.zeros(size=(dim[0], most_exposures, dim[2], dim[3])).cuda()
126
+
127
+ # Loop over exposures and compute LDR-FLIP for each pair of LDR reference and test
128
+ for i in range(0, most_exposures):
129
+ exposure = start_exposure + i * step_size
130
+
131
+ reference_tone_mapped = tone_map(reference, tone_mapper, exposure)
132
+ test_tone_mapped = tone_map(test, tone_mapper, exposure)
133
+
134
+ reference_opponent = color_space_transform(
135
+ reference_tone_mapped, "linrgb2ycxcz"
136
+ )
137
+ test_opponent = color_space_transform(test_tone_mapped, "linrgb2ycxcz")
138
+
139
+ all_errors[:, i, :, :] = compute_ldrflip(
140
+ test_opponent,
141
+ reference_opponent,
142
+ pixels_per_degree,
143
+ self.qc,
144
+ self.qf,
145
+ self.pc,
146
+ self.pt,
147
+ self.eps,
148
+ ).squeeze(1)
149
+
150
+ # Take per-pixel maximum over all LDR-FLIP errors to get HDR-FLIP
151
+ hdrflip_error = torch.amax(all_errors, dim=1, keepdim=True)
152
+ return torch.mean(hdrflip_error)
153
+
154
+
155
+ class LDRFLIPLoss(nn.Module):
156
+ """Class for computing LDR FLIP loss"""
157
+
158
+ def __init__(self):
159
+ """Init"""
160
+ super().__init__()
161
+ self.qc = 0.7
162
+ self.qf = 0.5
163
+ self.pc = 0.4
164
+ self.pt = 0.95
165
+ self.eps = 1e-15
166
+
167
+ def forward(
168
+ self, test, reference, pixels_per_degree=(0.7 * 3840 / 0.7) * np.pi / 180
169
+ ):
170
+ """
171
+ Computes the LDR-FLIP error map between two LDR images,
172
+ assuming the images are observed at a certain number of
173
+ pixels per degree of visual angle
174
+
175
+ :param test: test tensor (with NxCxHxW layout with values in the range [0, 1] in the sRGB color space)
176
+ :param reference: reference tensor (with NxCxHxW layout with values in the range [0, 1] in the sRGB color space)
177
+ :param pixels_per_degree: float describing the number of pixels per degree of visual angle of the observer,
178
+ default corresponds to viewing the images on a 0.7 meters wide 4K monitor at 0.7 meters from the display
179
+ :return: float containing the mean FLIP error (in the range [0,1]) between the LDR reference and test images in the batch
180
+ """
181
+ # LDR-FLIP expects non-NaN values in [0,1] as input
182
+ reference = torch.clamp(reference, 0, 1)
183
+ test = torch.clamp(test, 0, 1)
184
+
185
+ # Transform reference and test to opponent color space
186
+ reference_opponent = color_space_transform(reference, "srgb2ycxcz")
187
+ test_opponent = color_space_transform(test, "srgb2ycxcz")
188
+
189
+ deltaE = compute_ldrflip(
190
+ test_opponent,
191
+ reference_opponent,
192
+ pixels_per_degree,
193
+ self.qc,
194
+ self.qf,
195
+ self.pc,
196
+ self.pt,
197
+ self.eps,
198
+ )
199
+
200
+ return torch.mean(deltaE)
201
+
202
+
203
+ def compute_ldrflip(test, reference, pixels_per_degree, qc, qf, pc, pt, eps):
204
+ """
205
+ Computes the LDR-FLIP error map between two LDR images,
206
+ assuming the images are observed at a certain number of
207
+ pixels per degree of visual angle
208
+
209
+ :param reference: reference tensor (with NxCxHxW layout with values in the YCxCz color space)
210
+ :param test: test tensor (with NxCxHxW layout with values in the YCxCz color space)
211
+ :param pixels_per_degree: float describing the number of pixels per degree of visual angle of the observer,
212
+ default corresponds to viewing the images on a 0.7 meters wide 4K monitor at 0.7 meters from the display
213
+ :param qc: float describing the q_c exponent in the LDR-FLIP color pipeline (see FLIP paper for details)
214
+ :param qf: float describing the q_f exponent in the LDR-FLIP feature pipeline (see FLIP paper for details)
215
+ :param pc: float describing the p_c exponent in the LDR-FLIP color pipeline (see FLIP paper for details)
216
+ :param pt: float describing the p_t exponent in the LDR-FLIP color pipeline (see FLIP paper for details)
217
+ :param eps: float containing a small value used to improve training stability
218
+ :return: tensor containing the per-pixel FLIP errors (with Nx1xHxW layout and values in the range [0, 1]) between LDR reference and test images
219
+ """
220
+ # --- Color pipeline ---
221
+ # Spatial filtering
222
+ s_a, radius_a = generate_spatial_filter(pixels_per_degree, "A")
223
+ s_rg, radius_rg = generate_spatial_filter(pixels_per_degree, "RG")
224
+ s_by, radius_by = generate_spatial_filter(pixels_per_degree, "BY")
225
+ radius = max(radius_a, radius_rg, radius_by)
226
+ filtered_reference = spatial_filter(reference, s_a, s_rg, s_by, radius)
227
+ filtered_test = spatial_filter(test, s_a, s_rg, s_by, radius)
228
+
229
+ # Perceptually Uniform Color Space
230
+ preprocessed_reference = hunt_adjustment(
231
+ color_space_transform(filtered_reference, "linrgb2lab")
232
+ )
233
+ preprocessed_test = hunt_adjustment(
234
+ color_space_transform(filtered_test, "linrgb2lab")
235
+ )
236
+
237
+ # Color metric
238
+ deltaE_hyab = hyab(preprocessed_reference, preprocessed_test, eps)
239
+ power_deltaE_hyab = torch.pow(deltaE_hyab, qc)
240
+ hunt_adjusted_green = hunt_adjustment(
241
+ color_space_transform(
242
+ torch.tensor([[[0.0]], [[1.0]], [[0.0]]]).unsqueeze(0), "linrgb2lab"
243
+ )
244
+ )
245
+ hunt_adjusted_blue = hunt_adjustment(
246
+ color_space_transform(
247
+ torch.tensor([[[0.0]], [[0.0]], [[1.0]]]).unsqueeze(0), "linrgb2lab"
248
+ )
249
+ )
250
+ cmax = torch.pow(hyab(hunt_adjusted_green, hunt_adjusted_blue, eps), qc).item()
251
+ deltaE_c = redistribute_errors(power_deltaE_hyab, cmax, pc, pt)
252
+
253
+ # --- Feature pipeline ---
254
+ # Extract and normalize Yy component
255
+ ref_y = (reference[:, 0:1, :, :] + 16) / 116
256
+ test_y = (test[:, 0:1, :, :] + 16) / 116
257
+
258
+ # Edge and point detection
259
+ edges_reference = feature_detection(ref_y, pixels_per_degree, "edge")
260
+ points_reference = feature_detection(ref_y, pixels_per_degree, "point")
261
+ edges_test = feature_detection(test_y, pixels_per_degree, "edge")
262
+ points_test = feature_detection(test_y, pixels_per_degree, "point")
263
+
264
+ # Feature metric
265
+ deltaE_f = torch.max(
266
+ torch.abs(
267
+ torch.norm(edges_reference, dim=1, keepdim=True)
268
+ - torch.norm(edges_test, dim=1, keepdim=True)
269
+ ),
270
+ torch.abs(
271
+ torch.norm(points_test, dim=1, keepdim=True)
272
+ - torch.norm(points_reference, dim=1, keepdim=True)
273
+ ),
274
+ )
275
+ deltaE_f = torch.clamp(deltaE_f, min=eps) # clamp to stabilize training
276
+ deltaE_f = torch.pow(((1 / np.sqrt(2)) * deltaE_f), qf)
277
+
278
+ # --- Final error ---
279
+ return torch.pow(deltaE_c, 1 - deltaE_f)
280
+
281
+
282
+ def tone_map(img, tone_mapper, exposure):
283
+ """
284
+ Applies exposure compensation and tone mapping.
285
+ Refer to the Visualizing Errors in Rendered High Dynamic Range Images
286
+ paper for details about the formulas.
287
+
288
+ :param img: float tensor (with NxCxHxW layout) containing nonnegative values
289
+ :param tone_mapper: string describing the tone mapper to apply
290
+ :param exposure: float tensor (with Nx1x1x1 layout) describing the exposure compensation factor
+ :return: float tensor (with NxCxHxW layout) containing the tone mapped image with values in [0, 1]
291
+ """
292
+ # Exposure compensation
293
+ x = (2**exposure) * img
294
+
295
+ # Set tone mapping coefficients depending on tone_mapper
296
+ if tone_mapper == "reinhard":
297
+ lum_coeff_r = 0.2126
298
+ lum_coeff_g = 0.7152
299
+ lum_coeff_b = 0.0722
300
+
301
+ Y = (
302
+ x[:, 0:1, :, :] * lum_coeff_r
303
+ + x[:, 1:2, :, :] * lum_coeff_g
304
+ + x[:, 2:3, :, :] * lum_coeff_b
305
+ )
306
+ return torch.clamp(torch.div(x, 1 + Y), 0.0, 1.0)
307
+
308
+ if tone_mapper == "hable":
309
+ # Source: https://64.github.io/tonemapping/
310
+ A = 0.15
311
+ B = 0.50
312
+ C = 0.10
313
+ D = 0.20
314
+ E = 0.02
315
+ F = 0.30
316
+ k0 = A * F - A * E
317
+ k1 = C * B * F - B * E
318
+ k2 = 0
319
+ k3 = A * F
320
+ k4 = B * F
321
+ k5 = D * F * F
322
+
323
+ W = 11.2
324
+ nom = k0 * torch.pow(W, torch.tensor([2.0]).cuda()) + k1 * W + k2
325
+ denom = k3 * torch.pow(W, torch.tensor([2.0]).cuda()) + k4 * W + k5
326
+ white_scale = torch.div(denom, nom) # = 1 / (nom / denom)
327
+
328
+ # Include white scale and exposure bias in rational polynomial coefficients
329
+ k0 = 4 * k0 * white_scale
330
+ k1 = 2 * k1 * white_scale
331
+ k2 = k2 * white_scale
332
+ k3 = 4 * k3
333
+ k4 = 2 * k4
334
+ # k5 = k5 # k5 is not changed
335
+ else:
336
+ # Source: ACES approximation: https://knarkowicz.wordpress.com/2016/01/06/aces-filmic-tone-mapping-curve/
337
+ # Include pre-exposure cancelation in constants
338
+ k0 = 0.6 * 0.6 * 2.51
339
+ k1 = 0.6 * 0.03
340
+ k2 = 0
341
+ k3 = 0.6 * 0.6 * 2.43
342
+ k4 = 0.6 * 0.59
343
+ k5 = 0.14
344
+
345
+ x2 = torch.pow(x, 2)
346
+ nom = k0 * x2 + k1 * x + k2
347
+ denom = k3 * x2 + k4 * x + k5
348
+ denom = torch.where(
349
+ torch.isinf(denom), torch.Tensor([1.0]).cuda(), denom
350
+ ) # if denom is inf, then so is nom => nan. Pixel is very bright. It becomes inf here, but 1 after clamp below
351
+ y = torch.div(nom, denom)
352
+ return torch.clamp(y, 0.0, 1.0)
353
+
354
+
355
+ def compute_start_stop_exposures(reference, tone_mapper, tmax, tmin):
356
+ """
357
+ Computes start and stop exposure for HDR-FLIP based on given tone mapper and reference image.
358
+ Refer to the Visualizing Errors in Rendered High Dynamic Range Images
359
+ paper for details about the formulas
360
+
361
+ :param reference: float tensor (with NxCxHxW layout) containing reference images (nonnegative values)
362
+ :param tone_mapper: string describing which tone mapper should be assumed
363
+ :param tmax: float describing the t value used to find the start exposure
364
+ :param tmin: float describing the t value used to find the stop exposure
365
+ :return: two float tensors (with Nx1x1x1 layout) containing start and stop exposures, respectively, to use for HDR-FLIP
366
+ """
367
+ if tone_mapper == "reinhard":
368
+ k0 = 0
369
+ k1 = 1
370
+ k2 = 0
371
+ k3 = 0
372
+ k4 = 1
373
+ k5 = 1
374
+
375
+ x_max = tmax * k5 / (k1 - tmax * k4)
376
+ x_min = tmin * k5 / (k1 - tmin * k4)
377
+ elif tone_mapper == "hable":
378
+ # Source: https://64.github.io/tonemapping/
379
+ A = 0.15
380
+ B = 0.50
381
+ C = 0.10
382
+ D = 0.20
383
+ E = 0.02
384
+ F = 0.30
385
+ k0 = A * F - A * E
386
+ k1 = C * B * F - B * E
387
+ k2 = 0
388
+ k3 = A * F
389
+ k4 = B * F
390
+ k5 = D * F * F
391
+
392
+ W = 11.2
393
+ nom = k0 * torch.pow(W, torch.tensor([2.0]).cuda()) + k1 * W + k2
394
+ denom = k3 * torch.pow(W, torch.tensor([2.0]).cuda()) + k4 * W + k5
395
+ white_scale = torch.div(denom, nom) # = 1 / (nom / denom)
396
+
397
+ # Include white scale and exposure bias in rational polynomial coefficients
398
+ k0 = 4 * k0 * white_scale
399
+ k1 = 2 * k1 * white_scale
400
+ k2 = k2 * white_scale
401
+ k3 = 4 * k3
402
+ k4 = 2 * k4
403
+ # k5 = k5 # k5 is not changed
404
+
405
+ c0 = (k1 - k4 * tmax) / (k0 - k3 * tmax)
406
+ c1 = (k2 - k5 * tmax) / (k0 - k3 * tmax)
407
+ x_max = -0.5 * c0 + torch.sqrt(((torch.tensor([0.5]).cuda() * c0) ** 2) - c1)
408
+
409
+ c0 = (k1 - k4 * tmin) / (k0 - k3 * tmin)
410
+ c1 = (k2 - k5 * tmin) / (k0 - k3 * tmin)
411
+ x_min = -0.5 * c0 + torch.sqrt(((torch.tensor([0.5]).cuda() * c0) ** 2) - c1)
412
+ else:
413
+ # Source: ACES approximation: https://knarkowicz.wordpress.com/2016/01/06/aces-filmic-tone-mapping-curve/
414
+ # Include pre-exposure cancelation in constants
415
+ k0 = 0.6 * 0.6 * 2.51
416
+ k1 = 0.6 * 0.03
417
+ k2 = 0
418
+ k3 = 0.6 * 0.6 * 2.43
419
+ k4 = 0.6 * 0.59
420
+ k5 = 0.14
421
+
422
+ c0 = (k1 - k4 * tmax) / (k0 - k3 * tmax)
423
+ c1 = (k2 - k5 * tmax) / (k0 - k3 * tmax)
424
+ x_max = -0.5 * c0 + torch.sqrt(((torch.tensor([0.5]).cuda() * c0) ** 2) - c1)
425
+
426
+ c0 = (k1 - k4 * tmin) / (k0 - k3 * tmin)
427
+ c1 = (k2 - k5 * tmin) / (k0 - k3 * tmin)
428
+ x_min = -0.5 * c0 + torch.sqrt(((torch.tensor([0.5]).cuda() * c0) ** 2) - c1)
429
+
430
+ # Convert reference to luminance
431
+ lum_coeff_r = 0.2126
432
+ lum_coeff_g = 0.7152
433
+ lum_coeff_b = 0.0722
434
+ Y_reference = (
435
+ reference[:, 0:1, :, :] * lum_coeff_r
436
+ + reference[:, 1:2, :, :] * lum_coeff_g
437
+ + reference[:, 2:3, :, :] * lum_coeff_b
438
+ )
439
+
440
+ # Compute start exposure
441
+ Y_hi = torch.amax(Y_reference, dim=(2, 3), keepdim=True)
442
+ start_exposure = torch.log2(x_max / Y_hi)
443
+
444
+ # Compute stop exposure
445
+ dim = Y_reference.size()
446
+ Y_ref = Y_reference.view(dim[0], dim[1], dim[2] * dim[3])
447
+ Y_lo = torch.median(Y_ref, dim=2).values.unsqueeze(2).unsqueeze(3)
448
+ stop_exposure = torch.log2(x_min / Y_lo)
449
+
450
+ return start_exposure, stop_exposure
451
+
452
+
453
+ def generate_spatial_filter(pixels_per_degree, channel):
454
+ """
455
+ Generates spatial contrast sensitivity filters with width depending on
456
+ the number of pixels per degree of visual angle of the observer
457
+
458
+ :param pixels_per_degree: float indicating number of pixels per degree of visual angle
459
+ :param channel: string describing what filter should be generated
460
+ :return: filter kernel corresponding to the spatial contrast sensitivity function of the given channel, and the kernel's radius
461
+ """
462
+ a1_A = 1
463
+ b1_A = 0.0047
464
+ a2_A = 0
465
+ b2_A = 1e-5 # avoid division by 0
466
+ a1_rg = 1
467
+ b1_rg = 0.0053
468
+ a2_rg = 0
469
+ b2_rg = 1e-5 # avoid division by 0
470
+ a1_by = 34.1
471
+ b1_by = 0.04
472
+ a2_by = 13.5
473
+ b2_by = 0.025
474
+ if channel == "A": # Achromatic CSF
475
+ a1 = a1_A
476
+ b1 = b1_A
477
+ a2 = a2_A
478
+ b2 = b2_A
479
+ elif channel == "RG": # Red-Green CSF
480
+ a1 = a1_rg
481
+ b1 = b1_rg
482
+ a2 = a2_rg
483
+ b2 = b2_rg
484
+ elif channel == "BY": # Blue-Yellow CSF
485
+ a1 = a1_by
486
+ b1 = b1_by
487
+ a2 = a2_by
488
+ b2 = b2_by
489
+
490
+ # Determine evaluation domain
491
+ max_scale_parameter = max([b1_A, b2_A, b1_rg, b2_rg, b1_by, b2_by])
492
+ r = np.ceil(3 * np.sqrt(max_scale_parameter / (2 * np.pi**2)) * pixels_per_degree)
493
+ r = int(r)
494
+ deltaX = 1.0 / pixels_per_degree
495
+ x, y = np.meshgrid(range(-r, r + 1), range(-r, r + 1))
496
+ z = (x * deltaX) ** 2 + (y * deltaX) ** 2
497
+
498
+ # Generate weights
499
+ g = a1 * np.sqrt(np.pi / b1) * np.exp(-(np.pi**2) * z / b1) + a2 * np.sqrt(
500
+ np.pi / b2
501
+ ) * np.exp(-(np.pi**2) * z / b2)
502
+ g = g / np.sum(g)
503
+ g = torch.Tensor(g).unsqueeze(0).unsqueeze(0).cuda()
504
+
505
+ return g, r
506
+
507
+
508
+ def spatial_filter(img, s_a, s_rg, s_by, radius):
509
+ """
510
+ Filters an image with channel specific spatial contrast sensitivity functions
511
+ and clips result to the unit cube in linear RGB
512
+
513
+ :param img: image tensor to filter (with NxCxHxW layout in the YCxCz color space)
514
+ :param s_a: spatial filter matrix for the achromatic channel
515
+ :param s_rg: spatial filter matrix for the red-green channel
516
+ :param s_by: spatial filter matrix for the blue-yellow channel
+ :param radius: integer radius (in pixels) of the largest filter kernel
517
+ :return: input image (with NxCxHxW layout) transformed to linear RGB after filtering with spatial contrast sensitivity functions
518
+ """
519
+ dim = img.size()
520
+ # Prepare image for convolution
521
+ img_pad = torch.zeros(
522
+ (dim[0], dim[1], dim[2] + 2 * radius, dim[3] + 2 * radius), device="cuda"
523
+ )
524
+ img_pad[:, 0:1, :, :] = nn.functional.pad(
525
+ img[:, 0:1, :, :], (radius, radius, radius, radius), mode="replicate"
526
+ )
527
+ img_pad[:, 1:2, :, :] = nn.functional.pad(
528
+ img[:, 1:2, :, :], (radius, radius, radius, radius), mode="replicate"
529
+ )
530
+ img_pad[:, 2:3, :, :] = nn.functional.pad(
531
+ img[:, 2:3, :, :], (radius, radius, radius, radius), mode="replicate"
532
+ )
533
+
534
+ # Apply Gaussian filters
535
+ img_tilde_opponent = torch.zeros((dim[0], dim[1], dim[2], dim[3]), device="cuda")
536
+ img_tilde_opponent[:, 0:1, :, :] = nn.functional.conv2d(
537
+ img_pad[:, 0:1, :, :], s_a.cuda(), padding=0
538
+ )
539
+ img_tilde_opponent[:, 1:2, :, :] = nn.functional.conv2d(
540
+ img_pad[:, 1:2, :, :], s_rg.cuda(), padding=0
541
+ )
542
+ img_tilde_opponent[:, 2:3, :, :] = nn.functional.conv2d(
543
+ img_pad[:, 2:3, :, :], s_by.cuda(), padding=0
544
+ )
545
+
546
+ # Transform to linear RGB for clamp
547
+ img_tilde_linear_rgb = color_space_transform(img_tilde_opponent, "ycxcz2linrgb")
548
+
549
+ # Clamp to RGB box
550
+ return torch.clamp(img_tilde_linear_rgb, 0.0, 1.0)
551
+
552
+
553
+ def hunt_adjustment(img):
554
+ """
555
+ Applies Hunt-adjustment to an image
556
+
557
+ :param img: image tensor to adjust (with NxCxHxW layout in the L*a*b* color space)
558
+ :return: Hunt-adjusted image tensor (with NxCxHxW layout in the Hunt-adjusted L*A*B* color space)
559
+ """
560
+ # Extract luminance component
561
+ L = img[:, 0:1, :, :]
562
+
563
+ # Apply Hunt adjustment
564
+ img_h = torch.zeros(img.size(), device="cuda")
565
+ img_h[:, 0:1, :, :] = L
566
+ img_h[:, 1:2, :, :] = torch.mul((0.01 * L), img[:, 1:2, :, :])
567
+ img_h[:, 2:3, :, :] = torch.mul((0.01 * L), img[:, 2:3, :, :])
568
+
569
+ return img_h
570
+
571
+
572
+ def hyab(reference, test, eps):
573
+ """
574
+ Computes the HyAB distance between reference and test images
575
+
576
+ :param reference: reference image tensor (with NxCxHxW layout in the standard or Hunt-adjusted L*A*B* color space)
577
+ :param test: test image tensor (with NxCxHxW layout in the standard or Hunt-adjusted L*a*b* color space)
578
+ :param eps: float containing a small value used to improve training stability
579
+ :return: image tensor (with Nx1xHxW layout) containing the per-pixel HyAB distances between reference and test images
580
+ """
581
+ delta = reference - test
582
+ root = torch.sqrt(torch.clamp(torch.pow(delta[:, 0:1, :, :], 2), min=eps))
583
+ delta_norm = torch.norm(delta[:, 1:3, :, :], dim=1, keepdim=True)
584
+ return root + delta_norm # sqrt of the clamped square replaces abs to stabilize training
585
+
586
+
587
+ def redistribute_errors(power_deltaE_hyab, cmax, pc, pt):
588
+ """
589
+ Redistributes exponentiated HyAB errors to the [0,1] range
590
+
591
+ :param power_deltaE_hyab: float tensor (with Nx1xHxW layout) containing the exponentiated HyAb distance
592
+ :param cmax: float containing the exponentiated, maximum HyAB difference between two colors in Hunt-adjusted L*A*B* space
593
+ :param pc: float containing the cmax multiplier p_c (see FLIP paper)
594
+ :param pt: float containing the target value, p_t, for p_c * cmax (see FLIP paper)
595
+ :return: image tensor (with Nx1xHxW layout) containing redistributed per-pixel HyAB distances (in range [0,1])
596
+ """
597
+ # Re-map error to 0-1 range. Values between 0 and
598
+ # pccmax are mapped to the range [0, pt],
599
+ # while the rest are mapped to the range (pt, 1]
600
+ deltaE_c = torch.zeros(power_deltaE_hyab.size(), device="cuda")
601
+ pccmax = pc * cmax
602
+ deltaE_c = torch.where(
603
+ power_deltaE_hyab < pccmax,
604
+ (pt / pccmax) * power_deltaE_hyab,
605
+ pt + ((power_deltaE_hyab - pccmax) / (cmax - pccmax)) * (1.0 - pt),
606
+ )
607
+
608
+ return deltaE_c
609
+
610
+
611
+ def feature_detection(img_y, pixels_per_degree, feature_type):
612
+ """
613
+ Detects edges and points (features) in the achromatic image
614
+
615
+ :param img_y: achromatic image tensor (with Nx1xHxW layout, containing normalized Y-values from YCxCz)
616
+ :param pixels_per_degree: float describing the number of pixels per degree of visual angle of the observer
617
+ :param feature_type: string indicating the type of feature to detect
618
+ :return: image tensor (with Nx2xHxW layout, with values in range [0,1]) containing large values where features were detected
619
+ """
620
+ # Set peak to trough value (2x standard deviations) of human edge
621
+ # detection filter
622
+ w = 0.082
623
+
624
+ # Compute filter radius
625
+ sd = 0.5 * w * pixels_per_degree
626
+ radius = int(np.ceil(3 * sd))
627
+
628
+ # Compute 2D Gaussian
629
+ [x, y] = np.meshgrid(range(-radius, radius + 1), range(-radius, radius + 1))
630
+ g = np.exp(-(x**2 + y**2) / (2 * sd * sd))
631
+
632
+ if feature_type == "edge": # Edge detector
633
+ # Compute partial derivative in x-direction
634
+ Gx = np.multiply(-x, g)
635
+ else: # Point detector
636
+ # Compute second partial derivative in x-direction
637
+ Gx = np.multiply(x**2 / (sd * sd) - 1, g)
638
+
639
+ # Normalize positive weights to sum to 1 and negative weights to sum to -1
640
+ negative_weights_sum = -np.sum(Gx[Gx < 0])
641
+ positive_weights_sum = np.sum(Gx[Gx > 0])
642
+ Gx = torch.Tensor(Gx)
643
+ Gx = torch.where(Gx < 0, Gx / negative_weights_sum, Gx / positive_weights_sum)
644
+ Gx = Gx.unsqueeze(0).unsqueeze(0).cuda()
645
+
646
+ # Detect features
647
+ featuresX = nn.functional.conv2d(
648
+ nn.functional.pad(img_y, (radius, radius, radius, radius), mode="replicate"),
649
+ Gx,
650
+ padding=0,
651
+ )
652
+ featuresY = nn.functional.conv2d(
653
+ nn.functional.pad(img_y, (radius, radius, radius, radius), mode="replicate"),
654
+ torch.transpose(Gx, 2, 3),
655
+ padding=0,
656
+ )
657
+ return torch.cat((featuresX, featuresY), dim=1)
658
+
659
+
660
+ def color_space_transform(input_color, fromSpace2toSpace):
661
+ """
662
+ Transforms inputs between different color spaces
663
+
664
+ :param input_color: tensor of colors to transform (with NxCxHxW layout)
665
+ :param fromSpace2toSpace: string describing transform
666
+ :return: transformed tensor (with NxCxHxW layout)
667
+ """
668
+ dim = input_color.size()
669
+
670
+ # Assume D65 standard illuminant
671
+ reference_illuminant = torch.tensor(
672
+ [[[0.950428545]], [[1.000000000]], [[1.088900371]]]
673
+ ).cuda()
674
+ inv_reference_illuminant = torch.tensor(
675
+ [[[1.052156925]], [[1.000000000]], [[0.918357670]]]
676
+ ).cuda()
677
+
678
+ if fromSpace2toSpace == "srgb2linrgb":
679
+ limit = 0.04045
680
+ transformed_color = torch.where(
681
+ input_color > limit,
682
+ torch.pow((torch.clamp(input_color, min=limit) + 0.055) / 1.055, 2.4),
683
+ input_color / 12.92,
684
+ ) # clamp to stabilize training
685
+
686
+ elif fromSpace2toSpace == "linrgb2srgb":
687
+ limit = 0.0031308
688
+ transformed_color = torch.where(
689
+ input_color > limit,
690
+ 1.055 * torch.pow(torch.clamp(input_color, min=limit), (1.0 / 2.4)) - 0.055,
691
+ 12.92 * input_color,
692
+ )
693
+
694
+ elif fromSpace2toSpace in ["linrgb2xyz", "xyz2linrgb"]:
695
+ # Source: https://www.image-engineering.de/library/technotes/958-how-to-convert-between-srgb-and-ciexyz
696
+ # Assumes D65 standard illuminant
697
+ if fromSpace2toSpace == "linrgb2xyz":
698
+ a11 = 10135552 / 24577794
699
+ a12 = 8788810 / 24577794
700
+ a13 = 4435075 / 24577794
701
+ a21 = 2613072 / 12288897
702
+ a22 = 8788810 / 12288897
703
+ a23 = 887015 / 12288897
704
+ a31 = 1425312 / 73733382
705
+ a32 = 8788810 / 73733382
706
+ a33 = 70074185 / 73733382
707
+ else:
708
+ # Constants found by taking the inverse of the matrix
709
+ # defined by the constants for linrgb2xyz
710
+ a11 = 3.241003275
711
+ a12 = -1.537398934
712
+ a13 = -0.498615861
713
+ a21 = -0.969224334
714
+ a22 = 1.875930071
715
+ a23 = 0.041554224
716
+ a31 = 0.055639423
717
+ a32 = -0.204011202
718
+ a33 = 1.057148933
719
+ A = torch.Tensor([[a11, a12, a13], [a21, a22, a23], [a31, a32, a33]])
720
+
721
+ input_color = input_color.view(dim[0], dim[1], dim[2] * dim[3]).cuda() # NC(HW)
722
+
723
+ transformed_color = torch.matmul(A.cuda(), input_color)
724
+ transformed_color = transformed_color.view(dim[0], dim[1], dim[2], dim[3])
725
+
726
+ elif fromSpace2toSpace == "xyz2ycxcz":
727
+ input_color = torch.mul(input_color, inv_reference_illuminant)
728
+ y = 116 * input_color[:, 1:2, :, :] - 16
729
+ cx = 500 * (input_color[:, 0:1, :, :] - input_color[:, 1:2, :, :])
730
+ cz = 200 * (input_color[:, 1:2, :, :] - input_color[:, 2:3, :, :])
731
+ transformed_color = torch.cat((y, cx, cz), 1)
732
+
733
+ elif fromSpace2toSpace == "ycxcz2xyz":
734
+ y = (input_color[:, 0:1, :, :] + 16) / 116
735
+ cx = input_color[:, 1:2, :, :] / 500
736
+ cz = input_color[:, 2:3, :, :] / 200
737
+
738
+ x = y + cx
739
+ z = y - cz
740
+ transformed_color = torch.cat((x, y, z), 1)
741
+
742
+ transformed_color = torch.mul(transformed_color, reference_illuminant)
743
+
744
+ elif fromSpace2toSpace == "xyz2lab":
745
+ input_color = torch.mul(input_color, inv_reference_illuminant)
746
+ delta = 6 / 29
747
+ delta_square = delta * delta
748
+ delta_cube = delta * delta_square
749
+ factor = 1 / (3 * delta_square)
750
+
751
+ clamped_term = torch.pow(
752
+ torch.clamp(input_color, min=delta_cube), 1.0 / 3.0
753
+ ).to(dtype=input_color.dtype)
754
+ div = (factor * input_color + (4 / 29)).to(dtype=input_color.dtype)
755
+ input_color = torch.where(
756
+ input_color > delta_cube, clamped_term, div
757
+ ) # clamp to stabilize training
758
+
759
+ L = 116 * input_color[:, 1:2, :, :] - 16
760
+ a = 500 * (input_color[:, 0:1, :, :] - input_color[:, 1:2, :, :])
761
+ b = 200 * (input_color[:, 1:2, :, :] - input_color[:, 2:3, :, :])
762
+
763
+ transformed_color = torch.cat((L, a, b), 1)
764
+
765
+ elif fromSpace2toSpace == "lab2xyz":
766
+ y = (input_color[:, 0:1, :, :] + 16) / 116
767
+ a = input_color[:, 1:2, :, :] / 500
768
+ b = input_color[:, 2:3, :, :] / 200
769
+
770
+ x = y + a
771
+ z = y - b
772
+
773
+ xyz = torch.cat((x, y, z), 1)
774
+ delta = 6 / 29
775
+ delta_square = delta * delta
776
+ factor = 3 * delta_square
777
+ xyz = torch.where(xyz > delta, torch.pow(xyz, 3), factor * (xyz - 4 / 29))
778
+
779
+ transformed_color = torch.mul(xyz, reference_illuminant)
780
+
781
+ elif fromSpace2toSpace == "srgb2xyz":
782
+ transformed_color = color_space_transform(input_color, "srgb2linrgb")
783
+ transformed_color = color_space_transform(transformed_color, "linrgb2xyz")
784
+ elif fromSpace2toSpace == "srgb2ycxcz":
785
+ transformed_color = color_space_transform(input_color, "srgb2linrgb")
786
+ transformed_color = color_space_transform(transformed_color, "linrgb2xyz")
787
+ transformed_color = color_space_transform(transformed_color, "xyz2ycxcz")
788
+ elif fromSpace2toSpace == "linrgb2ycxcz":
789
+ transformed_color = color_space_transform(input_color, "linrgb2xyz")
790
+ transformed_color = color_space_transform(transformed_color, "xyz2ycxcz")
791
+ elif fromSpace2toSpace == "srgb2lab":
792
+ transformed_color = color_space_transform(input_color, "srgb2linrgb")
793
+ transformed_color = color_space_transform(transformed_color, "linrgb2xyz")
794
+ transformed_color = color_space_transform(transformed_color, "xyz2lab")
795
+ elif fromSpace2toSpace == "linrgb2lab":
796
+ transformed_color = color_space_transform(input_color, "linrgb2xyz")
797
+ transformed_color = color_space_transform(transformed_color, "xyz2lab")
798
+ elif fromSpace2toSpace == "ycxcz2linrgb":
799
+ transformed_color = color_space_transform(input_color, "ycxcz2xyz")
800
+ transformed_color = color_space_transform(transformed_color, "xyz2linrgb")
801
+ elif fromSpace2toSpace == "lab2srgb":
802
+ transformed_color = color_space_transform(input_color, "lab2xyz")
803
+ transformed_color = color_space_transform(transformed_color, "xyz2linrgb")
804
+ transformed_color = color_space_transform(transformed_color, "linrgb2srgb")
805
+ elif fromSpace2toSpace == "ycxcz2lab":
806
+ transformed_color = color_space_transform(input_color, "ycxcz2xyz")
807
+ transformed_color = color_space_transform(transformed_color, "xyz2lab")
808
+ else:
809
+ sys.exit("Error: The color transform %s is not defined!" % fromSpace2toSpace)
810
+
811
+ return transformed_color
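A hypothetical usage sketch for the LDR path above; the module allocates intermediates with .cuda(), so inputs must live on a CUDA device:

import torch

loss_fn = LDRFLIPLoss()
reference = torch.rand(1, 3, 256, 256, device="cuda")                # sRGB in [0, 1]
test = (reference + 0.05 * torch.randn_like(reference)).clamp(0, 1)  # perturbed copy
print(loss_fn(test, reference).item())                               # mean FLIP error in [0, 1]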
utils/image_utils.py ADDED
@@ -0,0 +1,253 @@
1
+ import os
2
+
3
+ import matplotlib
4
+ import matplotlib.font_manager as font_manager
5
+ import matplotlib.pyplot as plt
6
+ import numpy as np
7
+ import torch
8
+ from matplotlib.patches import Ellipse
9
+ from numpy.linalg import norm
10
+ from PIL import Image
11
+ from scipy.ndimage import sobel
12
+
13
+ FONT_PATH = "assets/fonts/linux_libertine/LinLibertine_R.ttf"
14
+
15
+ # Make font loading optional for deployment environments
16
+ try:
17
+ font_manager.fontManager.addfont(FONT_PATH)
18
+ FONT_PROP = font_manager.FontProperties(fname=FONT_PATH).get_name()
19
+ plt.rcParams["font.family"] = FONT_PROP
20
+ plt.rcParams["text.usetex"] = True
21
+ except (FileNotFoundError, OSError):
22
+ # Use default font if custom font is not available
23
+ FONT_PROP = "DejaVu Sans"
24
+ plt.rcParams["font.family"] = FONT_PROP
25
+ plt.rcParams["text.usetex"] = False # Disable LaTeX if custom font unavailable
26
+ matplotlib.rcParams["font.size"] = 16
27
+ matplotlib.rcParams["axes.titlesize"] = 16
28
+ matplotlib.rcParams["figure.titlesize"] = 16
29
+ matplotlib.rcParams["legend.fontsize"] = 16
30
+ matplotlib.rcParams["legend.title_fontsize"] = 16
31
+ matplotlib.rcParams["xtick.labelsize"] = 14
32
+ matplotlib.rcParams["ytick.labelsize"] = 14
33
+
34
+ ALLOWED_IMAGE_FILE_FORMATS = [".jpeg", ".jpg", ".png"]
35
+ ALLOWED_IMAGE_TYPES = {"RGB": 3, "RGBA": 3, "L": 1}  # RGBA: the alpha channel is dropped
36
+
37
+ PLOT_DPI = 72.0
38
+ GAUSSIAN_ZOOM = 5
39
+ GAUSSIAN_COLOR = "#80ed99"
40
+
41
+
42
+ def get_psnr(image1, image2, max_value=1.0):
43
+ mse = torch.mean((image1 - image2) ** 2)
44
+ if mse.item() <= 1e-7:
45
+ return float("inf")
46
+ psnr = 20 * torch.log10(max_value / torch.sqrt(mse))
47
+ return psnr
48
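# Sanity check (editor's sketch): identical images yield inf, and a uniform
# error of 0.1 gives 20 * log10(1.0 / 0.1) = 20 dB.
_a = torch.zeros(3, 4, 4)
_b = torch.full((3, 4, 4), 0.1)
assert get_psnr(_a, _a) == float("inf")
assert abs(get_psnr(_a, _b).item() - 20.0) < 1e-3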
+
49
+
50
+ def get_grid(h, w, x_lim=np.asarray([0, 1]), y_lim=np.asarray([0, 1])):
51
+ x = torch.linspace(x_lim[0], x_lim[1], steps=w + 1)[:-1] + 0.5 / w
52
+ y = torch.linspace(y_lim[0], y_lim[1], steps=h + 1)[:-1] + 0.5 / h
53
+ grid_x, grid_y = torch.meshgrid(x, y, indexing="xy")
54
+ grid = torch.stack([grid_x, grid_y], dim=-1)
55
+ return grid
56
+
57
+
58
+ def compute_image_gradients(image):
59
+ gy, gx = [], []
60
+ for image_channel in image:
61
+ gy.append(sobel(image_channel, 0))
62
+ gx.append(sobel(image_channel, 1))
63
+ gy = norm(np.stack(gy, axis=0), ord=2, axis=0).astype(np.float32)
64
+ gx = norm(np.stack(gx, axis=0), ord=2, axis=0).astype(np.float32)
65
+ return gy, gx
66
+
67
+
68
+ def load_images(load_path, downsample_ratio=None, gamma=None):
69
+ """
70
+ Load target images or textures from a directory or a single file.
71
+ """
72
+ image_list = []
73
+ image_path_list = []
74
+ image_fname_list = []
75
+ num_channels_list = []
76
+ if (
77
+ os.path.isfile(load_path)
78
+ and os.path.splitext(load_path)[1].lower() in ALLOWED_IMAGE_FILE_FORMATS
79
+ ):
80
+ image_path_list.append(load_path)
81
+ elif os.path.isdir(load_path):
82
+ for file in sorted(os.listdir(load_path), key=str.lower):
83
+ if os.path.splitext(file)[1].lower() in ALLOWED_IMAGE_FILE_FORMATS:
84
+ image_path_list.append(os.path.join(load_path, file))
85
+ if len(image_path_list) == 0:
86
+ raise FileNotFoundError(f"No supported image file found at '{load_path}'")
87
+ for image_path in image_path_list:
88
+ image_fname_list.append(os.path.splitext(os.path.basename(image_path))[0])
89
+ image = Image.open(image_path)
90
+ # Warning: only images of type L, RGB, or RGBA in JPEG or PNG format are supported
91
+ if image.mode not in ALLOWED_IMAGE_TYPES:
92
+ raise TypeError(
93
+ f"Only support images of type {list(ALLOWED_IMAGE_TYPES.keys())} in JPEG or PNG format"
94
+ )
95
+ num_channels = ALLOWED_IMAGE_TYPES[image.mode]
96
+ num_channels_list.append(num_channels)
97
+ if downsample_ratio is not None:
98
+ image = image.resize(
99
+ (
100
+ round(image.width / downsample_ratio),
101
+ round(image.height / downsample_ratio),
102
+ ),
103
+ resample=Image.Resampling.BILINEAR,
104
+ )
105
+ # Warning: assumes 8-bit color depth
106
+ image = np.asarray(image, dtype=np.float32) / 255.0
107
+ if gamma is not None:
108
+ image = np.power(image, gamma)
109
+ if len(image.shape) == 2:
110
+ image = np.expand_dims(image, axis=2)
111
+ image = image.transpose(2, 0, 1)
112
+ image = image[:num_channels]
113
+ image_list.append(image)
114
+ return np.concatenate(image_list, axis=0), num_channels_list, image_fname_list
115
+
116
+
117
+ def to_output_format(image, gamma):
118
+ if len(image.shape) not in [2, 3]:
119
+ raise ValueError(f"Wrong image format: shape = {image.shape}")
120
+ if isinstance(image, torch.Tensor):
121
+ image = image.detach().cpu().clone().numpy()
122
+ if len(image.shape) == 3 and image.shape[2] not in [1, 3]:
123
+ image = image.transpose(1, 2, 0)
124
+ if image.shape[2] not in [1, 3]:
125
+ raise ValueError(f"Wrong image format: shape = {image.shape}")
126
+ if len(image.shape) == 3 and image.shape[2] == 1:
127
+ image = image.squeeze(axis=2)
128
+ image = np.clip(image, 0.0, 1.0)
129
+ if gamma is not None:
130
+ image = np.power(image, 1.0 / gamma)
131
+ image = (255.0 * image).astype(np.uint8)
132
+ return image
133
+
134
+
135
+ def save_image(image, save_path, gamma=None, zoom=None):
136
+ image = to_output_format(image, gamma)
137
+ image = Image.fromarray(image)
138
+ if zoom is not None and zoom > 0.0:
139
+ width, height = image.size
140
+ image = image.resize(
141
+ (round(width * zoom), round(height * zoom)), resample=Image.Resampling.BOX
142
+ )
143
+ image.save(save_path)
144
+
145
+
146
+ def separate_image_channels(images, input_channels):
147
+ if len(images) != sum(input_channels):
148
+ raise ValueError(
149
+ f"Incompatible number of channels: {len(images):d} vs {sum(input_channels):d}"
150
+ )
151
+ image_list = []
152
+ curr_channel = 0
153
+ for num_channels in input_channels:
154
+ image_list.append(images[curr_channel : curr_channel + num_channels])
155
+ curr_channel += num_channels
156
+ return image_list
157
+
158
+
159
+ def visualize_gaussians(
160
+ filepath, xy, scale, rot, feat, img_h, img_w, input_channels, alpha=0.8, gamma=None
161
+ ):
162
+ """
163
+ Visualize Gaussians as colored elliptical disks.
164
+ """
165
+ if feat.shape[1] != sum(input_channels):
166
+ raise ValueError(
167
+ f"Incompatible number of channels: {feat.shape[1]:d} vs {sum(input_channels):d}"
168
+ )
169
+ xy = xy.detach().cpu().clone().numpy()
170
+ y, x = xy[:, 1] * img_h, xy[:, 0] * img_w
171
+ scale = GAUSSIAN_ZOOM * scale.detach().cpu().clone().numpy()
172
+ rot = rot.detach().cpu().clone().numpy()
173
+ if gamma is not None:
174
+ feat = torch.pow(feat, 1.0 / gamma)
175
+ feat = np.clip(feat.detach().cpu().clone().numpy(), 0.0, 1.0)
176
+
177
+ curr_channel = 0
178
+ for image_id, num_channels in enumerate(input_channels, 1):
179
+ curr_feat = feat[:, curr_channel : curr_channel + num_channels]
180
+ fig = plt.figure()
181
+ fig.set_dpi(PLOT_DPI)
182
+ fig.set_size_inches(w=img_w / PLOT_DPI, h=img_h / PLOT_DPI, forward=False)
183
+ ax = plt.gca()
184
+ for gid in range(len(xy)):
185
+ ellipse = Ellipse(
186
+ xy=(x[gid], y[gid]),
187
+ width=scale[gid, 0],
188
+ height=scale[gid, 1],
189
+ angle=rot[gid, 0] * 180 / np.pi,
190
+ alpha=alpha,
191
+ ec=None,
192
+ fc=curr_feat[gid],
193
+ lw=None,
194
+ )
195
+ ax.add_patch(ellipse)
196
+ plt.xlim(0, img_w)
197
+ plt.ylim(img_h, 0)
198
+ plt.axis("off")
199
+ plt.tight_layout()
200
+ suffix = "" if len(input_channels) == 1 else f"_{image_id:d}"
201
+ plt.savefig(
202
+ f"{filepath}{suffix}.png", bbox_inches="tight", pad_inches=0, dpi=PLOT_DPI
203
+ )
204
+ plt.close()
205
+ curr_channel += num_channels
206
+
207
+
208
+ def visualize_added_gaussians(
209
+ filepath,
210
+ images,
211
+ old_xy,
212
+ new_xy,
213
+ input_channels,
214
+ size=500,
215
+ every_n=5,
216
+ alpha=0.8,
217
+ gamma=None,
218
+ ):
219
+ """
220
+ Visualize the positions of added Gaussians during error-guided progressive optimization.
221
+ """
222
+ if len(images) != sum(input_channels):
223
+ raise ValueError(
224
+ f"Incompatible number of channels: {len(images):d} vs {sum(input_channels):d}"
225
+ )
226
+ image_height, image_width = images.shape[1:]
227
+ old_xy = old_xy.detach().cpu().clone().numpy()[::every_n]
228
+ new_xy = new_xy.detach().cpu().clone().numpy()[::every_n]
229
+ old_x, old_y = old_xy[:, 0] * image_width, old_xy[:, 1] * image_height
230
+ new_x, new_y = new_xy[:, 0] * image_width, new_xy[:, 1] * image_height
231
+
232
+ curr_channel = 0
233
+ for image_id, num_channels in enumerate(input_channels, 1):
234
+ image = images[curr_channel : curr_channel + num_channels]
235
+ image = to_output_format(image, gamma)
236
+ fig = plt.figure()
237
+ fig.set_dpi(PLOT_DPI)
238
+ fig.set_size_inches(
239
+ w=image_width / PLOT_DPI, h=image_height / PLOT_DPI, forward=False
240
+ )
241
+ plt.imshow(Image.fromarray(image), cmap="gray", vmin=0, vmax=255)
242
+ plt.scatter(old_x, old_y, s=size, c="#ef476f", marker="o", alpha=alpha) # red
243
+ plt.scatter(new_x, new_y, s=size, c="#06d6a0", marker="o", alpha=alpha) # green
244
+ plt.xlim(0, image_width)
245
+ plt.ylim(image_height, 0)
246
+ plt.axis("off")
247
+ plt.tight_layout()
248
+ suffix = "" if len(input_channels) == 1 else f"_{image_id:d}"
249
+ plt.savefig(
250
+ f"{filepath}{suffix}.png", bbox_inches="tight", pad_inches=0, dpi=PLOT_DPI
251
+ )
252
+ plt.close()
253
+ curr_channel += num_channels
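A hedged round-trip example for the helpers above; the input path is an assumption, and load_images returns channels stacked as CxHxW floats in [0, 1]:

# "assets/example.png" is a hypothetical path
images, num_channels_list, fnames = load_images("assets/example.png", gamma=2.2)
print(images.shape, num_channels_list, fnames)  # e.g. (3, H, W), [3], ["example"]
save_image(images, "example_copy.png", gamma=2.2)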
utils/misc_utils.py ADDED
@@ -0,0 +1,52 @@
1
+ import os
2
+ import random
3
+ import shutil
4
+ from argparse import ArgumentParser
5
+
6
+ import numpy as np
7
+ import torch
8
+ import yaml
9
+
10
+
11
+ def clean_dir(path):
12
+ if os.path.exists(path):
13
+ shutil.rmtree(path)
14
+
15
+
16
+ def get_latest_ckpt_step(load_path):
17
+ saved_steps = [
18
+ int(os.path.splitext(path)[0].split("-")[-1])
19
+ for path in os.listdir(load_path)
20
+ if path.endswith(".pt")
21
+ ]
22
+ latest_step = -1 if len(saved_steps) == 0 else max(saved_steps)
23
+ return latest_step
24
+
25
+
26
+ def set_random_seed(seed):
27
+ random.seed(seed)
28
+ np.random.seed(seed)
29
+ torch.manual_seed(seed)
30
+ torch.cuda.manual_seed(seed)
31
+ torch.cuda.manual_seed_all(seed)
32
+ torch.backends.cudnn.deterministic = True
33
+ torch.backends.cudnn.benchmark = False
34
+
35
+
36
+ def load_cfg(cfg_path: str, parser: ArgumentParser) -> ArgumentParser:
37
+ with open(cfg_path, "r", encoding="utf-8") as file:
38
+ cfg: dict = yaml.safe_load(file)
39
+ for key, value in cfg.items():
40
+ if value is None:
41
+ raise ValueError("'None' is not a supported value in the config file")
42
+ if isinstance(value, bool):
43
+ parser.add_argument(f"--{key}", action="store_true", default=value)
44
+ else:
45
+ parser.add_argument(f"--{key}", type=type(value), default=value)
46
+ return parser
47
+
48
+
49
+ def save_cfg(path: str, args, mode="w"):
50
+ with open(path, mode=mode, encoding="utf-8") as file:
51
+ print("#################### Training Config ####################", file=file)
52
+ yaml.dump(vars(args), file, default_flow_style=False)
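A hypothetical parser built with load_cfg; the YAML path and keys are assumptions:

from argparse import ArgumentParser

# Assumes a config such as:  num_gaussians: 10000
#                            quantize: false
parser = ArgumentParser()
parser = load_cfg("configs/default.yaml", parser)
args = parser.parse_args([])  # defaults are taken from the YAML values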
utils/quantization_utils.py ADDED
@@ -0,0 +1,17 @@
1
+ import torch
2
+
3
+
4
+ def ste_quantize(x: torch.Tensor, num_bits: int = 16) -> torch.Tensor:
5
+ """
6
+ Bit precision control of Gaussian parameters using a straight-through estimator.
7
+ Reference: https://arxiv.org/abs/1308.3432
8
+ """
9
+ qmin, qmax = 0, 2**num_bits - 1
10
+ min_val, max_val = x.min().item(), x.max().item()
11
+ scale = max((max_val - min_val) / (qmax - qmin), 1e-8)
12
+ # Quantize in forward pass (non-differentiable)
13
+ q_x = torch.round((x - min_val) / scale).clamp(qmin, qmax)
14
+ dq_x = q_x * scale + min_val
15
+ # Restore gradients in backward pass
16
+ dq_x = x + (dq_x - x).detach()
17
+ return dq_x
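A small sketch of the straight-through estimator's behavior: the forward value is quantized, while the backward pass treats the rounding as the identity, so gradients are not blocked by the non-differentiable step:

import torch

x = torch.randn(8, requires_grad=True)
y = ste_quantize(x, num_bits=4)
y.sum().backward()
assert torch.allclose(x.grad, torch.ones_like(x))  # identity gradient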
utils/saliency/decoder.py ADDED
@@ -0,0 +1,62 @@
1
+ import math
2
+
3
+ import torch
4
+ import torch.nn as nn
5
+ import torch.nn.functional as F
6
+
7
+
8
+ class Decoder(nn.Module):
9
+ def __init__(self, shape, num_img_feat, num_pla_feat):
10
+ super(Decoder, self).__init__()
11
+ self.shape = shape
12
+ self.img_model = self._make_layer(num_img_feat)
13
+ self.pla_model = self._make_layer(num_pla_feat)
14
+
15
+ self.combined = self._make_output(num_img_feat + num_pla_feat)
16
+
17
+ for m in self.modules():
18
+ if isinstance(m, nn.Conv2d):
19
+ n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
20
+ m.weight.data.normal_(0, math.sqrt(2.0 / n))
21
+ elif isinstance(m, nn.BatchNorm2d):
22
+ m.weight.data.fill_(1)
23
+ m.bias.data.zero_()
24
+
25
+ def _make_layer(self, num_feat):
26
+ ans = nn.ModuleList()
27
+ for _ in range(num_feat):
28
+ m = nn.Sequential(
29
+ nn.Conv2d(1, 1, 3, padding=1), nn.BatchNorm2d(1), nn.ReLU(inplace=True)
30
+ )
31
+ ans.append(m)
32
+ return ans
33
+
34
+ def _make_output(self, planes, readout=1):
35
+ return nn.Sequential(
36
+ nn.Conv2d(planes, readout, 3, stride=1, padding=1),
37
+ nn.BatchNorm2d(readout),
38
+ nn.Sigmoid(),
39
+ )
40
+
41
+ def forward(self, x):
42
+ img_feat, pla_feat = x
43
+ feat = []
44
+
45
+ for a, b in zip(img_feat, self.img_model):
46
+ f = F.interpolate(b(a), self.shape)
47
+ feat.append(f)
48
+
49
+ for a, b in zip(pla_feat, self.pla_model):
50
+ f = F.interpolate(b(a), self.shape)
51
+ feat.append(f)
52
+
53
+ feat = torch.cat(feat, dim=1)
54
+ feat = self.combined(feat)
55
+ return feat
56
+
57
+
58
+ def build_decoder(model_path, *args):
59
+ decoder = Decoder(*args)
60
+ loaded = torch.load(model_path, weights_only=True)["state_dict"]
61
+ decoder.load_state_dict(loaded)
62
+ return decoder
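A minimal shape check for the decoder (CPU is fine here): five 1-channel maps per encoder are upsampled to `shape`, concatenated, and fused into a single saliency map. The feature resolutions are illustrative:

import torch

dec = Decoder((480, 640), num_img_feat=5, num_pla_feat=5).eval()
img_feat = [torch.rand(1, 1, 60, 80) for _ in range(5)]
pla_feat = [torch.rand(1, 1, 60, 80) for _ in range(5)]
with torch.no_grad():
    out = dec((img_feat, pla_feat))
print(out.shape)  # torch.Size([1, 1, 480, 640])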
utils/saliency/resnet.py ADDED
@@ -0,0 +1,175 @@
1
+ import math
2
+
3
+ import torch
4
+ import torch.nn as nn
5
+ import torch.nn.functional as F
6
+
7
+
8
+ def conv3x3(in_planes, out_planes, stride=1):
9
+ conv = nn.Conv2d(
10
+ in_planes, out_planes, kernel_size=3, stride=stride, padding=1, bias=False
11
+ )
12
+ return conv
13
+
14
+
15
+ class BasicBlock(nn.Module):
16
+ expansion = 1
17
+
18
+ def __init__(self, inplanes, planes, stride=1, downsample=None):
19
+ super(BasicBlock, self).__init__()
20
+ self.conv1 = conv3x3(inplanes, planes, stride)
21
+ self.bn1 = nn.BatchNorm2d(planes)
22
+ self.relu = nn.ReLU(inplace=True)
23
+ self.conv2 = conv3x3(planes, planes)
24
+ self.bn2 = nn.BatchNorm2d(planes)
25
+ self.downsample = downsample
26
+ self.stride = stride
27
+
28
+ def forward(self, x):
29
+ residual = x
30
+ out = self.conv1(x)
31
+ out = self.bn1(out)
32
+ out = self.relu(out)
33
+ out = self.conv2(out)
34
+ out = self.bn2(out)
35
+ if self.downsample is not None:
36
+ residual = self.downsample(x)
37
+ out += residual
38
+ out = self.relu(out)
39
+ return out
40
+
41
+
42
+ class Bottleneck(nn.Module):
43
+ expansion = 4
44
+
45
+ def __init__(self, inplanes, planes, stride=1, downsample=None):
46
+ super(Bottleneck, self).__init__()
47
+ self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
48
+ self.bn1 = nn.BatchNorm2d(planes)
49
+ self.conv2 = nn.Conv2d(
50
+ planes, planes, kernel_size=3, stride=stride, padding=1, bias=False
51
+ )
52
+ self.bn2 = nn.BatchNorm2d(planes)
53
+ self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
54
+ self.bn3 = nn.BatchNorm2d(planes * 4)
55
+ self.relu = nn.ReLU(inplace=True)
56
+ self.downsample = downsample
57
+ self.stride = stride
58
+
59
+ def forward(self, x):
60
+ residual = x
61
+ out = self.conv1(x)
62
+ out = self.bn1(out)
63
+ out = self.relu(out)
64
+ out = self.conv2(out)
65
+ out = self.bn2(out)
66
+ out = self.relu(out)
67
+ out = self.conv3(out)
68
+ out = self.bn3(out)
69
+ if self.downsample is not None:
70
+ residual = self.downsample(x)
71
+ out += residual
72
+ out = self.relu(out)
73
+ return out
74
+
75
+
76
+ class ResNet(nn.Module):
77
+ def __init__(self, block, layers):
78
+ self.inplanes = 64
79
+ super(ResNet, self).__init__()
80
+ self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
81
+ self.bn1 = nn.BatchNorm2d(64)
82
+ self.relu = nn.ReLU(inplace=True)
83
+ self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
84
+ self.layer1 = self._make_layer(block, 64, layers[0])
85
+ self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
86
+ self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
87
+ self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
88
+ self.out_channels = 1
89
+ self.output0 = self._make_output(64, readout=self.out_channels)
90
+ self.output1 = self._make_output(256, readout=self.out_channels)
91
+ self.output2 = self._make_output(512, readout=self.out_channels)
92
+ self.output3 = self._make_output(1024, readout=self.out_channels)
93
+ self.output4 = self._make_output(2048, readout=self.out_channels)
94
+ self.combined = self._make_output(5, sigmoid=True)
95
+ for m in self.modules():
96
+ if isinstance(m, nn.Conv2d):
97
+ n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
98
+ m.weight.data.normal_(0, math.sqrt(2.0 / n))
99
+ elif isinstance(m, nn.BatchNorm2d):
100
+ m.weight.data.fill_(1)
101
+ m.bias.data.zero_()
102
+
103
+ def _make_output(self, planes, readout=1, sigmoid=False):
104
+ layers = [
105
+ nn.Conv2d(planes, readout, kernel_size=3, padding=1),
106
+ nn.BatchNorm2d(readout),
107
+ ]
108
+ if sigmoid:
109
+ layers.append(nn.Sigmoid())
110
+ else:
111
+ layers.append(nn.ReLU(inplace=True))
112
+ return nn.Sequential(*layers)
113
+
114
+ def _make_layer(self, block, planes, blocks, stride=1):
115
+ downsample = None
116
+ if stride != 1 or self.inplanes != planes * block.expansion:
117
+ downsample = nn.Sequential(
118
+ nn.Conv2d(
119
+ self.inplanes,
120
+ planes * block.expansion,
121
+ kernel_size=1,
122
+ stride=stride,
123
+ bias=False,
124
+ ),
125
+ nn.BatchNorm2d(planes * block.expansion),
126
+ )
127
+ layers = []
128
+ layers.append(block(self.inplanes, planes, stride, downsample))
129
+ self.inplanes = planes * block.expansion
130
+ for _ in range(1, blocks):
131
+ layers.append(block(self.inplanes, planes))
132
+ return nn.Sequential(*layers)
133
+
134
+ def forward(self, x, decode=False):
135
+ h, w = x.size(2), x.size(3)
136
+ x = self.conv1(x)
137
+ x = self.bn1(x)
138
+ out0 = self.relu(x)
139
+ x = self.maxpool(out0)
140
+ out1 = self.layer1(x)
141
+ out2 = self.layer2(out1)
142
+ out3 = self.layer3(out2)
143
+ out4 = self.layer4(out3)
144
+ out0 = self.output0(out0)
145
+ r, c = out0.size(2), out0.size(3)
146
+ out1 = self.output1(out1)
147
+ out2 = self.output2(out2)
148
+ out3 = self.output3(out3)
149
+ out4 = self.output4(out4)
150
+ if decode:
151
+ return [out0, out1, out2, out3, out4]
152
+ out1 = F.interpolate(out1, (r, c))
153
+ out2 = F.interpolate(out2, (r, c))
154
+ out3 = F.interpolate(out3, (r, c))
155
+ out4 = F.interpolate(out4, (r, c))
156
+ x = torch.cat([out0, out1, out2, out3, out4], dim=1)
157
+ x = self.combined(x)
158
+ x = F.interpolate(x, (h, w))
159
+ return x
160
+
161
+
162
+ def resnet50(model_path, **kwargs):
163
+ model = ResNet(Bottleneck, [3, 4, 6, 3], **kwargs)
164
+ if model_path is not None:
166
+ model_state = model.state_dict()
167
+ loaded_model = torch.load(model_path, weights_only=True)
168
+ if "state_dict" in loaded_model:
169
+ loaded_model = loaded_model["state_dict"]
170
+ pretrained = {k[7:]: v for k, v in loaded_model.items() if k[7:] in model_state}
171
+ if len(pretrained) == 0:
172
+ pretrained = {k: v for k, v in loaded_model.items() if k in model_state}
173
+ model_state.update(pretrained)
174
+ model.load_state_dict(model_state)
175
+ return model
utils/saliency_utils.py ADDED
@@ -0,0 +1,38 @@
1
+ import numpy as np
2
+ import torch
3
+ from skimage import filters
4
+ from torchvision.transforms.functional import resize
5
+
6
+ from utils.saliency import decoder, resnet
7
+
8
+
9
+ def get_smap(image, path, filter_size=15):
10
+ """
11
+ Compute the saliency map of the target image using EMLNet.
12
+ Reference: https://arxiv.org/abs/1805.01047
13
+ Reference: https://github.com/SenJia/EML-NET-Saliency
14
+ """
15
+ if image.shape[0] != 3:
16
+ raise ValueError("Saliency prediction only supports RGB images")
17
+ sod_res = (480, 640)
18
+ imagenet_model = resnet.resnet50(f"{path}/emlnet/res_imagenet.pth").cuda().eval()
19
+ places_model = resnet.resnet50(f"{path}/emlnet/res_places.pth").cuda().eval()
20
+ decoder_model = (
21
+ decoder.build_decoder(f"{path}/emlnet/res_decoder.pth", sod_res, 5, 5)
22
+ .cuda()
23
+ .eval()
24
+ )
25
+ image_sod = resize(image, sod_res).unsqueeze(0)
26
+ with torch.no_grad():
27
+ imagenet_feat = imagenet_model(image_sod, decode=True)
28
+ places_feat = places_model(image_sod, decode=True)
29
+ smap = decoder_model([imagenet_feat, places_feat])
30
+ smap = resize(smap.squeeze(0).detach().cpu(), image.shape[1:]).squeeze(0)
31
+
32
+ def post_process(smap):
33
+ smap = filters.gaussian(smap, filter_size)
34
+ smap -= smap.min()
35
+ smap /= smap.max()
36
+ return smap
37
+
38
+ return post_process(smap.numpy()).astype(np.float32)
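A hypothetical call; it assumes the EML-Net weights (res_imagenet.pth, res_places.pth, res_decoder.pth) live under `<path>/emlnet/` and that a CUDA device is available:

import torch

image = torch.rand(3, 512, 512, device="cuda")    # RGB in [0, 1]
smap = get_smap(image, "models", filter_size=15)  # "models" is an assumed path
print(smap.shape, smap.min(), smap.max())         # (512, 512), 0.0, 1.0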
uv.lock ADDED
The diff for this file is too large to render.