---
title: Image GS
emoji: 💻
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.49.1
python_version: "3.10"
app_file: gradio_app.py
suggested_hardware: "cpu-basic"
models:
- blanchon/image-gs-models-utils
pinned: false
---

<div align="center">

<h1>Image-GS: Content-Adaptive Image Representation via 2D Gaussians</h1>

[**Yunxiang Zhang**](https://yunxiangzhang.github.io/)<sup>1\*</sup>,
[**Bingxuan Li**](https://bingxuan-li.github.io/)<sup>1\*</sup>,
[**Alexandr Kuznetsov**](https://alexku.me/)<sup>3†</sup>,
[**Akshay Jindal**](https://www.akshayjindal.com/)<sup>2</sup>,
[**Stavros Diolatzis**](https://www.sdiolatz.info/)<sup>2</sup>,
[**Kenneth Chen**](https://kenchen10.github.io/)<sup>1</sup>,
[**Anton Sochenov**](https://www.intel.com/content/www/us/en/developer/articles/community/gpu-researchers-anton-sochenov.html)<sup>2</sup>,
[**Anton Kaplanyan**](http://kaplanyan.com/)<sup>2</sup>,
[**Qi Sun**](https://qisun.me/)<sup>1</sup>

\* Equal contribution&nbsp;&nbsp;&nbsp;† Work done while at Intel

<sup>1</sup>
<a href="https://www.immersivecomputinglab.org/research/"><img width="30%" src="assets/images/NYU-logo.png" style="vertical-align: top;" alt="NYU logo"></a>
&nbsp;
<sup>2</sup>
<a href="https://www.intel.com/content/www/us/en/developer/topic-technology/graphics-research/overview.html"><img width="22%" src="assets/images/Intel-logo.png" style="vertical-align: top;" alt="Intel logo"></a>
&nbsp;
<sup>3</sup>
<a href="https://www.amd.com/en.html"><img width="33%" src="assets/images/AMD-logo.png" style="vertical-align: top;" alt="AMD logo"></a>

<a href="https://arxiv.org/abs/2407.01866"><img src="https://img.shields.io/badge/arXiv-2407.01866-red" alt="arXiv"></a>
<a href="https://www.immersivecomputinglab.org/publication/image-gs-content-adaptive-image-representation-via-2d-gaussians/"><img src="https://img.shields.io/badge/project page-ImageGS-blue" alt="project page"></a>
<a href="https://github.com/NYU-ICL/image-gs"><img src="https://visitor-badge.laobi.icu/badge?page_id=NYU-ICL.image-gs&left_color=green&right_color=red" alt="visitors"></a>

</div>

<div style="width: 90%; margin: 0 auto;">

Neural image representations have emerged as a promising approach for encoding and rendering visual data. Combined with learning-based workflows, they demonstrate impressive trade-offs between visual fidelity and memory footprint. Existing methods in this domain, however, often rely on fixed data structures that allocate memory suboptimally, or on compute-intensive implicit models, hindering their practicality for real-time graphics applications.

Inspired by recent advances in radiance field rendering, we introduce Image-GS, a content-adaptive image representation based on 2D Gaussians. Leveraging a custom differentiable renderer, Image-GS reconstructs images by adaptively allocating and progressively optimizing a set of anisotropic, colored 2D Gaussians. It achieves a favorable balance between visual fidelity and memory efficiency across a variety of stylized images frequently seen in graphics workflows, especially for images with non-uniformly distributed features and in low-bitrate regimes. Moreover, it supports hardware-friendly rapid random access for real-time usage, requiring only 0.3K MACs to decode a pixel. Through error-guided progressive optimization, Image-GS naturally constructs a smooth level-of-detail hierarchy. We demonstrate its versatility with several applications, including texture compression, semantics-aware compression, and joint image compression and restoration.

<img src="assets/images/teaser.jpg" width="100%" />
<sub>
Figure 1: Image-GS reconstructs an image by adaptively allocating and progressively optimizing a set of colored 2D Gaussians. It achieves favorable rate-distortion trade-offs, hardware-friendly random access, and flexible quality control through a smooth level-of-detail stack. (a) visualizes the optimized spatial distribution of Gaussians (20% randomly sampled for clarity). (b) Image-GS’s explicit content-adaptive design effectively captures non-uniformly distributed image features and better preserves fine details under constrained memory budgets. In the inset error maps, brighter colors indicate larger errors.
</sub>
</div>
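
For intuition, here is a minimal NumPy sketch of what such a representation computes: a toy renderer that blends anisotropic, colored 2D Gaussians into an image. This is illustrative only — the function and variable names are ours, and the paper's tile-based CUDA renderer and top-K normalization are not reproduced here.

```python
import numpy as np

def render_2d_gaussians(means, scales, angles, colors, height, width):
    """Toy renderer: each pixel is the normalized, weighted sum of
    anisotropic, colored 2D Gaussians (a simplification of Image-GS)."""
    ys, xs = np.mgrid[0:height, 0:width]
    pixels = np.stack([xs, ys], axis=-1).astype(np.float32)   # (H, W, 2)
    out = np.zeros((height, width, 3), dtype=np.float32)
    weight_sum = np.zeros((height, width), dtype=np.float32)
    for mu, s, theta, c in zip(means, scales, angles, colors):
        # Orientation of the Gaussian as a 2D rotation matrix
        R = np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]], dtype=np.float32)
        # Covariance = R diag(s^2) R^T; its inverse defines the quadratic form
        cov = R @ np.diag(np.square(s)) @ R.T
        d = pixels - mu                                       # offsets to the mean
        maha = np.einsum("hwi,ij,hwj->hw", d, np.linalg.inv(cov), d)
        w = np.exp(-0.5 * maha)                               # Gaussian falloff
        out += w[..., None] * c
        weight_sum += w
    return out / np.clip(weight_sum, 1e-8, None)[..., None]

# Two Gaussians on a 32x32 canvas: a red one near (8, 8), a blue one near (24, 24)
means = np.array([[8.0, 8.0], [24.0, 24.0]], dtype=np.float32)
scales = np.array([[4.0, 2.0], [3.0, 6.0]], dtype=np.float32)
angles = np.array([0.0, np.pi / 4], dtype=np.float32)
colors = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], dtype=np.float32)
img = render_2d_gaussians(means, scales, angles, colors, 32, 32)
print(img.shape)  # (32, 32, 3)
```

Because the Gaussians are continuous in image space, the same parameters can be evaluated on a pixel grid of any resolution — which is what makes post-optimization rendering at new heights (see Quick Start) possible.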

## Setup

1. Create a dedicated Python environment and install the dependencies:

   ```bash
   git clone https://github.com/NYU-ICL/image-gs.git
   cd image-gs
   conda env create -f environment.yml
   conda activate image-gs
   pip install git+https://github.com/rahul-goel/fused-ssim/ --no-build-isolation
   cd gsplat
   pip install -e ".[dev]"
   cd ..
   ```

2. Download the image and texture datasets from [OneDrive](https://1drv.ms/u/c/3a8968df8a027819/EeshjZJlMtdCmvvmESiN2pABM71EDaoLYmEwuOvecg0tAA?e=GybqBv) and organize the folder structure as follows:

   ```
   image-gs
   └── media
       ├── images
       └── textures
   ```

3. (Optional) To run saliency-guided Gaussian position initialization, download the pre-trained [EML-Net](https://github.com/SenJia/EML-NET-Saliency) models ([res_imagenet.pth](https://drive.google.com/open?id=1-a494canr9qWKLdm-DUDMgbGwtlAJz71), [res_places.pth](https://drive.google.com/open?id=18nRz0JSRICLqnLQtAvq01azZAsH0SEzS), [res_decoder.pth](https://drive.google.com/open?id=1vwrkz3eX-AMtXQE08oivGMwS4lKB74sH)) and place them under the `models/emlnet/` folder:

   ```
   image-gs
   └── models
       └── emlnet
           ├── res_decoder.pth
           ├── res_imagenet.pth
           └── res_places.pth
   ```

## Quick Start

#### Image Compression

- Optimize an Image-GS representation for an input image `anime-1_2k.png` using `10000` Gaussians with half-precision parameters:

  ```bash
  python main.py --input_path="images/anime-1_2k.png" --exp_name="test/anime-1_2k" --num_gaussians=10000 --quantize
  ```

- Render the optimized Image-GS representation at a new resolution with height `4000` (aspect ratio is maintained):

  ```bash
  python main.py --input_path="images/anime-1_2k.png" --exp_name="test/anime-1_2k" --num_gaussians=10000 --quantize --eval --render_height=4000
  ```

#### Texture Stack Compression

- Optimize an Image-GS representation for an input texture stack `alarm-clock_2k` using `30000` Gaussians with half-precision parameters:

  ```bash
  python main.py --input_path="textures/alarm-clock_2k" --exp_name="test/alarm-clock_2k" --num_gaussians=30000 --quantize
  ```

- Render the optimized Image-GS representation at a new resolution with height `3000` (aspect ratio is maintained):

  ```bash
  python main.py --input_path="textures/alarm-clock_2k" --exp_name="test/alarm-clock_2k" --num_gaussians=30000 --quantize --eval --render_height=3000
  ```

#### Control Bit Precision of Gaussian Parameters

- Optimize an Image-GS representation for an input image `anime-1_2k.png` using `10000` Gaussians with 12-bit-precision parameters:

  ```bash
  python main.py --input_path="images/anime-1_2k.png" --exp_name="test/anime-1_2k" --num_gaussians=10000 --quantize --pos_bits=12 --scale_bits=12 --rot_bits=12 --feat_bits=12
  ```
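
Conceptually, storing a parameter at `k` bits maps it onto a uniform grid of `2**k` levels over its value range. A small sketch of that idea (assumed behavior for illustration — the actual quantizer in this codebase may differ):

```python
import numpy as np

def quantize_uniform(x, bits, lo, hi):
    """Quantize values in [lo, hi] to 2**bits uniform levels, then dequantize.
    The round-trip error is at most half a quantization step."""
    levels = (1 << bits) - 1
    q = np.round((np.clip(x, lo, hi) - lo) / (hi - lo) * levels)
    return q / levels * (hi - lo) + lo

# Example: rotation angles quantized to 12 bits over [0, pi]
theta = np.array([0.1234567, 1.5707963])
theta_q = quantize_uniform(theta, 12, 0.0, np.pi)
print(np.abs(theta - theta_q).max() <= np.pi / (2 ** 12))  # True
```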

#### Switch to Saliency-Guided Gaussian Position Initialization

- Optimize an Image-GS representation for an input image `anime-1_2k.png` using `10000` Gaussians with half-precision parameters and saliency-guided initialization:

  ```bash
  python main.py --input_path="images/anime-1_2k.png" --exp_name="test/anime-1_2k" --num_gaussians=10000 --quantize --init_mode="saliency"
  ```

## Gradio Web Interface

We provide a user-friendly web interface built with Gradio for easy experimentation and training visualization.

### Setup for Web Interface

1. Install Gradio (in addition to the main dependencies):

   ```bash
   pip install "gradio>=5.0.0"
   ```

2. Launch the web interface:

   ```bash
   python gradio_app.py
   ```

3. Open your browser and navigate to `http://localhost:7860`.

### Features

The Gradio interface provides:

- **Interactive Parameter Configuration**: Adjust all training parameters through an intuitive UI
- **Image Upload**: Drag and drop any image to train on
- **Real-time Training Progress**: Stream training logs and intermediate results
- **Live Visualization**: Watch Gaussian placement and rendering progress during training
- **Result Gallery**: View final renders, gradient maps, and saliency maps
- **Easy Experimentation**: No need to remember command-line arguments

### Interface Sections

1. **Configuration Panel**:
   - Basic parameters (number of Gaussians, training steps)
   - Quantization settings for memory efficiency
   - Initialization modes (gradient, saliency, random)
   - Advanced optimization parameters (learning rates, loss weights)

2. **Training Progress**:
   - Real-time streaming logs
   - Current render and Gaussian visualization updates
   - Training status and control buttons

3. **Results Display**:
   - Final optimized image
   - Gradient and saliency maps used for initialization
   - Download capabilities for all results

### Usage Tips

- Start with default parameters for your first run
- Use **saliency initialization** for better results on complex images
- Enable **Gaussian visualization** to see how the representation evolves
- Adjust **save image steps** to control visualization frequency (lower = more updates, but slower)
- For quick tests, reduce **max steps** to 500-1000

## Command Line Arguments

Please refer to `cfgs/default.yaml` for the full list of arguments and their default values.

**Post-optimization rendering**

- `--eval` render the optimized Image-GS representation.
- `--render_height` image height for rendering (aspect ratio is maintained).

**Bit precision control**: 32 bits (float32) per dimension by default

- `--quantize` enable bit precision control of Gaussian parameters.
- `--pos_bits` bit precision of each coordinate dimension.
- `--scale_bits` bit precision of each scale dimension.
- `--rot_bits` bit precision of the Gaussian orientation angle.
- `--feat_bits` bit precision of each feature dimension.
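
These bit widths directly determine the representation's size. As a back-of-the-envelope sketch (assuming a per-Gaussian layout of 2 position + 2 scale + 1 rotation + 3 feature dimensions, and ignoring any file header or metadata overhead):

```python
def payload_bytes(n, pos_bits, scale_bits, rot_bits, feat_bits):
    """Approximate payload for n Gaussians under the assumed
    2 pos + 2 scale + 1 rot + 3 feature dimension layout."""
    bits_per_gaussian = 2 * pos_bits + 2 * scale_bits + 1 * rot_bits + 3 * feat_bits
    return n * bits_per_gaussian / 8

print(payload_bytes(10000, 16, 16, 16, 16))  # half precision: 160000.0 bytes (~156 KiB)
print(payload_bytes(10000, 12, 12, 12, 12))  # 12-bit:         120000.0 bytes (~117 KiB)
```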

**Logging**

- `--exp_name` path to the logging directory.
- `--vis_gaussians` visualize Gaussians during optimization.
- `--save_image_steps` frequency of rendering intermediate results during optimization.
- `--save_ckpt_steps` frequency of checkpointing during optimization.

**Input image**

- `--input_path` path to an image file or a directory containing a texture stack.
- `--downsample` load a downsampled version of the input image or texture stack as the optimization target, to evaluate image upsampling performance.
- `--downsample_ratio` downsampling ratio.
- `--gamma` optimize in a gamma-corrected space; modify with caution.

**Gaussian**

- `--num_gaussians` number of Gaussians (for compression rate control).
- `--init_scale` initial Gaussian scale in number of pixels.
- `--disable_topk_norm` disable top-K normalization.
- `--disable_inverse_scale` disable inverse Gaussian scale optimization.
- `--init_mode` Gaussian position initialization mode; valid values are "gradient", "saliency", and "random".
- `--init_random_ratio` ratio of Gaussians with randomly initialized positions.

**Optimization**

- `--disable_tiles` disable tile-based rendering (warning: optimization and rendering without tiles are significantly slower).
- `--max_steps` maximum number of optimization steps.
- `--pos_lr` Gaussian position learning rate.
- `--scale_lr` Gaussian scale learning rate.
- `--rot_lr` Gaussian orientation angle learning rate.
- `--feat_lr` Gaussian feature learning rate.
- `--disable_lr_schedule` disable learning rate decay and early stopping schedule.
- `--disable_prog_optim` disable error-guided progressive optimization.
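
The intuition behind error-guided progressive optimization is that newly added Gaussians should land where the current reconstruction is worst. A toy sketch of that sampling step (our own illustration — the actual schedule and spawning logic in `main.py` may differ):

```python
import numpy as np

def progressive_spawn_positions(error_map, n_new, rng):
    """Sample positions for newly added Gaussians with probability
    proportional to the per-pixel reconstruction error."""
    p = error_map.ravel() / error_map.sum()
    idx = rng.choice(error_map.size, size=n_new, replace=False, p=p)
    ys, xs = np.unravel_index(idx, error_map.shape)
    return np.stack([xs, ys], axis=-1)  # (n_new, 2) pixel coordinates

rng = np.random.default_rng(0)
error = np.zeros((16, 16))
error[4:8, 10:14] = 1.0  # all reconstruction error concentrated in one patch
pos = progressive_spawn_positions(error, 5, rng)
print(pos.shape)  # (5, 2) -- every sample falls inside the high-error patch
```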

## Acknowledgements

We would like to thank the [gsplat](https://github.com/nerfstudio-project/gsplat) team and the authors of [3DGS](https://github.com/graphdeco-inria/gaussian-splatting), [fused-ssim](https://github.com/rahul-goel/fused-ssim), and [EML-Net](https://github.com/SenJia/EML-NET-Saliency) for their great work, on which Image-GS builds.

## License

This project is licensed under the terms of the MIT license.
## Citation |
|
|
|
|
|
If you find this project helpful to your research, please consider citing [BibTeX](assets/docs/image-gs.bib): |
|
|
|
|
|
```bibtex |
|
|
@inproceedings{zhang2025image, |
|
|
title={Image-gs: Content-adaptive image representation via 2d gaussians}, |
|
|
author={Zhang, Yunxiang and Li, Bingxuan and Kuznetsov, Alexandr and Jindal, Akshay and Diolatzis, Stavros and Chen, Kenneth and Sochenov, Anton and Kaplanyan, Anton and Sun, Qi}, |
|
|
booktitle={Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers}, |
|
|
pages={1--11}, |
|
|
year={2025} |
|
|
} |
|
|
``` |
|
|
|