---
title: Image GS
emoji: π»
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.49.1
python_version: "3.10"
app_file: gradio_app.py
suggested_hardware: "cpu-basic"
models:
- blanchon/image-gs-models-utils
pinned: false
---
<div align="center">
<h1>Image-GS: Content-Adaptive Image Representation via 2D Gaussians</h1>
[**Yunxiang Zhang**](https://yunxiangzhang.github.io/)<sup>1\*</sup>,
[**Bingxuan Li**](https://bingxuan-li.github.io/)<sup>1\*</sup>,
[**Alexandr Kuznetsov**](https://alexku.me/)<sup>3†</sup>,
[**Akshay Jindal**](https://www.akshayjindal.com/)<sup>2</sup>,
[**Stavros Diolatzis**](https://www.sdiolatz.info/)<sup>2</sup>,
[**Kenneth Chen**](https://kenchen10.github.io/)<sup>1</sup>,
[**Anton Sochenov**](https://www.intel.com/content/www/us/en/developer/articles/community/gpu-researchers-anton-sochenov.html)<sup>2</sup>,
[**Anton Kaplanyan**](http://kaplanyan.com/)<sup>2</sup>,
[**Qi Sun**](https://qisun.me/)<sup>1</sup>
\* Equal contribution   † Work done while at Intel
<sup>1</sup>
<a href="https://www.immersivecomputinglab.org/research/"><img width="30%" src="assets/images/NYU-logo.png" style="vertical-align: top;" alt="NYU logo"></a>
 
<sup>2</sup>
<a href="https://www.intel.com/content/www/us/en/developer/topic-technology/graphics-research/overview.html"><img width="22%" src="assets/images/Intel-logo.png" style="vertical-align: top;" alt="Intel logo"></a>
 
<sup>3</sup>
<a href="https://www.amd.com/en.html"><img width="33%" src="assets/images/AMD-logo.png" style="vertical-align: top;" alt="AMD logo"></a>
<a href="https://arxiv.org/abs/2407.01866"><img src="https://img.shields.io/badge/arXiv-2407.01866-red" alt="arXiv"></a>
<a href="https://www.immersivecomputinglab.org/publication/image-gs-content-adaptive-image-representation-via-2d-gaussians/"><img src="https://img.shields.io/badge/project page-ImageGS-blue" alt="project page"></a>
<a href="https://github.com/NYU-ICL/image-gs"><img src="https://visitor-badge.laobi.icu/badge?page_id=NYU-ICL.image-gs&left_color=green&right_color=red" alt="visitors"></a>
</div>
<div style="width: 90%; margin: 0 auto;">
Neural image representations have emerged as a promising approach for encoding and rendering visual data. Combined with learning-based workflows, they demonstrate impressive trade-offs between visual fidelity and memory footprint. Existing methods in this domain, however, often rely either on fixed data structures that allocate memory suboptimally or on compute-intensive implicit models, hindering their practicality for real-time graphics applications.
Inspired by recent advancements in radiance field rendering, we introduce Image-GS, a content-adaptive image representation based on 2D Gaussians. Leveraging a custom differentiable renderer, Image-GS reconstructs images by adaptively allocating and progressively optimizing a group of anisotropic, colored 2D Gaussians. It achieves a favorable balance between visual fidelity and memory efficiency across a variety of stylized images frequently seen in graphics workflows, especially for those showing non-uniformly distributed features and in low-bitrate regimes. Moreover, it supports hardware-friendly rapid random access for real-time usage, requiring only 0.3K MACs to decode a pixel. Through error-guided progressive optimization, Image-GS naturally constructs a smooth level-of-detail hierarchy. We demonstrate its versatility with several applications, including texture compression, semantics-aware compression, and joint image compression and restoration.
<img src="assets/images/teaser.jpg" width="100%" />
<sub>
Figure 1: Image-GS reconstructs an image by adaptively allocating and progressively optimizing a set of colored 2D Gaussians. It achieves favorable rate-distortion trade-offs, hardware-friendly random access, and flexible quality control through a smooth level-of-detail stack. (a) visualizes the optimized spatial distribution of Gaussians (20% randomly sampled for clarity). (b) Image-GS's explicit content-adaptive design effectively captures non-uniformly distributed image features and better preserves fine details under constrained memory budgets. In the inset error maps, brighter colors indicate larger errors.
</sub>
</div>
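The decoding step described in the abstract can be made concrete with a small sketch: each Gaussian has a mean, two scales, a rotation angle, and a color, and a pixel's color is a normalized, weighted blend of the Gaussians around it. The version below is plain Python for readability and assumes that "top-K normalization" (see `--disable_topk_norm` under Command Line Arguments) means keeping the K largest weights and dividing by their sum; `k=10` is an arbitrary choice. The repository's custom differentiable renderer (tile-based, built on gsplat) may differ in these details, and a practical decoder only visits Gaussians near the pixel instead of looping over all of them.

```python
import math


def gaussian_weight(px, py, mean, scale, theta):
    """Unnormalized weight exp(-0.5 * d^T Sigma^-1 d) of one 2D Gaussian at (px, py).

    Sigma = R diag(sx^2, sy^2) R^T, where R rotates by theta.
    """
    dx, dy = px - mean[0], py - mean[1]
    c, s = math.cos(theta), math.sin(theta)
    # Express the offset in the Gaussian's local frame (multiply by R^T).
    u = c * dx + s * dy
    v = -s * dx + c * dy
    sx, sy = scale
    return math.exp(-0.5 * ((u / sx) ** 2 + (v / sy) ** 2))


def decode_pixel(px, py, gaussians, k=10):
    """Color at (px, py): weighted average of the colors of the k strongest Gaussians."""
    weighted = [
        (gaussian_weight(px, py, g["mean"], g["scale"], g["theta"]), g["color"])
        for g in gaussians
    ]
    top = sorted(weighted, key=lambda wc: wc[0], reverse=True)[:k]
    total = sum(w for w, _ in top)
    channels = len(gaussians[0]["color"])
    if total == 0.0:  # pixel is far from every Gaussian
        return (0.0,) * channels
    return tuple(sum(w * color[i] for w, color in top) / total for i in range(channels))


if __name__ == "__main__":
    gaussians = [
        {"mean": (0.0, 0.0), "scale": (2.0, 1.0), "theta": 0.0, "color": (1.0, 0.0, 0.0)},
        {"mean": (10.0, 0.0), "scale": (2.0, 2.0), "theta": 0.5, "color": (0.0, 0.0, 1.0)},
    ]
    print(decode_pixel(0.0, 0.0, gaussians, k=2))   # almost pure red
    print(decode_pixel(10.0, 0.0, gaussians, k=1))  # (0.0, 0.0, 1.0)
```

Because every Gaussian is defined in continuous coordinates, the same function can be evaluated at any pixel independently, which is what makes random access cheap.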
## Setup
1. Create a dedicated Python environment and install the dependencies
```bash
git clone https://github.com/NYU-ICL/image-gs.git
cd image-gs
conda env create -f environment.yml
conda activate image-gs
pip install git+https://github.com/rahul-goel/fused-ssim/ --no-build-isolation
cd gsplat
pip install -e ".[dev]"
cd ..
```
2. Download the image and texture datasets from [OneDrive](https://1drv.ms/u/c/3a8968df8a027819/EeshjZJlMtdCmvvmESiN2pABM71EDaoLYmEwuOvecg0tAA?e=GybqBv) and organize the folder structure as follows
```
image-gs
└── media
    ├── images
    └── textures
```
3. (Optional) To run saliency-guided Gaussian position initialization, download the pre-trained [EML-Net](https://github.com/SenJia/EML-NET-Saliency) models ([res_imagenet.pth](https://drive.google.com/open?id=1-a494canr9qWKLdm-DUDMgbGwtlAJz71), [res_places.pth](https://drive.google.com/open?id=18nRz0JSRICLqnLQtAvq01azZAsH0SEzS), [res_decoder.pth](https://drive.google.com/open?id=1vwrkz3eX-AMtXQE08oivGMwS4lKB74sH)) and place them under the `models/emlnet/` folder
```
image-gs
└── models
    └── emlnet
        ├── res_decoder.pth
        ├── res_imagenet.pth
        └── res_places.pth
```
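Before running the Quick Start commands, it can help to confirm that the setup steps took effect. The script below is a minimal sketch using only the standard library (a hypothetical `check_setup.py` run from the repository root). The module names `gsplat` and `fused_ssim` are assumptions about the import names of the two packages installed in step 1, and `importlib.util.find_spec` only checks that a module can be located, not that its compiled extensions load.

```python
import importlib.util
from pathlib import Path

# Assumed import names for the packages installed in step 1.
REQUIRED_MODULES = ("gsplat", "fused_ssim")

# Layout from steps 2 and 3; the EML-Net checkpoints are only needed
# for saliency-guided initialization (--init_mode="saliency").
EXPECTED_PATHS = (
    "media/images",
    "media/textures",
    "models/emlnet/res_decoder.pth",
    "models/emlnet/res_imagenet.pth",
    "models/emlnet/res_places.pth",
)


def missing_modules(names=REQUIRED_MODULES):
    """Modules that Python cannot locate (find_spec does not import them)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_paths(repo_root=".", expected=EXPECTED_PATHS):
    """Expected files or folders that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    problems = missing_modules() + missing_paths()
    if problems:
        print("Missing:\n  " + "\n  ".join(problems))
    else:
        print("Setup looks complete.")
```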
## Quick Start
#### Image Compression
- Optimize an Image-GS representation for an input image `anime-1_2k.png` using `10000` Gaussians with half-precision parameters
```bash
python main.py --input_path="images/anime-1_2k.png" --exp_name="test/anime-1_2k" --num_gaussians=10000 --quantize
```
- Render the corresponding optimized Image-GS representation at a new resolution with height `4000` (aspect ratio is maintained)
```bash
python main.py --input_path="images/anime-1_2k.png" --exp_name="test/anime-1_2k" --num_gaussians=10000 --quantize --eval --render_height=4000
```
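Because the Gaussians live in continuous image coordinates, `--render_height` only changes the pixel grid they are evaluated on; the output width follows from the input's aspect ratio. The helper below (hypothetical, not part of the repository) shows the arithmetic; rounding to the nearest integer is an assumption, and the 2048×1152 size in the example is illustrative, not the actual size of `anime-1_2k.png`.

```python
def render_size(width, height, render_height):
    """(width, height) of the output when rendering `render_height` pixels tall.

    The width is scaled by the same factor as the height, so the aspect ratio is kept.
    Rounding to the nearest integer is an assumption; the repository may round differently.
    """
    scale = render_height / height
    return round(width * scale), render_height


if __name__ == "__main__":
    # Hypothetical 2048x1152 input rendered at height 4000.
    print(render_size(2048, 1152, 4000))  # (7111, 4000)
```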
#### Texture Stack Compression
- Optimize an Image-GS representation for an input texture stack `alarm-clock_2k` using `30000` Gaussians with half-precision parameters
```bash
python main.py --input_path="textures/alarm-clock_2k" --exp_name="test/alarm-clock_2k" --num_gaussians=30000 --quantize
```
- Render the corresponding optimized Image-GS representation at a new resolution with height `3000` (aspect ratio is maintained)
```bash
python main.py --input_path="textures/alarm-clock_2k" --exp_name="test/alarm-clock_2k" --num_gaussians=30000 --quantize --eval --render_height=3000
```
#### Control bit precision of Gaussian parameters
- Optimize an Image-GS representation for an input image `anime-1_2k.png` using `10000` Gaussians with 12-bit-precision parameters
```bash
python main.py --input_path="images/anime-1_2k.png" --exp_name="test/anime-1_2k" --num_gaussians=10000 --quantize --pos_bits=12 --scale_bits=12 --rot_bits=12 --feat_bits=12
```
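With `--quantize`, the storage cost of a representation follows directly from the bit widths. The sketch below assumes each Gaussian stores two position coordinates, two scale values, one orientation angle, and one value per feature channel (3 for an RGB image; a texture stack has more), matching the per-dimension flags listed under Command Line Arguments. It ignores file headers or other metadata, the 16-bit defaults simply mirror the half-precision examples above, and the 2048×2048 target is hypothetical.

```python
def total_bits(num_gaussians, feat_dim=3, pos_bits=16, scale_bits=16, rot_bits=16, feat_bits=16):
    """Parameter storage of an Image-GS representation in bits (assumed layout)."""
    per_gaussian = 2 * pos_bits + 2 * scale_bits + rot_bits + feat_dim * feat_bits
    return num_gaussians * per_gaussian


def bits_per_pixel(bits, width, height):
    """Compression rate relative to the target image resolution."""
    return bits / (width * height)


if __name__ == "__main__":
    half = total_bits(10_000)  # 16 bits per dimension
    twelve = total_bits(10_000, pos_bits=12, scale_bits=12, rot_bits=12, feat_bits=12)
    print(half // 8, twelve // 8)            # 160000 120000 (bytes)
    print(bits_per_pixel(half, 2048, 2048))  # 0.30517578125
```

Under these assumptions, dropping from 16 to 12 bits per dimension saves 25% of the parameter storage at the same Gaussian count.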
#### Switch to saliency-guided Gaussian position initialization
- Optimize an Image-GS representation for an input image `anime-1_2k.png` using `10000` Gaussians with half-precision parameters and saliency-guided initialization
```bash
python main.py --input_path="images/anime-1_2k.png" --exp_name="test/anime-1_2k" --num_gaussians=10000 --quantize --init_mode="saliency"
```
## Gradio Web Interface
We provide a user-friendly web interface built with Gradio for easy experimentation and training visualization.
### Setup for Web Interface
1. Install Gradio (in addition to the main dependencies):
```bash
pip install "gradio>=5.0.0"
```
2. Launch the web interface:
```bash
python gradio_app.py
```
3. Open your browser and navigate to `http://localhost:7860`
### Features
The Gradio interface provides:
- **Interactive Parameter Configuration**: Adjust all training parameters through an intuitive UI
- **Image Upload**: Drag and drop any image to train on
- **Real-time Training Progress**: Stream training logs and intermediate results
- **Live Visualization**: Watch Gaussian placement and rendering progress during training
- **Result Gallery**: View final renders, gradient maps, and saliency maps
- **Easy Experimentation**: No need to remember command-line arguments
### Interface Sections
1. **Configuration Panel**:
- Basic parameters (number of Gaussians, training steps)
- Quantization settings for memory efficiency
- Initialization modes (gradient, saliency, random)
- Advanced optimization parameters (learning rates, loss weights)
2. **Training Progress**:
- Real-time streaming logs
- Current render and Gaussian visualization updates
- Training status and control buttons
3. **Results Display**:
- Final optimized image
- Gradient and saliency maps used for initialization
- Download capabilities for all results
### Usage Tips
- Start with default parameters for your first run
- Use **saliency initialization** for better results on complex images (requires the EML-Net models from step 3 of Setup)
- Enable **Gaussian visualization** to see how the representation evolves
- Adjust **save image steps** to control visualization frequency (lower = more updates, but slower)
- For quick tests, reduce **max steps** to 500-1000
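The same tips carry over to the command line. For scripted sweeps, a small helper can assemble the `main.py` invocation from the flags documented in this README. `build_command` is a hypothetical name, not a repository function; its result can be passed to `subprocess.run(cmd, check=True)`.

```python
import shlex


def build_command(input_path, exp_name, num_gaussians, max_steps=None,
                  init_mode=None, quantize=True, vis_gaussians=False):
    """Return an argv list for main.py using only flags documented in this README."""
    cmd = [
        "python", "main.py",
        f"--input_path={input_path}",
        f"--exp_name={exp_name}",
        f"--num_gaussians={num_gaussians}",
    ]
    if quantize:
        cmd.append("--quantize")
    if init_mode is not None:  # "gradient", "saliency", or "random"
        cmd.append(f"--init_mode={init_mode}")
    if max_steps is not None:  # e.g. 500-1000 for a quick test
        cmd.append(f"--max_steps={max_steps}")
    if vis_gaussians:
        cmd.append("--vis_gaussians")
    return cmd


if __name__ == "__main__":
    cmd = build_command("images/anime-1_2k.png", "test/quick", 10000, max_steps=500)
    print(shlex.join(cmd))
```

Returning a list rather than a single string avoids shell quoting issues when paths contain spaces.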
## Command Line Arguments
Please refer to `cfgs/default.yaml` for the full list of arguments and their default values.
**Post-optimization rendering**
- `--eval` render the optimized Image-GS representation.
- `--render_height` image height for rendering (aspect ratio is maintained).
**Bit precision control**: 32 bits (float32) per dimension by default
- `--quantize` enable bit precision control of Gaussian parameters.
- `--pos_bits` bit precision of each coordinate dimension.
- `--scale_bits` bit precision of each scale dimension.
- `--rot_bits` bit precision of the Gaussian orientation angle.
- `--feat_bits` bit precision of each feature dimension.
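To see what a given bit width costs in accuracy, consider uniform min-max quantization of a value in a known range [lo, hi]. This is an illustrative scheme only; how the repository chooses ranges and rounds values is not described here. The round-trip error is at most half a quantization step, (hi - lo) / (2 (2^bits - 1)): for 12-bit positions normalized to [0, 1], that is about 1.2e-4 of the image width, roughly a quarter of a pixel on a 2048-pixel-wide image (assuming normalized coordinates).

```python
def quantize(x, lo, hi, bits):
    """Uniformly map x in [lo, hi] to an integer code in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    x = min(max(x, lo), hi)  # clamp values outside the range
    return round((x - lo) / (hi - lo) * levels)


def dequantize(code, lo, hi, bits):
    """Inverse mapping from an integer code back to [lo, hi]."""
    levels = (1 << bits) - 1
    return lo + code / levels * (hi - lo)


if __name__ == "__main__":
    x = 0.123456
    code = quantize(x, 0.0, 1.0, 12)
    err = abs(dequantize(code, 0.0, 1.0, 12) - x)
    print(code, err <= 0.5 / 4095)  # worst case is half a step
```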
**Logging**
- `--exp_name` path to the logging directory.
- `--vis_gaussians` visualize Gaussians during optimization.
- `--save_image_steps` frequency of rendering intermediate results during optimization.
- `--save_ckpt_steps` frequency of checkpointing during optimization.
**Input image**
- `--input_path` path to an image file or a directory containing a texture stack.
- `--downsample` load a downsampled version of the input image or texture stack as the optimization target to evaluate image upsampling performance.
- `--downsample_ratio` downsampling ratio.
- `--gamma` optimize in a gamma-corrected space, modify with caution.
**Gaussian**
- `--num_gaussians` number of Gaussians (for compression rate control).
- `--init_scale` initial Gaussian scale in number of pixels.
- `--disable_topk_norm` disable top-K normalization.
- `--disable_inverse_scale` disable inverse Gaussian scale optimization.
- `--init_mode` Gaussian position initialization mode, valid values include "gradient", "saliency", and "random".
- `--init_random_ratio` ratio of Gaussians with randomly initialized position.
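The `--init_mode` and `--init_random_ratio` options suggest the following pattern: place a fraction of the Gaussians uniformly at random and draw the rest from an importance map (image gradient magnitude for "gradient", the EML-Net output for "saliency"). The sketch below assumes positions are sampled with probability proportional to the map; `init_positions` and its `0.3` default are illustrative, not the repository's names or values.

```python
import random


def init_positions(importance, num_gaussians, random_ratio=0.3, seed=0):
    """Sample continuous (x, y) positions for Gaussians.

    importance: 2D list (rows = y) of non-negative scores with a positive sum,
    e.g. a gradient or saliency map. A fraction `random_ratio` of positions is
    uniform over the image; the rest are drawn from pixels with probability
    proportional to their score. random_ratio=1.0 means purely random placement.
    """
    rng = random.Random(seed)
    height, width = len(importance), len(importance[0])
    num_random = round(num_gaussians * random_ratio)

    uniform = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(num_random)]

    pixels = [(x, y) for y in range(height) for x in range(width)]
    weights = [importance[y][x] for x, y in pixels]
    guided = rng.choices(pixels, weights=weights, k=num_gaussians - num_random)
    # Jitter each sample inside its pixel so repeated picks do not coincide exactly.
    guided = [(x + rng.random(), y + rng.random()) for x, y in guided]
    return uniform + guided
```

Sampling proportionally, rather than taking the top-scoring pixels, still lets low-importance regions receive a few Gaussians.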
**Optimization**
- `--disable_tiles` disable tile-based rendering (warning: optimization and rendering without tiles are significantly slower).
- `--max_steps` maximum number of optimization steps.
- `--pos_lr` Gaussian position learning rate.
- `--scale_lr` Gaussian scale learning rate.
- `--rot_lr` Gaussian orientation angle learning rate.
- `--feat_lr` Gaussian feature learning rate.
- `--disable_lr_schedule` disable learning rate decay and early stopping schedule.
- `--disable_prog_optim` disable error-guided progressive optimization.
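Error-guided progressive optimization can be pictured as growing the Gaussian budget in stages and spending each new batch where the current reconstruction is worst. The helpers below are a hypothetical illustration of that idea: splitting the budget evenly across stages and picking the highest-error pixels are assumptions, not the paper's exact schedule or selection rule.

```python
def stage_budgets(total, num_stages):
    """Split a Gaussian budget into near-equal per-stage increments."""
    base, extra = divmod(total, num_stages)
    return [base + (1 if i < extra else 0) for i in range(num_stages)]


def highest_error_pixels(error, count):
    """(x, y) coordinates of the `count` pixels with the largest error."""
    height, width = len(error), len(error[0])
    ranked = sorted(
        ((error[y][x], x, y) for y in range(height) for x in range(width)),
        reverse=True,
    )
    return [(x, y) for _, x, y in ranked[:count]]


# Outline (hypothetical; not repository code):
#   place the first stage's Gaussians with --init_mode, then for each later stage:
#     render, compute a per-pixel error map against the target,
#     add Gaussians at highest_error_pixels(error_map, increment), keep optimizing.
```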
## Acknowledgements
We would like to thank the [gsplat](https://github.com/nerfstudio-project/gsplat) team, and the authors of [3DGS](https://github.com/graphdeco-inria/gaussian-splatting), [fused-ssim](https://github.com/rahul-goel/fused-ssim), and [EML-Net](https://github.com/SenJia/EML-NET-Saliency) for their great work, on which Image-GS builds.
## License
This project is licensed under the terms of the MIT license.
## Citation
If you find this project helpful to your research, please consider citing [BibTeX](assets/docs/image-gs.bib):
```bibtex
@inproceedings{zhang2025image,
title={Image-gs: Content-adaptive image representation via 2d gaussians},
author={Zhang, Yunxiang and Li, Bingxuan and Kuznetsov, Alexandr and Jindal, Akshay and Diolatzis, Stavros and Chen, Kenneth and Sochenov, Anton and Kaplanyan, Anton and Sun, Qi},
booktitle={Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers},
pages={1--11},
year={2025}
}
```