---
datasets:
- ILSVRC/imagenet-1k
- ljnlonoljpiljm/places365-256px
language:
- en
- zh
license: mit
pipeline_tag: class-conditional-image-generation
library_name: pytorch
---

This is the official model card for the paper [Equivariant Image Modeling](https://arxiv.org/abs/2503.18948).

<p align="center">
  <img src="visual.png" width="720">
</p>

In this paper, we propose a novel equivariant image modeling framework that inherently aligns optimization targets across subtasks in autoregressive image modeling by leveraging the translation invariance of natural visual signals. Our method introduces:

* Column-wise tokenization, which enhances translational symmetry along the horizontal axis.
* Autoregressive generative models using windowed causal attention, which enforces consistent contextual relationships across positions.
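
The two ingredients above can be illustrated with a toy sketch. This is not the released implementation: the paper's tokenizer is learned, whereas `split_into_columns` below only shows the column-wise partitioning, and the function names, the `col_width` and `window` parameters, and the choice to let a token attend to itself are all assumptions made for illustration.

```python
def split_into_columns(image, col_width):
    """Split an H x W image (a list of rows) into vertical strips.

    Each strip spans the full image height and `col_width` pixels
    horizontally, so the token sequence runs left to right.
    """
    width = len(image[0])
    if width % col_width != 0:
        raise ValueError("image width must be divisible by col_width")
    return [
        [row[start:start + col_width] for row in image]
        for start in range(0, width, col_width)
    ]


def windowed_causal_mask(num_tokens, window):
    """Build a boolean mask where mask[i][j] is True if token i may attend to token j.

    Token i sees itself and at most the `window - 1` tokens before it,
    and never any token to its right.
    """
    return [
        [0 <= i - j < window for j in range(num_tokens)]
        for i in range(num_tokens)
    ]


# Toy 2 x 6 "image" whose pixel values encode their column index.
image = [
    [0, 1, 2, 3, 4, 5],
    [0, 1, 2, 3, 4, 5],
]
columns = split_into_columns(image, col_width=2)      # 3 column tokens
mask = windowed_causal_mask(len(columns), window=2)   # 3 x 3 attention mask

for row in mask:
    print(["x" if allowed else "." for allowed in row])
```

Once a position has a full window of predecessors, every row of the mask has the same pattern relative to its own position. That is the sense in which each position sees the same kind of context, regardless of where it sits along the horizontal axis.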

Evaluated on class-conditioned ImageNet generation at 256×256 resolution, our approach achieves performance comparable to state-of-the-art AR models while using fewer computational resources. Moreover, it significantly improves zero-shot generalization and enables ultra-long image synthesis.

## Bibtex

```bibtex
@misc{dong2025equivariantimagemodeling,
      title={Equivariant Image Modeling},
      author={Ruixiao Dong and Mengde Xu and Zigang Geng and Li Li and Han Hu and Shuyang Gu},
      year={2025},
      eprint={2503.18948},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.18948},
}
```