
datasets:
  - ILSVRC/imagenet-1k
  - ljnlonoljpiljm/places365-256px
language:
  - en
  - zh
license: mit
pipeline_tag: class-conditional-image-generation
library_name: pytorch

This is the official model card for the paper Equivariant Image Modeling.

In this paper, we propose a novel equivariant image modeling framework that inherently aligns optimization targets across subtasks in autoregressive image modeling by leveraging the translation invariance of natural visual signals. Our method introduces:

Column-wise tokenization, which enhances translational symmetry along the horizontal axis.
Autoregressive generative models using windowed causal attention, which enforces consistent contextual relationships across positions.
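The two components above can be illustrated with a short sketch. It uses raw pixel strips as stand-ins for the learned column tokens (this card does not describe the tokenizer's internals) and assumes a fixed window size. `column_tokens` and `windowed_causal_mask` are hypothetical helper names, not part of the released code. The sketch shows that under a windowed causal mask, whether token i may attend to token j depends only on the offset i - j. Apart from the first few tokens, which have fewer predecessors, every position therefore sees the same kind of context.

```python
from typing import List

Image = List[List[float]]  # rows x columns of grayscale values (simplified stand-in)


def column_tokens(image: Image, col_width: int) -> List[Image]:
    """Split an image into full-height vertical strips of width `col_width`.

    Hypothetical helper: a real tokenizer would encode each strip into latent
    tokens. Shifting the image horizontally by a multiple of `col_width`
    only shifts the resulting token sequence.
    """
    width = len(image[0])
    if width % col_width != 0:
        raise ValueError("image width must be a multiple of col_width")
    return [
        [row[x:x + col_width] for row in image]
        for x in range(0, width, col_width)
    ]


def windowed_causal_mask(num_tokens: int, window: int) -> List[List[bool]]:
    """Return mask where mask[i][j] is True if token i may attend to token j.

    Token i sees itself and up to `window - 1` earlier tokens, so the allowed
    pattern depends only on the offset i - j, not on the absolute position i.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        [0 <= i - j < window for j in range(num_tokens)]
        for i in range(num_tokens)
    ]


if __name__ == "__main__":
    # Toy 4x8 "image" with distinct pixel values.
    image = [[float(r * 8 + c) for c in range(8)] for r in range(4)]
    tokens = column_tokens(image, col_width=2)  # 4 strips, each 4 rows x 2 columns
    mask = windowed_causal_mask(len(tokens), window=2)
    for row in mask:
        print("".join("x" if allowed else "." for allowed in row))
    # x...
    # xx..
    # .xx.
    # ..xx
```

Because each row of this mask has at most `window` allowed entries, attention cost in the sketch grows linearly with sequence length rather than quadratically. The window size the actual model uses is not given in this card.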

Evaluated on class-conditional ImageNet generation at 256×256 resolution, our approach achieves performance comparable to state-of-the-art autoregressive models while using fewer computational resources. Moreover, it significantly improves zero-shot generalization and enables ultra-long image synthesis.
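One way to see how windowed attention can support ultra-long synthesis is that each generation step only needs the most recent `window` column tokens, so output width is not tied to the training sequence length. The sketch below shows that sliding-window loop. It assumes a hypothetical `predict_next` callable standing in for the trained model and leaves out class conditioning and sampling. It is an illustration under those assumptions, not the released inference code.

```python
from typing import Callable, List, Sequence

Column = List[float]  # one column token, represented here as a flat list of values


def generate_long_sequence(
    prompt: Sequence[Column],
    predict_next: Callable[[Sequence[Column]], Column],
    window: int,
    total_columns: int,
) -> List[Column]:
    """Autoregressively extend `prompt` to `total_columns` column tokens.

    `predict_next` is a hypothetical stand-in for the trained model. It only
    ever receives the most recent `window` columns, which is all a windowed
    causal attention model can attend to anyway.
    """
    if not prompt:
        raise ValueError("prompt must contain at least one column")
    if window < 1:
        raise ValueError("window must be at least 1")
    columns = list(prompt)
    while len(columns) < total_columns:
        context = columns[-window:]  # sliding context window
        columns.append(predict_next(context))
    return columns


if __name__ == "__main__":
    # Toy predictor: the next column's value is the last column's value plus one.
    def toy_predict(context: Sequence[Column]) -> Column:
        return [context[-1][0] + 1.0]

    out = generate_long_sequence([[0.0]], toy_predict, window=3, total_columns=10)
    print([c[0] for c in out])
    # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
```

Memory per step in this loop depends on `window`, not on how many columns have already been produced.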

BibTeX

@misc{dong2025equivariantimagemodeling,
      title={Equivariant Image Modeling}, 
      author={Ruixiao Dong and Mengde Xu and Zigang Geng and Li Li and Han Hu and Shuyang Gu},
      year={2025},
      eprint={2503.18948},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.18948}, 
}