UmiSonoda16/EquivariantModeling

class-conditional-image-generation

UmiSonoda16 committed on Mar 25, 2025

Commit f803535 · verified · 1 Parent(s): 1b241b1

Update README.md

Files changed (1)

README.md +14 -2

@@ -7,16 +7,28 @@ language:
 - en
 - zh
 ---
+[![arXiv](https://img.shields.io/badge/arXiv%20paper-2503.18948-b31b1b.svg)](https://arxiv.org/abs/2503.18948)&nbsp;
+This is an official model card of the paper [Equivariant Image Modeling](https://arxiv.org/abs/2503.18948).
 <p align="center">
   <img src="visual.png" width="720">
 </p>
-This is an official model card of the paper ”An equivariant image modeling framework“.
 In this paper, we propose a novel equivariant image modeling framework that inherently aligns optimization targets across subtasks in autoregressive image modeling by leveraging the translation invariance of natural visual signals. Our method introduces:
 * Column-wise tokenization which enhances translational symmetry along the horizontal axis.
 * Autoregressive generative models using windowed causal attention which enforces consistent contextual relationships across positions.
 Evaluated on class-conditioned ImageNet generation at 256×256 resolution, our approach achieves performance comparable to state-of-the-art AR models while using fewer computational resources. Moreover, our approach significantly improving zero-shot generalization and enabling ultra-long image synthesis.
+## Bibtex
+```bibtex
+@misc{dong2025equivariantimagemodeling,
+      title={Equivariant Image Modeling},
+      author={Ruixiao Dong and Mengde Xu and Zigang Geng and Li Li and Han Hu and Shuyang Gu},
+      year={2025},
+      eprint={2503.18948},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV},
+      url={https://arxiv.org/abs/2503.18948},
+}
+```
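
The README mentions windowed causal attention but does not show what such a mask looks like. The following is a minimal sketch, not the authors' implementation: the function name `windowed_causal_mask` and the parameters `seq_len` and `window` are hypothetical, and it assumes a plain 1D token sequence in which each position may attend to itself and to the `window - 1` positions immediately before it.

```python
def windowed_causal_mask(seq_len, window):
    """Return a seq_len x seq_len boolean mask as nested lists.

    mask[i][j] is True when query position i may attend to key position j.
    Assumption (not taken from the paper): position i sees itself and the
    window - 1 positions directly before it, and nothing after it.
    """
    if seq_len <= 0 or window <= 0:
        raise ValueError("seq_len and window must be positive")
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    # Print the mask for a short sequence: '#' = may attend, '.' = masked.
    for row in windowed_causal_mask(seq_len=6, window=3):
        print("".join("#" if allowed else "." for allowed in row))
```

Under this assumption, every position past the first `window - 1` sees a context of the same size and the same relative layout, which is one way to read the README's "consistent contextual relationships across positions".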