UmiSonoda16 committed on
Commit
f803535
·
verified ·
1 Parent(s): 1b241b1

Update README.md

Files changed (1)
  1. README.md +14 -2
README.md CHANGED
@@ -7,16 +7,28 @@ language:
7
  - en
8
  - zh
9
  ---
 
 
10
 
11
  <p align="center">
12
  <img src="visual.png" width="720">
13
  </p>
14
 
15
- This is an official model card of the paper ”An equivariant image modeling framework“.
16
-
17
  In this paper, we propose a novel equivariant image modeling framework that inherently aligns optimization targets across subtasks in autoregressive image modeling by leveraging the translation invariance of natural visual signals. Our method introduces:
18
  * Column-wise tokenization which enhances translational symmetry along the horizontal axis.
19
  * Autoregressive generative models using windowed causal attention which enforces consistent contextual relationships across positions.
20
 
21
  Evaluated on class-conditioned ImageNet generation at 256×256 resolution, our approach achieves performance comparable to state-of-the-art AR models while using fewer computational resources. Moreover, our approach significantly improves zero-shot generalization and enables ultra-long image synthesis.
22
 
7
  - en
8
  - zh
9
  ---
10
+ [![arXiv](https://img.shields.io/badge/arXiv%20paper-2503.18948-b31b1b.svg)](https://arxiv.org/abs/2503.18948)&nbsp;
11
+ This is an official model card of the paper [Equivariant Image Modeling](https://arxiv.org/abs/2503.18948).
12
 
13
  <p align="center">
14
  <img src="visual.png" width="720">
15
  </p>
16
 
 
 
17
  In this paper, we propose a novel equivariant image modeling framework that inherently aligns optimization targets across subtasks in autoregressive image modeling by leveraging the translation invariance of natural visual signals. Our method introduces:
18
  * Column-wise tokenization which enhances translational symmetry along the horizontal axis.
19
  * Autoregressive generative models using windowed causal attention which enforces consistent contextual relationships across positions.
20
 
21
  Evaluated on class-conditioned ImageNet generation at 256×256 resolution, our approach achieves performance comparable to state-of-the-art AR models while using fewer computational resources. Moreover, our approach significantly improves zero-shot generalization and enables ultra-long image synthesis.
22
 
23
+ ## Bibtex
24
+ ```bibtex
25
+ @misc{dong2025equivariantimagemodeling,
26
+ title={Equivariant Image Modeling},
27
+ author={Ruixiao Dong and Mengde Xu and Zigang Geng and Li Li and Han Hu and Shuyang Gu},
28
+ year={2025},
29
+ eprint={2503.18948},
30
+ archivePrefix={arXiv},
31
+ primaryClass={cs.CV},
32
+ url={https://arxiv.org/abs/2503.18948},
33
+ }
34
+ ```
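
The README text in this diff mentions windowed causal attention that "enforces consistent contextual relationships across positions". As a rough illustration only, and not the authors' implementation, the sketch below builds a boolean attention mask in plain Python. The function name `windowed_causal_mask`, the `window` parameter, and the choice to count the current token as part of the window are all assumptions made for this example; conditioning tokens (such as a class token) are ignored.

```python
def windowed_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Build a windowed causal attention mask (illustrative sketch).

    mask[i][j] is True when query position i may attend to key position j.
    Assumption: position i sees itself and the (window - 1) positions
    immediately before it, and never any later position.
    """
    if seq_len < 0:
        raise ValueError("seq_len must be non-negative")
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Hypothetical sizes: 6 column tokens, each attending to at most 3 positions.
mask = windowed_causal_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Once a position has a full window behind it (here, from index 2 onward), every row has the same pattern of allowed relative offsets, which is one way to read the "consistent contextual relationships" claim. In an actual model, such a mask would typically be passed to the attention layer in whatever form the framework expects.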