Update README.md
README.md
CHANGED
|
@@ -7,16 +7,28 @@ language:
|
|
| 7 |
- en
|
| 8 |
- zh
|
| 9 |
---
|
| 10 |
|
| 11 |
<p align="center">
|
| 12 |
<img src="visual.png" width="720">
|
| 13 |
</p>
|
| 14 |
|
| 15 |
-
This is an official model card of the paper “An equivariant image modeling framework”.
|
| 16 |
-
|
| 17 |
In this paper, we propose a novel equivariant image modeling framework that inherently aligns optimization targets across subtasks in autoregressive image modeling by leveraging the translation invariance of natural visual signals. Our method introduces:
|
| 18 |
* Column-wise tokenization which enhances translational symmetry along the horizontal axis.
|
| 19 |
* Autoregressive generative models using windowed causal attention which enforces consistent contextual relationships across positions.
|
| 20 |
|
| 21 |
Evaluated on class-conditioned ImageNet generation at 256×256 resolution, our approach achieves performance comparable to state-of-the-art AR models while using fewer computational resources. Moreover, our approach significantly improves zero-shot generalization and enables ultra-long image synthesis.
|
| 22 |
|
| 7 |
- en
|
| 8 |
- zh
|
| 9 |
---
|
| 10 |
+
[](https://arxiv.org/abs/2503.18948)
|
| 11 |
+
This is an official model card of the paper [Equivariant Image Modeling](https://arxiv.org/abs/2503.18948).
|
| 12 |
|
| 13 |
<p align="center">
|
| 14 |
<img src="visual.png" width="720">
|
| 15 |
</p>
|
| 16 |
|
| 17 |
In this paper, we propose a novel equivariant image modeling framework that inherently aligns optimization targets across subtasks in autoregressive image modeling by leveraging the translation invariance of natural visual signals. Our method introduces:
|
| 18 |
* Column-wise tokenization which enhances translational symmetry along the horizontal axis.
|
| 19 |
* Autoregressive generative models using windowed causal attention which enforces consistent contextual relationships across positions.
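
The two ideas in the list above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the actual tokenizer is presumably a learned model, and the function names `columns_as_tokens` and `windowed_causal_mask`, the `col_width` and `window` parameters, and the raw-pixel "tokens" are all assumptions made here for illustration only.

```python
import numpy as np


def columns_as_tokens(image: np.ndarray, col_width: int) -> np.ndarray:
    """Split an (H, W, C) image into vertical strips, one flat token per strip.

    Hypothetical stand-in for a learned column-wise tokenizer: it only shows
    how the token sequence runs along the horizontal axis.
    """
    h, w, c = image.shape
    assert w % col_width == 0, "width must be divisible by col_width"
    n = w // col_width
    # (H, W, C) -> (H, n, col_width, C) -> (n, H, col_width, C) -> (n, H*col_width*C)
    strips = image.reshape(h, n, col_width, c).transpose(1, 0, 2, 3)
    return strips.reshape(n, -1)


def windowed_causal_mask(num_tokens: int, window: int) -> np.ndarray:
    """Boolean mask where token i may attend to token j iff i - window < j <= i."""
    i = np.arange(num_tokens)[:, None]
    j = np.arange(num_tokens)[None, :]
    return (j <= i) & (j > i - window)


if __name__ == "__main__":
    image = np.arange(4 * 8 * 3).reshape(4, 8, 3)
    tokens = columns_as_tokens(image, col_width=2)
    print(tokens.shape)  # one row per column strip
    print(windowed_causal_mask(5, window=2).astype(int))
```

Because the mask depends only on the offset `i - j`, every position past the first `window - 1` tokens sees the same relative context, which is the sense in which contextual relationships stay consistent across positions.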
|
| 20 |
|
| 21 |
Evaluated on class-conditioned ImageNet generation at 256×256 resolution, our approach achieves performance comparable to state-of-the-art AR models while using fewer computational resources. Moreover, our approach significantly improves zero-shot generalization and enables ultra-long image synthesis.
|
| 22 |
|
| 23 |
+
## Bibtex
|
| 24 |
+
```bibtex
|
| 25 |
+
@misc{dong2025equivariantimagemodeling,
|
| 26 |
+
title={Equivariant Image Modeling},
|
| 27 |
+
author={Ruixiao Dong and Mengde Xu and Zigang Geng and Li Li and Han Hu and Shuyang Gu},
|
| 28 |
+
year={2025},
|
| 29 |
+
eprint={2503.18948},
|
| 30 |
+
archivePrefix={arXiv},
|
| 31 |
+
primaryClass={cs.CV},
|
| 32 |
+
url={https://arxiv.org/abs/2503.18948},
|
| 33 |
+
}
|
| 34 |
+
```
|