<p align="center">
<img src="assets/logo.png" width="400" style="border:none;box-shadow:none;border-radius:0;background:none;">
</p>
# Infinity**⭐️**: Unified **S**pace**T**ime **A**uto**R**egressive Modeling for Visual Generation
<div align="center">
[Discord](http://opensource.bytedance.com/discord/invite)
[arXiv](https://arxiv.org/abs/2511.04675)
[Hugging Face](https://huggingface.co/FoundationVision/InfinityStar)
</div>
<p align="center" style="font-size: larger;">
<a href="http://arxiv.org/abs/2511.04675">Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation</a>
</p>
<!-- <p align="center">
<img src="assets/show_images.jpg" width=95%>
<p> -->
---
## 🔥 Updates!!
* Nov 7, 2025: 🔥 Paper, training and inference code, checkpoints, and demo website released!
* Sep 18, 2025: 🎉 InfinityStar was accepted to NeurIPS 2025 as an Oral.
## 🕹️ Try and Play with Infinity⭐️!
We provide a [demo website](http://opensource.bytedance.com/discord/invite) for you to play with InfinityStar and generate videos. Enjoy the fun of bitwise video autoregressive modeling!
## ✨ Overview
We introduce InfinityStar, a unified spacetime autoregressive framework for high-resolution image and dynamic video synthesis.
- 🧠 **Unified Spacetime Model**: A purely discrete, autoregressive approach that jointly captures spatial and temporal dependencies within a single, elegant architecture.
- 🎬 **Versatile Generation**: This unified design naturally supports a variety of generation tasks such as **text-to-image**, **text-to-video**, **image-to-video**, and **long interactive video synthesis** via straightforward temporal autoregression.
- 🏆 **Leading Performance & Speed**: In extensive experiments, InfinityStar scores **83.74** on VBench, outperforming all autoregressive models by large margins and even surpassing diffusion competitors like HunyuanVideo, while running approximately **10x** faster than leading diffusion-based methods.
- 📖 **Pioneering High-Resolution Autoregressive Generation**: To our knowledge, InfinityStar is the first discrete autoregressive video generator capable of producing industrial-level 720p videos, setting a new standard for quality in its class.
### 🔥 Unified modeling for image generation, video generation, and long interactive video synthesis 📈:
<div align="left">
<img src="assets/framework.png" alt="" style="width: 100%;" />
</div>
## 🎬 Video Demos
#### General Aesthetics
<div align="left">
<video src="https://github.com/user-attachments/assets/14e2b18b-9234-42ce-bdab-670faeef4b2a" width="100%" controls autoplay loop></video>
</div>
#### Anime & 3D Animation
<div align="left">
<video src="https://github.com/user-attachments/assets/478e9571-b550-4c23-a567-6fee9a0afb5b" width="100%" controls autoplay loop></video>
</div>
#### Motion
<div align="left">
<video src="https://github.com/user-attachments/assets/adab669b-d38f-4607-9a52-32d8d0bf0e53" width="100%" controls autoplay loop></video>
</div>
#### Extended Application: Long Interactive Videos
<div align="center">
<video src="https://github.com/user-attachments/assets/411666a6-563d-4551-a3f8-dc5de00436c1" width="100%" controls autoplay loop></video>
</div>
## Benchmark
### Achieves SOTA performance on image generation benchmarks:
<div align="left">
<img src="assets/Infinitystar_image_gen_benchmark.png" alt="Image Generation Evaluation" style="width: 100%;" />
</div>
### Achieves SOTA performance on video generation benchmarks:
<div align="left">
<img src="assets/Infinitystar_videogen_benchmark.png" alt="" style="width: 100%;" />
</div>
### Surpassing diffusion competitors like HunyuanVideo*:
<div align="left">
<img src="assets/Infinitystar_videogen_humaneval.png" alt="" style="width: 100%;" />
</div>
## Visualization
### Text to image examples
<div align="left">
<img src="assets/supp_show_images.png" alt="Text to Image Examples" style="width: 100%;" />
</div>
### Image to video examples
<div align="left">
<img src="assets/i2v_examples.png" alt="Image to Video Examples" style="width: 100%;" />
</div>
### Video extrapolation examples
<div align="left">
<img src="assets/v2v_examples.png" alt="Video Extrapolation Examples" style="width: 100%;" />
</div>
## 📑 Open-Source Plan
- [x] Training Code
- [x] Web Demo
- [x] InfinityStar Inference Code
- [x] InfinityStar Models Checkpoints
- [x] InfinityStar-Interact Inference Code
- [x] InfinityStar-Interact Checkpoints
## Installation
1. We use FlexAttention to speed up training, which requires `torch>=2.5.1`.
2. Install other pip packages via `pip3 install -r requirements.txt`.
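Before installing the remaining packages, it can help to confirm that the installed PyTorch meets the `torch>=2.5.1` requirement above. The following is a minimal sketch, not part of this repository: the helper names `version_tuple` and `torch_is_new_enough` are hypothetical, and the comparison looks only at the numeric release components, so a pre-release such as `2.5.1rc1` is treated the same as `2.5.1`.

```python
import re

MIN_TORCH = "2.5.1"  # minimum version stated in the installation steps


def version_tuple(version: str) -> tuple:
    """Turn a string like '2.5.1+cu121' into (2, 5, 1).

    Anything after '+' (a local build tag) is ignored, and parsing stops at
    the first component that does not start with a digit (e.g. 'dev2024').
    """
    core = version.split("+", 1)[0]
    parts = []
    for piece in core.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def torch_is_new_enough(installed: str, minimum: str = MIN_TORCH) -> bool:
    """Return True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)
```

With PyTorch installed, `torch_is_new_enough(torch.__version__)` reports whether the environment satisfies the requirement.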
## Training Scripts
We provide a comprehensive workflow for training and finetuning our model, covering data organization, feature extraction, and training scripts. For detailed instructions, please refer to `data/README.md`.
## Inference
* **720p Video Generation:**
Use `tools/infer_video_720p.py` to generate 5-second videos at 720p resolution. Due to the high computational cost of training, our released 720p model is trained for 5-second video generation. This script also supports image-to-video generation by specifying an image path.
```bash
python3 tools/infer_video_720p.py
```
* **480p Variable-Length Video Generation:**
We also provide an intermediate checkpoint for 480p resolution, capable of generating videos of 5 and 10 seconds. Since this model is not specifically optimized for Text-to-Video (T2V), we recommend using the experimental Image-to-Video (I2V) and Video-to-Video (V2V) modes for better results. To specify the video duration, you can edit the `generation_duration` variable in `tools/infer_video_480p.py` to either 5 or 10. This script also supports image-to-video and video continuation by providing a path to an image or a video.
```bash
python3 tools/infer_video_480p.py
```
* **480p Long Interactive Video Generation:**
Use `tools/infer_interact_480p.py` to generate a long interactive video at 480p. You provide a reference video and multiple prompts, and the model generates the video interactively, guided by your input.
```bash
python3 tools/infer_interact_480p.py
```
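If you switch between the three scripts often, a small launcher can keep the paths in one place. This is a sketch under assumptions rather than a tool shipped with the repository: the mode labels (`720p`, `480p`, `interact`) and the functions `build_command` and `run` are hypothetical, while the script paths are the ones listed above. It assumes the scripts are run from the repository root without command-line arguments, as in the commands shown, with settings such as `generation_duration` edited inside the script.

```python
import subprocess

# Script paths as listed in this README; the dictionary keys are
# hypothetical labels chosen for this sketch.
SCRIPTS = {
    "720p": "tools/infer_video_720p.py",
    "480p": "tools/infer_video_480p.py",
    "interact": "tools/infer_interact_480p.py",
}


def build_command(mode: str, python: str = "python3") -> list:
    """Return the argument list for the inference script of the given mode."""
    if mode not in SCRIPTS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(SCRIPTS)}")
    return [python, SCRIPTS[mode]]


def run(mode: str) -> int:
    """Run the script from the repository root and return its exit code."""
    return subprocess.run(build_command(mode), check=False).returncode
```

For example, `run("480p")` executes the same command as `python3 tools/infer_video_480p.py`.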
## Citation
If our work assists your research, feel free to give us a star ⭐ or cite us using:
```
@Article{VAR,
title={Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction},
author={Keyu Tian and Yi Jiang and Zehuan Yuan and Bingyue Peng and Liwei Wang},
year={2024},
eprint={2404.02905},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
```
@misc{Infinity,
title={Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis},
author={Jian Han and Jinlai Liu and Yi Jiang and Bin Yan and Yuqi Zhang and Zehuan Yuan and Bingyue Peng and Xiaobing Liu},
year={2024},
eprint={2412.04431},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2412.04431},
}
```
```
@misc{InfinityStar,
title={InfinityStar: Unified Spacetime AutoRegressive Modeling for Visual Generation},
author={Jinlai Liu and Jian Han and Bin Yan and Hui Wu and Fengda Zhu and Xing Wang and Yi Jiang and Bingyue Peng and Zehuan Yuan},
year={2025},
eprint={2511.04675},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2511.04675},
}
```
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.