Commit 8807f5a by BestWishYsh · verified · 1 Parent(s): 4aa4ae2

Update README.md

Files changed (1)
  1. README.md +327 -3
README.md CHANGED
@@ -1,3 +1,327 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ base_model:
6
+ - Wan-AI/Wan2.1-T2V-14B-Diffusers
7
+ pipeline_tag: text-to-video
8
+ base_model_relation: finetune
9
+ ---
10
+
11
+ <div align=center>
12
+ <img src="https://github.com/PKU-YuanGroup/Helios-Page/blob/main/figures/logo_white.png?raw=true" width="300px">
13
+ </div>
14
+
15
+ <h2 align="center">Helios: Real Real-Time Long Video Generation Model</h2>
16
+
17
+ <h5 align="center">⭐ A 14B Real-Time Long Video Generation Model Can Be Cheaper and Faster, yet Stronger, than 1.3B Ones ⭐</h5>
18
+
19
+ <h5 align="center">
20
+
21
+ <!-- [![arXiv](https://img.shields.io/badge/arXiv-2501.xxxxx-b31b1b.svg?logo=arxiv)](https://arxiv.org/abs/) -->
22
+ [![arXiv](https://img.shields.io/badge/Technical--Report-2501.xxxxx-b31b1b.svg?logo=arxiv)](https://github.com/PKU-YuanGroup/Helios-Page/blob/main/helios_technical_report.pdf)
23
+ [![Project Page](https://img.shields.io/badge/Project-Website-2ea44f)](https://pku-yuangroup.github.io/Helios-Page)
24
+ [![HuggingFace](https://img.shields.io/badge/🤗-HuggingFace-blue)](https://huggingface.co/collections/BestWishYsh/helios)
25
+ [![ModelScope](https://img.shields.io/badge/🤖-ModelScope-purple)](https://modelscope.cn/collections/BestWishYSH/Helios)
26
+
27
+ [![Ascend](https://img.shields.io/badge/Inference-Ascend--NPU-red)](https://www.hiascend.com/)
28
+ [![Diffusers](https://img.shields.io/badge/Inference-Diffusers-blueviolet)](https://github.com/huggingface/diffusers)
29
+ [![vLLM-Omni](https://img.shields.io/badge/Backend-vLLM--Omni-orange)](https://github.com/vllm-project/vllm-omni)
30
+ [![SGLang Diffusion](https://img.shields.io/badge/Backend-SGLang--Diffusion-yellow)](https://github.com/sgl-project/sglang)
31
+
32
+
33
+
34
+
35
+ </h5>
36
+
37
+ <div align="center">
38
+ This repository is the official implementation of Helios, a breakthrough video generation model that achieves minute-scale, high-quality video synthesis at <strong>19.5 FPS on a single H100 GPU</strong> (about 10 FPS on a single Ascend NPU), without relying on conventional long-video anti-drifting strategies or standard video acceleration techniques.
39
+ </div>
40
+
41
+ <br>
42
+
43
+ ## ✨ Highlights
44
+
45
+
46
+ 1. **Without commonly used anti-drifting strategies** (e.g., self-forcing, error-banks, keyframe sampling, or inverted sampling), Helios generates minute-scale videos with high quality and strong coherence.
47
+
48
+ 2. **Without standard acceleration techniques** (e.g., KV-cache, causal masking, sparse/linear attention, TinyVAE, progressive noise schedules, hidden-state caching, or quantization), Helios achieves 19.5 FPS in end-to-end inference on a single H100 GPU.
49
+
50
+ 3. **We introduce optimizations that improve both training and inference throughput while reducing memory consumption,** enabling image-diffusion-scale batch sizes during training while fitting up to four 14B models within 80 GB of GPU memory.
51
+
52
+
53
+
54
+ ## 🎬 Video Demos
55
+
56
+ <!-- <div align="center">
57
+ <video src="https://github.com/PKU-YuanGroup/Helios-Page/blob/main/videos/helios_features.mp4?raw=true" width="70%" controls="controls" poster=""></video>
58
+ </div>
59
+
60
+ or you can click <a href="https://www.youtube.com/watch?v=vd_AgHtOUFQ">here</a> to get the video. Some best prompts are [here](./example/prompt.txt). -->
61
+
62
+ [![Demo Video of Helios](https://github.com/user-attachments/assets/1d10da4a-aba9-4ac1-ab02-cd0dfce8d35b)](https://www.youtube.com/watch?v=vd_AgHtOUFQ)
63
+ Alternatively, click <a href="https://github.com/PKU-YuanGroup/Helios-Page/blob/main/videos/helios_features.mp4">here</a> to watch the video. Some of the best prompts are listed [here](./example/prompt.txt).
64
+
65
+
66
+ ## 📣 Latest News!!
67
+
68
+ * ⏳⏳⏳ Release the [Technical Report](https://github.com/PKU-YuanGroup/Helios-Page/blob/main/helios_technical_report.pdf) on arXiv.
69
+ * `[2025.03.04]` 🚀 Day-0 support for [Ascend-NPU](https://www.hiascend.com), with sincere gratitude to the Ascend Team for their support.
70
+ * `[2025.03.04]` 🚀 Day-0 support for [Diffusers](https://github.com/huggingface/diffusers), with special thanks to the HuggingFace Team for their support.
71
+ * `[2025.03.04]` 🚀 Day-0 support for [vLLM-Omni](https://github.com/vllm-project/vllm-omni), with heartfelt gratitude to the vLLM Team for their support.
72
+ * `[2025.03.04]` 🚀 Day-0 support for [SGLang-Diffusion](https://github.com/sgl-project/sglang), with huge thanks to the SGLang Team for their support.
73
+ * `[2025.03.04]` 🔥 We've released the training/inference code and weights of **Helios-Base**, **Helios-Mid** and **Helios-Distilled**.
74
+
75
+
76
+
77
+ ## 🔥 Friendly Links
78
+
79
+ If your work has improved **Helios** and you would like more people to see it, please inform us.
80
+
81
+ * [Ascend-NPU](https://www.hiascend.com/): Developed by Huawei, this hardware is designed for efficient AI model training and inference, boosting performance in tasks like computer vision, natural language processing, and autonomous driving.
82
+ * [Diffusers](https://github.com/huggingface/diffusers): A popular library designed for working with diffusion models and other generative models in deep learning. It supports easy integration and manipulation of a wide range of generative models.
83
+ * [vLLM-Omni](https://github.com/vllm-project/vllm-omni): A fully disaggregated serving system for any-to-any models. vLLM-Omni breaks complex architectures into a stage-based graph, using a decoupled backend to maximize resource efficiency and throughput.
84
+ * [SGLang-Diffusion](https://github.com/sgl-project/sglang): An inference framework for accelerated image and video generation using diffusion models. It provides an end-to-end unified pipeline with optimized kernels and an efficient scheduler loop.
85
+
86
+
87
+
88
+ ### Model Download
89
+
90
+ | Models | Download Link | Supports | Notes |
91
+ |------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------|---------------------------------------------------------------------------------------------|
92
+ | Helios-Base | 🤗 [Huggingface](https://huggingface.co/BestWishYsh/Helios-Base) 🤖 [ModelScope](https://modelscope.cn/datasets/BestWishYSH/Helios-Base) | T2V ✅ I2V ✅ V2V ✅ Interactive ✅ | Best Quality, with v-prediction, standard CFG and custom HeliosScheduler. |
93
+ | Helios-Mid | 🤗 [Huggingface](https://huggingface.co/BestWishYsh/Helios-Mid) 🤖 [ModelScope](https://modelscope.cn/datasets/BestWishYSH/Helios-Mid) | T2V ✅ I2V ✅ V2V ✅ Interactive ✅ | Intermediate Ckpt, with v-prediction, CFG-Zero* and custom HeliosScheduler. |
94
+ | Helios-Distilled | 🤗 [Huggingface](https://huggingface.co/BestWishYsh/Helios-Distilled) 🤖 [ModelScope](https://modelscope.cn/datasets/BestWishYSH/Helios-Distilled) | T2V ✅ I2V ✅ V2V ✅ Interactive ✅ | Best Efficiency, with x0-prediction and custom HeliosDMDScheduler. |
95
+ > 💡Note:
96
+ > * All three models share the same architecture, but Helios-Mid and Helios-Distilled use a more aggressive multi-scale sampling pipeline to achieve better efficiency.
97
+ > * Helios-Mid is an intermediate checkpoint generated in the process of distilling Helios-Base into Helios-Distilled, and may not meet expected quality.
98
+ > * Because training is based on Text-to-Video, Image-to-Video and Video-to-Video quality may be slightly lower than Text-to-Video quality. You may enable `is_skip_first_chunk` if you find that the first few chunks are static.
99
+
100
+
101
+ Download models using huggingface-cli:
102
+ ``` sh
103
+ pip install "huggingface_hub[cli]"
104
+ huggingface-cli download BestWishYSH/Helios-Base --local-dir BestWishYSH/Helios-Base
105
+ huggingface-cli download BestWishYSH/Helios-Mid --local-dir BestWishYSH/Helios-Mid
106
+ huggingface-cli download BestWishYSH/Helios-Distilled --local-dir BestWishYSH/Helios-Distilled
107
+ ```
108
+
109
+ Download models using modelscope-cli:
110
+ ``` sh
111
+ pip install modelscope
112
+ modelscope download BestWishYSH/Helios-Base --local_dir BestWishYSH/Helios-Base
113
+ modelscope download BestWishYSH/Helios-Mid --local_dir BestWishYSH/Helios-Mid
114
+ modelscope download BestWishYSH/Helios-Distilled --local_dir BestWishYSH/Helios-Distilled
115
+ ```
116
+
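The same downloads can also be scripted from Python. The sketch below is a minimal illustration, not part of the Helios codebase: `download_plan` and `download_all` are hypothetical helper names, and it assumes the `huggingface_hub` package is installed and that each checkpoint should land in a directory named after its repo id, as in the CLI commands above.

```python
from pathlib import Path

# Repo ids as written in the download commands above.
HELIOS_REPOS = [
    "BestWishYSH/Helios-Base",
    "BestWishYSH/Helios-Mid",
    "BestWishYSH/Helios-Distilled",
]


def download_plan(root="."):
    """Map each repo id to the local directory it should be saved in."""
    return {repo_id: Path(root) / repo_id for repo_id in HELIOS_REPOS}


def download_all(root="."):
    """Download every checkpoint (needs huggingface_hub and network access)."""
    # Imported lazily so the plan can be inspected without the dependency.
    from huggingface_hub import snapshot_download

    for repo_id, local_dir in download_plan(root).items():
        snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
```

Calling `download_all()` fetches all three checkpoints into the current directory; `download_plan()` alone can be used to check the target paths first.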
117
+ ## 🚀 Inference
118
+
119
+
120
+ Helios uses an autoregressive approach that generates **33 frames per chunk**. For optimal performance, `num_frames` should be set to a multiple of `33`. If a non-multiple value is provided, it will be automatically rounded up to the nearest multiple of 33.
121
+
122
+ **Example frame counts for different video lengths:**
123
+
124
+ | num_frames | Adjusted Frames | 24 FPS | 16 FPS |
125
+ |------------|-----------------|--------|--------|
126
+ | 1449 | 1452 (33×44) | ~60s (1min) | ~90s (1min 30s) |
127
+ | 720 | 726 (33×22) | ~30s | ~45s |
128
+ | 240 | 264 (33×8) | ~11s | ~16s |
129
+ | 129 | 132 (33×4) | ~5.5s | ~8s |
130
+
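The rounding rule and the durations in the table above can be reproduced with a few lines of Python. This is a sketch of the arithmetic only, assuming the 33-frame chunk size stated above; the function names are illustrative and are not taken from the Helios pipeline.

```python
FRAMES_PER_CHUNK = 33  # chunk size stated in this README


def adjusted_num_frames(num_frames, chunk=FRAMES_PER_CHUNK):
    """Round num_frames up to the next multiple of the chunk size."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    # Integer ceiling division avoids floating-point rounding issues.
    return (num_frames + chunk - 1) // chunk * chunk


def duration_seconds(num_frames, fps):
    """Playback length of the adjusted clip at the given frame rate."""
    return adjusted_num_frames(num_frames) / fps


for requested in (1449, 720, 240, 129):
    frames = adjusted_num_frames(requested)
    chunks = frames // FRAMES_PER_CHUNK
    print(f"{requested} -> {frames} frames ({chunks} chunks), "
          f"{frames / 24:.1f}s at 24 FPS, {frames / 16:.1f}s at 16 FPS")
```

Requesting a value that is already a multiple of 33 leaves it unchanged, so no extra chunk is generated.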
131
+ ### Sanity Check
132
+
133
+ Before trying your own inputs, we highly recommend running the sanity check to confirm that your hardware and software are set up correctly.
134
+
135
+ | Task | **Helios-Base** | **Helios-Mid** | **Helios-Distilled** |
136
+ | ------- | -------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------- |
137
+ | **T2V** | <video src="https://github.com/user-attachments/assets/14e10753-0366-4790-ad8f-7b66d821ed11" controls width="240"></video> | <video src="https://github.com/user-attachments/assets/c1778691-a80b-428c-8094-88bb1dd1d52b" controls width="240"></video> | <video src="https://github.com/user-attachments/assets/4ca28c79-9dfa-49de-9c3a-f4c7b6c766cd" controls width="240"></video> |
138
+ | **V2V** | <video src="https://github.com/user-attachments/assets/420cb572-85c2-42d8-98d7-37b0bc24c844" controls width="240"></video> | <video src="https://github.com/user-attachments/assets/7d703fa6-dc1a-4138-a897-e58cfd9236d6" controls width="240"></video> | <video src="https://github.com/user-attachments/assets/45329c55-1a25-459c-bbf0-4e584ec5b23d" controls width="240"></video> |
139
+
140
+
141
+ ### ✨ Diffusers Pipeline
142
+
143
+ Install diffusers from source:
144
+ ```bash
145
+ pip install git+https://github.com/huggingface/diffusers.git
146
+ ```
147
+
148
+ The example below uses Helios-Distilled.
149
+
150
+ <details>
151
+ <summary>Click to expand the code</summary>
152
+
153
+ ```python
154
+ import torch
155
+ from diffusers import ModularPipeline, ClassifierFreeGuidance
156
+ from diffusers.utils import export_to_video, load_image, load_video
157
+
158
+ mod_pipe = ModularPipeline.from_pretrained("BestWishYsh/Helios-Distilled")
159
+ mod_pipe.load_components(torch_dtype=torch.bfloat16)
160
+ mod_pipe.to("cuda")
161
+
162
+ # The guider is set here for now; it should eventually be uploaded to the model repo so that each checkpoint can configure its own guidance.
163
+ guider = ClassifierFreeGuidance(guidance_scale=1.0)
164
+ mod_pipe.update_components(guider=guider)
165
+
166
+ # --- T2V ---
167
+ print("=== T2V ===")
168
+ prompt = (
169
+ "A vibrant tropical fish swimming gracefully among colorful coral reefs in a clear, turquoise ocean. "
170
+ "The fish has bright blue and yellow scales with a small, distinctive orange spot on its side, its fins moving "
171
+ "fluidly. The coral reefs are alive with a variety of marine life, including small schools of colorful fish and "
172
+ "sea turtles gliding by. The water is crystal clear, allowing for a view of the sandy ocean floor below. The reef "
173
+ "itself is adorned with a mix of hard and soft corals in shades of red, orange, and green. The photo captures "
174
+ "the fish from a slightly elevated angle, emphasizing its lively movements and the vivid colors of its surroundings. "
175
+ "A close-up shot with dynamic movement."
176
+ )
177
+
178
+ output = mod_pipe(
179
+ prompt=prompt,
180
+ height=384,
181
+ width=640,
182
+ num_frames=240,
183
+ pyramid_num_inference_steps_list=[2, 2, 2],
184
+ is_amplify_first_chunk=True,
185
+ generator=torch.Generator("cuda").manual_seed(42),
186
+ output="videos",
187
+ )
188
+
189
+ export_to_video(output[0], "helios_distilled_modular_t2v_output.mp4", fps=24)
190
+ print(f"T2V max memory: {torch.cuda.max_memory_allocated() / 1024**3:.3f} GB")
191
+ torch.cuda.empty_cache()
192
+ torch.cuda.reset_peak_memory_stats()
193
+
194
+ # --- I2V ---
195
+ print("=== I2V ===")
196
+ image = load_image(
197
+ "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/helios/wave.jpg"
198
+ )
199
+ i2v_prompt = (
200
+ "A towering emerald wave surges forward, its crest curling with raw power and energy. "
201
+ "Sunlight glints off the translucent water, illuminating the intricate textures and deep green hues within the wave's body."
202
+ )
203
+
204
+ output = mod_pipe(
205
+ prompt=i2v_prompt,
206
+ image=image,
207
+ height=384,
208
+ width=640,
209
+ num_frames=240,
210
+ pyramid_num_inference_steps_list=[2, 2, 2],
211
+ is_amplify_first_chunk=True,
212
+ generator=torch.Generator("cuda").manual_seed(42),
213
+ output="videos",
214
+ )
215
+
216
+ export_to_video(output[0], "helios_distilled_modular_i2v_output.mp4", fps=24)
217
+ print(f"I2V max memory: {torch.cuda.max_memory_allocated() / 1024**3:.3f} GB")
218
+ torch.cuda.empty_cache()
219
+ torch.cuda.reset_peak_memory_stats()
220
+
221
+ # --- V2V ---
222
+ print("=== V2V ===")
223
+ video = load_video(
224
+ "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/helios/car.mp4"
225
+ )
226
+ v2v_prompt = (
227
+ "A dynamic time-lapse video showing the rapidly moving scenery from the window of a speeding train. "
228
+ "The camera captures various elements such as lush green fields, towering trees, quaint countryside houses, "
229
+ "and distant mountain ranges passing by quickly."
230
+ )
231
+
232
+ output = mod_pipe(
233
+ prompt=v2v_prompt,
234
+ video=video,
235
+ height=384,
236
+ width=640,
237
+ num_frames=240,
238
+ pyramid_num_inference_steps_list=[2, 2, 2],
239
+ is_amplify_first_chunk=True,
240
+ generator=torch.Generator("cuda").manual_seed(42),
241
+ output="videos",
242
+ )
243
+
244
+ export_to_video(output[0], "helios_distilled_modular_v2v_output.mp4", fps=24)
245
+ print(f"V2V max memory: {torch.cuda.max_memory_allocated() / 1024**3:.3f} GB")
246
+ ```
247
+
248
+ </details>
249
+
250
+ ### ✨ vLLM-Omni Pipeline
251
+
252
+ Install vllm-omni from source:
253
+ ```bash
254
+ pip install git+https://github.com/vllm-project/vllm-omni.git
255
+ ```
256
+
257
+ The example below runs Text-to-Video with each checkpoint.
258
+
259
+ <details>
260
+ <summary>Click to expand the code</summary>
261
+
262
+ ```bash
263
+ cd vllm-omni
264
+
265
+ # Helios-Base
266
+ python3 examples/offline_inference/helios/end2end.py \
267
+ --sample-type t2v \
268
+ --model ./Helios-Base \
269
+ --prompt "A vibrant tropical fish swimming gracefully among colorful coral reefs in a clear, turquoise ocean. The fish has bright blue and yellow scales with a small, distinctive orange spot on its side, its fins moving fluidly. The coral reefs are alive with a variety of marine life, including small schools of colorful fish and sea turtles gliding by. The water is crystal clear, allowing for a view of the sandy ocean floor below. The reef itself is adorned with a mix of hard and soft corals in shades of red, orange, and green. The photo captures the fish from a slightly elevated angle, emphasizing its lively movements and the vivid colors of its surroundings. A close-up shot with dynamic movement." \
270
+ --num-frames 600 \
271
+ --seed 42 \
272
+ --output helios_t2v_base.mp4
273
+
274
+ # Helios-Mid
275
+ python examples/offline_inference/helios/end2end.py \
276
+ --model ./Helios-Mid --sample-type t2v \
277
+ --prompt "A vibrant tropical fish swimming gracefully among colorful coral reefs in a clear, turquoise ocean. The fish has bright blue and yellow scales with a small, distinctive orange spot on its side, its fins moving fluidly. The coral reefs are alive with a variety of marine life, including small schools of colorful fish and sea turtles gliding by. The water is crystal clear, allowing for a view of the sandy ocean floor below. The reef itself is adorned with a mix of hard and soft corals in shades of red, orange, and green. The photo captures the fish from a slightly elevated angle, emphasizing its lively movements and the vivid colors of its surroundings. A close-up shot with dynamic movement." \
278
+ --guidance-scale 5.0 --is-enable-stage2 \
279
+ --pyramid-num-inference-steps-list 20 20 20 \
280
+ --use-cfg-zero-star --use-zero-init --zero-steps 1 \
281
+ --output helios_t2v_mid.mp4
282
+
283
+ # Helios-Distilled
284
+ python examples/offline_inference/helios/end2end.py \
285
+ --model ./Helios-Distilled --sample-type t2v \
286
+ --prompt "A vibrant tropical fish swimming gracefully among colorful coral reefs in a clear, turquoise ocean. The fish has bright blue and yellow scales with a small, distinctive orange spot on its side, its fins moving fluidly. The coral reefs are alive with a variety of marine life, including small schools of colorful fish and sea turtles gliding by. The water is crystal clear, allowing for a view of the sandy ocean floor below. The reef itself is adorned with a mix of hard and soft corals in shades of red, orange, and green. The photo captures the fish from a slightly elevated angle, emphasizing its lively movements and the vivid colors of its surroundings. A close-up shot with dynamic movement." \
287
+ --num-frames 240 --guidance-scale 1.0 --is-enable-stage2 \
288
+ --pyramid-num-inference-steps-list 2 2 2 \
289
+ --is-amplify-first-chunk --output helios_t2v_distilled.mp4
290
+ ```
291
+ </details>
292
+
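The three commands above differ only in their per-checkpoint flags, so they can be assembled programmatically. The following is a minimal sketch: `build_command` and `CHECKPOINT_FLAGS` are hypothetical names, and the flags are copied from the shell commands above rather than from the vLLM-Omni documentation.

```python
import shlex

SCRIPT = "examples/offline_inference/helios/end2end.py"

# Per-checkpoint flags copied from the shell commands above.
CHECKPOINT_FLAGS = {
    "Helios-Base": ["--num-frames", "600", "--seed", "42"],
    "Helios-Mid": [
        "--guidance-scale", "5.0", "--is-enable-stage2",
        "--pyramid-num-inference-steps-list", "20", "20", "20",
        "--use-cfg-zero-star", "--use-zero-init", "--zero-steps", "1",
    ],
    "Helios-Distilled": [
        "--num-frames", "240", "--guidance-scale", "1.0", "--is-enable-stage2",
        "--pyramid-num-inference-steps-list", "2", "2", "2",
        "--is-amplify-first-chunk",
    ],
}


def build_command(checkpoint, prompt, output, sample_type="t2v"):
    """Return the argv list for one end2end.py run (hypothetical helper)."""
    if checkpoint not in CHECKPOINT_FLAGS:
        raise KeyError(f"unknown checkpoint: {checkpoint}")
    return [
        "python", SCRIPT,
        "--model", f"./{checkpoint}",
        "--sample-type", sample_type,
        "--prompt", prompt,
        *CHECKPOINT_FLAGS[checkpoint],
        "--output", output,
    ]


cmd = build_command(
    "Helios-Distilled",
    "A tropical fish among coral reefs.",
    "helios_t2v_distilled.mp4",
)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the root of a vllm-omni checkout; passing a list rather than a single string avoids shell-quoting problems with long prompts.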
293
+ ### ✨ SGLang-Diffusion Pipeline
294
+
295
+ Install sglang-diffusion from source:
296
+ ```bash
297
+ pip install git+https://github.com/sgl-project/sglang.git
298
+ ```
299
+
300
+ The example below uses Helios-Distilled.
301
+
302
+ <details>
303
+ <summary>Click to expand the code</summary>
304
+
305
+ ```bash
306
+ cd sglang
307
+ ```
308
+ </details>
309
+
310
+ ## 🙌 Description
311
+
312
+ - **Repository:** [Code](https://github.com/PKU-YuanGroup/Helios), [Page](https://pku-yuangroup.github.io/Helios-Page/)
313
+ - **Paper:** [Helios Technical Report](https://github.com/PKU-YuanGroup/Helios-Page/blob/main/helios_technical_report.pdf)
314
+ - **Point of Contact:** [Shenghai Yuan](mailto:shyuan-cs@hotmail.com)
315
+
316
+ ## ✏️ Citation
317
+
318
+ If you find our paper and code useful in your research, please consider giving us a star ⭐ and a citation 📝:
319
+
320
+ ```BibTeX
321
+ @article{helios,
322
+ title={Helios: Real-Time Long Video Generation without Anti-Drifting Strategies},
323
+ author={Yuan, Shenghai and Yin, Yuanyang and Li, Zongjian and Huang, Xinwei and Yang, Xiao and Yuan, Li},
324
+ journal={arXiv preprint arXiv:2603.xxxxx},
325
+ year={2026}
326
+ }
327
+ ```