onpix committed on
Commit
441c700
·
verified ·
1 Parent(s): 3b4b838

Update README.md

Files changed (1)
  1. README.md +2 -252
README.md CHANGED
@@ -1,5 +1,7 @@
1
  ---
2
  license: other
 
 
3
  language:
4
  - en
5
  - zh
@@ -9,255 +11,3 @@ tags:
9
  - Text to 3D
10
  - Image to 3D
11
  ---
12
-
13
- <div align="center">
14
- <img src="https://github.com/Tencent-Hunyuan/HY-WorldPlay/blob/main/assets/teaser.png">
15
-
16
- <h1>🎮 HY-World 1.5: A Systematic Framework for Interactive World Modeling with Real-Time Latency and Geometric Consistency</h1>
17
-
18
-
19
-
20
-
21
- </div>
22
-
23
- <div align="center">
24
- <a href=https://3d.hunyuan.tencent.com/sceneTo3D?tab=worldplay target="_blank"><img src=https://img.shields.io/badge/Official%20Site-333399.svg?logo=homepage height=22px></a>
25
- <a href=https://huggingface.co/tencent/HunyuanWorld-1 target="_blank"><img src=https://img.shields.io/badge/%F0%9F%A4%97%20Models-d96902.svg height=22px></a>
26
- <a href=https://3d-models.hunyuan.tencent.com/world/ target="_blank"><img src= https://img.shields.io/badge/Page-bb8a2e.svg?logo=github height=22px></a>
27
- <a href=https://arxiv.org/abs/2507.21809 target="_blank"><img src=https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv height=22px></a>
28
- <a href=https://discord.gg/dNBrdrGGMa target="_blank"><img src= https://img.shields.io/badge/Discord-white.svg?logo=discord height=22px></a>
29
- <a href=https://x.com/TencentHunyuan target="_blank"><img src=https://img.shields.io/badge/Tencent%20HY-black.svg?logo=x height=22px></a>
30
- <a href="#community-resources" target="_blank"><img src=https://img.shields.io/badge/Community-lavender.svg?logo=homeassistantcommunitystore height=22px></a>
31
- </div>
32
-
33
- [//]: # ( <a href=# target="_blank"><img src=https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv height=22px></a>)
34
-
35
- [//]: # ( <a href=# target="_blank"><img src= https://img.shields.io/badge/Colab-8f2628.svg?logo=googlecolab height=22px></a>)
36
-
37
- [//]: # ( <a href="#"><img alt="PyPI - Downloads" src="https://img.shields.io/pypi/v/mulankit?logo=pypi" height=22px></a>)
38
-
39
- <br>
40
-
41
- <p align="center">
42
- <i>"Hold Infinity in the Palm of Your Hand, and Eternity in an Hour"</i>
43
- </p>
44
-
45
- ## 🔥 News
46
- - December 15, 2025: 👋 We present the [technical report](https://arxiv.org/abs/2507.21809) and [research paper](https://arxiv.org/abs/2507.21809) of HY-World 1.5 (WorldPlay). Please check out the details and join the discussion!
47
- - December 15, 2025: 🤗 We release HY-World 1.5 (WorldPlay), the first open-source, real-time interactive world model with long-term geometric consistency!
48
-
49
- > Join our **[WeChat](#)** and **[Discord](https://discord.gg/dNBrdrGGMa)** groups to discuss and get help from us.
50
-
51
-
52
- ## 📋 Table of Contents
53
- - [🔥 News](#-news)
54
- - [📋 Table of Contents](#-table-of-contents)
55
- - [📖 Introduction](#-introduction)
56
- - [✨ Highlights](#-highlights)
57
- - [📜 System Requirements](#-system-requirements)
58
- - [🛠️ Dependencies and Installation](#️-dependencies-and-installation)
59
- - [🎮 Quick Start](#-quick-start)
60
- - [🧱 Download Pretrained Models](#-download-pretrained-models)
61
- - [🔑 Inference](#-inference)
62
- - [📊 Evaluation](#-evaluation)
63
- - [🎬 More Examples](#-more-examples)
64
- - [📚 Citation](#-citation)
65
- - [🙏 Acknowledgements](#-acknowledgements)
66
-
67
- ## 📖 Introduction
68
- While **HunyuanWorld 1.0** is capable of generating immersive 3D worlds, it relies on a lengthy offline generation process and lacks real-time interaction. **HunyuanWorld 1.5** bridges this gap with WorldPlay, a streaming video diffusion model that enables real-time, interactive world modeling with long-term geometric consistency, resolving the trade-off between speed and memory that limits current methods. Our model draws power from four key designs. 1) We use a Dual Action Representation to enable robust action control in response to the user's keyboard and mouse inputs. 2) To enforce long-term consistency, our Reconstituted Context Memory dynamically rebuilds context from past frames and uses temporal reframing to keep geometrically important but long-past frames accessible, effectively alleviating memory attenuation. 3) We introduce WorldCompass, a novel Reinforcement Learning (RL) post-training framework that directly improves the action-following and visual quality of the long-horizon, autoregressive video model. 4) We also propose Context Forcing, a novel distillation method designed for memory-aware models. Aligning memory context between the teacher and student preserves the student's capacity to use long-range information, enabling real-time speeds while preventing error drift. Taken together, HY-World 1.5 generates long-horizon streaming video at 24 FPS with superior consistency, comparing favorably with existing techniques. Our model shows strong generalization across diverse scenes, supporting first-person and third-person perspectives in both real-world and stylized environments, enabling versatile applications such as 3D reconstruction, promptable events, and infinite world extension.
69
-
70
- <p align="center">
71
- <img src="https://github.com/Tencent-Hunyuan/HY-WorldPlay/blob/main/assets/teaser_2.png">
72
- </p>
73
-
74
- ## ✨ Highlights
75
-
76
- - **Systematic Overview**
77
-
78
- HY-World 1.5 has open-sourced a systematic and comprehensive training framework for real-time world models, covering the entire pipeline and all stages, including data, training, and inference deployment. The technical report discloses detailed training specifics for model pre-training, middle-training, reinforcement learning post-training, and memory-aware model distillation. In addition, the report introduces a series of engineering techniques aimed at reducing network transmission latency and model inference latency, thereby achieving a real-time streaming inference experience for users.
79
-
80
- <p align="center">
81
- <img src="https://github.com/Tencent-Hunyuan/HY-WorldPlay/blob/main/assets/overview.png">
82
- </p>
83
-
84
- - **Inference Pipeline**
85
-
86
- Given a single image or text prompt describing a world, our model performs next-chunk prediction (16 video frames per chunk) to generate future video conditioned on user actions. For each chunk, we dynamically reconstitute context memory from past chunks to enforce long-term temporal and geometric consistency.
87
-
88
- <p align="center">
89
- <img src="https://github.com/Tencent-Hunyuan/HY-WorldPlay/blob/main/assets/pipeline.png">
90
- </p>
91
-
92
-
93
-
94
- ## 📜 System Requirements
95
-
96
- - **GPU**: NVIDIA GPU with CUDA support
97
- - **Minimum GPU Memory**: 14 GB (with model offloading enabled)
98
-
99
- > **Note:** The memory requirements above are measured with model offloading enabled. If your GPU has sufficient memory, you may disable offloading for improved inference speed.
100
-
101
-
102
- ## 🛠️ Dependencies and Installation
103
- ```bash
104
- conda create --name worldplay python=3.10 -y
105
- conda activate worldplay
106
- pip install -r requirements.txt
107
- ```
108
-
109
- - Flash Attention: Install Flash Attention for faster inference and reduced GPU memory consumption. Detailed installation instructions are available at [Flash Attention](https://github.com/Dao-AILab/flash-attention).
110
-
111
-
112
- ## 🎮 Quick Start
113
-
114
- We provide a demo for the HY-World 1.5 model for quick start.
115
-
116
- https://github.com/user-attachments/assets/63e5e5ec-34b2-4160-b7d2-4dd18cf25d71
117
-
118
-
119
- Try our **online demo** without installation: https://3d.hunyuan.tencent.com/sceneTo3D
120
-
121
- ## 🧱 Download Pretrained Models
122
- We provide an implementation based on HunyuanVideo-1.5, one of the most powerful open-source video diffusion models. The model checkpoints can be found in xxx.
123
-
124
- |ModelName| Download |
125
- |-|-------------------------------------------|
126
- | HY-World1.5-Bidirectional-480P-I2V | |
127
- | HY-World1.5-Autoregressive-480P-I2V | |
128
- | HY-World1.5-Autoregressive-480P-I2V-distill | |
129
-
130
- ## 🔑 Inference
131
- We open source the inference code for both bidirectional and autoregressive diffusion models. For prompt rewriting, we recommend using Gemini or models deployed via vLLM. This codebase currently only supports models compatible with the vLLM API. If you wish to use Gemini, you will need to implement your own interface calls. The details can be found in [HunyuanVideo-1.5](https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5).
132
-
133
- We recommend using `generate_custom_trajectory.py` to generate customized camera trajectories.
134
-
135
- ```bash
136
- export T2V_REWRITE_BASE_URL="<your_vllm_server_base_url>"
137
- export T2V_REWRITE_MODEL_NAME="<your_model_name>"
138
- export I2V_REWRITE_BASE_URL="<your_vllm_server_base_url>"
139
- export I2V_REWRITE_MODEL_NAME="<your_model_name>"
140
-
141
- PROMPT='A paved pathway leads towards a stone arch bridge spanning a calm body of water. Lush green trees and foliage line the path and the far bank of the water. A traditional-style pavilion with a tiered, reddish-brown roof sits on the far shore. The water reflects the surrounding greenery and the sky. The scene is bathed in soft, natural light, creating a tranquil and serene atmosphere. The pathway is composed of large, rectangular stones, and the bridge is constructed of light gray stone. The overall composition emphasizes the peaceful and harmonious nature of the landscape.'
142
-
143
- IMAGE_PATH=./assets/img/test.png # Now we only provide the i2v model, so the path cannot be None
144
- SEED=1
145
- ASPECT_RATIO=16:9
146
- RESOLUTION=480p # Now we only provide the 480p model
147
- OUTPUT_PATH=./outputs/
148
- MODEL_PATH= # Path to pretrained hunyuanvideo-1.5 model
149
- AR_ACTION_MODEL_PATH= # Path to our HY-World 1.5 autoregressive checkpoints
150
- BI_ACTION_MODEL_PATH= # Path to our HY-World 1.5 bidirectional checkpoints
151
- AR_DISTILL_ACTION_MODEL_PATH= # Path to our HY-World 1.5 autoregressive distilled checkpoints
152
- POSE_JSON_PATH=./assets/pose/test_forward_32_latents.json # Path to the customized camera trajectory
153
- NUM_FRAMES=125
154
-
155
- # Configuration for faster inference
156
- # For AR inference, the maximum number recommended is 4. For bidirectional models, it can be set to 8.
157
- N_INFERENCE_GPU=4 # Parallel inference GPU count.
158
-
159
- # Configuration for better quality
160
- REWRITE=false # Enable prompt rewriting. Please ensure rewrite vLLM server is deployed and configured.
161
- ENABLE_SR=false # Enable super resolution. When the NUM_FRAMES == 121, you can set it to true
162
-
163
- # inference with bidirectional model
164
- torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
165
- --prompt "$PROMPT" \
166
- --image_path $IMAGE_PATH \
167
- --resolution $RESOLUTION \
168
- --aspect_ratio $ASPECT_RATIO \
169
- --video_length $NUM_FRAMES \
170
- --seed $SEED \
171
- --rewrite $REWRITE \
172
- --sr $ENABLE_SR --save_pre_sr_video \
173
- --pose_json_path $POSE_JSON_PATH \
174
- --output_path $OUTPUT_PATH \
175
- --model_path $MODEL_PATH \
176
- --action_ckpt $BI_ACTION_MODEL_PATH \
177
- --few_step false \
178
- --model_type 'bi'
179
-
180
- # inference with autoregressive model
181
- #torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
182
- # --prompt "$PROMPT" \
183
- # --image_path $IMAGE_PATH \
184
- # --resolution $RESOLUTION \
185
- # --aspect_ratio $ASPECT_RATIO \
186
- # --video_length $NUM_FRAMES \
187
- # --seed $SEED \
188
- # --rewrite $REWRITE \
189
- # --sr $ENABLE_SR --save_pre_sr_video \
190
- # --pose_json_path $POSE_JSON_PATH \
191
- # --output_path $OUTPUT_PATH \
192
- # --model_path $MODEL_PATH \
193
- # --action_ckpt $AR_ACTION_MODEL_PATH \
194
- # --few_step false \
195
- # --model_type 'ar'
196
-
197
- # inference with autoregressive distilled model
198
- #torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
199
- # --prompt "$PROMPT" \
200
- # --image_path $IMAGE_PATH \
201
- # --resolution $RESOLUTION \
202
- # --aspect_ratio $ASPECT_RATIO \
203
- # --video_length $NUM_FRAMES \
204
- # --seed $SEED \
205
- # --rewrite $REWRITE \
206
- # --sr $ENABLE_SR --save_pre_sr_video \
207
- # --pose_json_path $POSE_JSON_PATH \
208
- # --output_path $OUTPUT_PATH \
209
- # --model_path $MODEL_PATH \
210
- # --action_ckpt $AR_DISTILL_ACTION_MODEL_PATH \
211
- # --few_step true \
212
- # --num_inference_steps 4 \
213
- # --model_type 'ar'
214
- ```
215
-
216
-
217
- ## 📊 Evaluation
218
-
219
- HY-World 1.5 surpasses existing methods across various quantitative metrics, including reconstruction metrics for different video lengths and human evaluations.
220
-
221
- | Model | Real-time | | | Short-term | | | | | Long-term | | |
222
- |:---------------------------| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
223
- | | | **PSNR** ⬆ | **SSIM** ⬆ | **LPIPS** ⬇ | **$R_{dist}$** ⬇ | **$T_{dist}$** ⬇ | **PSNR** ⬆ | **SSIM** ⬆ | **LPIPS** ⬇ | **$R_{dist}$** ⬇ | **$T_{dist}$** ⬇ |
224
- | CameraCtrl | ❌ | 17.93 | 0.569 | 0.298 | 0.037 | 0.341 | 10.09 | 0.241 | 0.549 | 0.733 | 1.117 |
225
- | SEVA | ❌ | 19.84 | 0.598 | 0.313 | 0.047 | 0.223 | 10.51 | 0.301 | 0.517 | 0.721 | 1.893 |
226
- | ViewCrafter | ❌ | 19.91 | 0.617 | 0.327 | 0.029 | 0.543 | 9.32 | 0.271 | 0.661 | 1.573 | 3.051 |
227
- | Gen3C | ❌ | 21.68 | 0.635 | 0.278 | **0.024** | 0.477 | 15.37 | 0.431 | 0.483 | 0.357 | 0.979 |
228
- | VMem | ❌ | 19.97 | 0.587 | 0.316 | 0.048 | 0.219 | 12.77 | 0.335 | 0.542 | 0.748 | 1.547 |
229
- | Matrix-Game-2.0 | ✅ | 17.26 | 0.505 | 0.383 | 0.287 | 0.843 | 9.57 | 0.205 | 0.631 | 2.125 | 2.742 |
230
- | GameCraft | ❌ | 21.05 | 0.639 | 0.341 | 0.151 | 0.617 | 10.09 | 0.287 | 0.614 | 2.497 | 3.291 |
231
- | Ours (w/o Context Forcing) | ❌ | 21.27 | 0.669 | 0.261 | 0.033 | 0.157 | 16.27 | 0.425 | 0.495 | 0.611 | 0.991 |
232
- | **Ours (full)** | ✅ | **21.92** | **0.702** | **0.247** | 0.031 | **0.121** | **18.94** | **0.585** | **0.371** | **0.332** | **0.797** |
233
-
234
-
235
-
236
-
237
- <p align="center">
238
- <img src="https://github.com/Tencent-Hunyuan/HY-WorldPlay/blob/main/assets/human_eval.png">
239
- </p>
240
-
241
- ## 🎬 More Examples
242
-
243
- https://github.com/user-attachments/assets/6aac8ad7-3c64-4342-887f-53b7100452ed
244
-
245
- https://github.com/user-attachments/assets/531bf0ad-1fca-4d76-bb65-84701368926d
246
-
247
- https://github.com/user-attachments/assets/f165f409-5a74-4e19-a32c-fc98d92259e1
248
-
249
- ## 📚 Citation
250
-
251
- ```bibtex
252
- @article{hyworld2025,
253
- title={HY-World 1.5: A Systematic Framework for Interactive World Modeling with Real-Time Latency and Geometric Consistency},
254
- author={Team HunyuanWorld},
255
- journal={arXiv preprint},
256
- year={2025}
257
- }
258
- ```
259
-
260
-
261
- ## 🙏 Acknowledgements
262
- We would like to thank [HunyuanWorld](https://github.com/Tencent-Hunyuan/HunyuanWorld-1.0), [HunyuanWorld-Mirror
263
- ](https://github.com/Tencent-Hunyuan/HunyuanWorld-Mirror), [FlashWorld](https://github.com/imlixinyang/FlashWorld), [HunyuanVideo-1.5](https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5), and [FastVideo](https://github.com/hao-ai-lab/FastVideo) for their great work.
 
1
  ---
2
  license: other
3
+ license_name: tencent-hy-worldplay-community
4
+ license_link: https://github.com/Tencent-Hunyuan/HY-WorldPlay/blob/main/License.txt
5
  language:
6
  - en
7
  - zh
 
11
  - Text to 3D
12
  - Image to 3D
13
  ---
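
The net effect of this commit on the front matter is the addition of `license_name` and `license_link` next to the existing `license: other`. The sketch below is one way to check that the resulting header carries all three fields. It is illustrative only: `front_matter_scalars` and `README_HEAD` are hypothetical names, the sample header reproduces only the front-matter lines visible in this diff (lines hidden between hunks are omitted), and the hand-rolled parser handles just top-level `key: value` pairs rather than full YAML.

```python
def front_matter_scalars(readme_text: str) -> dict[str, str]:
    """Return top-level `key: value` pairs from a README's front matter.

    Simplified on purpose: list items and indented lines are skipped,
    so this is not a general YAML parser.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at the top of the file
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter of the front matter
        if line.startswith((" ", "\t", "-")) or ":" not in line:
            continue  # list item, nested line, or not a key/value pair
        key, _, value = line.partition(":")  # split at the first colon only
        fields[key.strip()] = value.strip()
    return fields


# Abbreviated header: only the lines shown in the diff above (assumption).
README_HEAD = """---
license: other
license_name: tencent-hy-worldplay-community
license_link: https://github.com/Tencent-Hunyuan/HY-WorldPlay/blob/main/License.txt
language:
- en
- zh
tags:
- Text to 3D
- Image to 3D
---
"""

fields = front_matter_scalars(README_HEAD)
missing = [k for k in ("license", "license_name", "license_link") if not fields.get(k)]
print("missing license fields:", missing)
```

Splitting at the first colon keeps the `https://` URL in `license_link` intact, which a naive `split(":")` would break.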