stdstu123 committed (verified)
Commit: b117c77
Parent(s): 660fd0e

Update README.md

Files changed (1):
  1. README.md +52 -14
README.md CHANGED
@@ -1,24 +1,62 @@
  ---
- library_name: <World Model>
- tags:
- - Text-to-Video
- - Image-to-Video
- - Diffusion Video Model
- - World Model
+
+
+
+
+
+
  license: apache-2.0
+ pipeline_tag: image-to-video
+ tags:
+ - Text-to-Video
+ - Image-to-Video
+ - Diffusion Video Model
+ - World Model
  ---

- # Yume: An Interactive World Generation Model
+ # Yume-1.5: A Text-Controlled Interactive World Generation Model

- This is a preview version of the Yume model, an interactive world generation model, presented in the paper
- [Yume: An Interactive World Generation Model](https://huggingface.co/papers/2507.17744) and [Yume-1.5: A Text-Controlled Interactive World Generation Model](https://arxiv.org/abs/2512.22096).
+ Yume-1.5 is a framework designed to generate realistic, interactive, and continuous worlds from a single image or text prompt. It supports keyboard-based exploration of the generated environments through a framework that integrates context compression and real-time streaming acceleration.

- Yume aims to create an interactive, realistic, and dynamic world from an input image, allowing exploration and control.
-
- Project Page: [https://stdstu12.github.io/YUME-Project/](https://stdstu12.github.io/YUME-Project/)\
- GitHub Repository: [https://github.com/stdstu12/YUME](https://github.com/stdstu12/YUME)
+ - [**Paper (Yume-1.5)**](https://huggingface.co/papers/2512.22096)
+ - [**Paper (Yume-1.0)**](https://huggingface.co/papers/2507.17744)
+ - [**Project Page**](https://stdstu12.github.io/YUME-Project/)
+ - [**GitHub Repository**](https://github.com/stdstu12/YUME)
+
+ ## Features
+ - **Long-video generation**: Unified context compression with linear attention.
+ - **Real-time acceleration**: Powered by bidirectional attention distillation.
+ - **Text-controlled events**: Method for generating specific world events via text prompts.

  ## Usage

- For detailed instructions and full inference scripts, please refer to the [GitHub repository](https://github.com/stdstu12/YUME).
+ For detailed installation and setup instructions, please refer to the [GitHub repository](https://github.com/stdstu12/YUME).
+
+ ### Inference Example
+ To perform image-to-video generation using the provided scripts:
+
+ ```bash
+ # Generate videos from images in the specified directory
+ bash scripts/inference/sample_jpg.sh --jpg_dir="./jpg" --caption_path="./caption.txt"
+ ```
+
+ ## Citation
+
+ If you use Yume for your research, please cite the following:
+
+ ```bibtex
+ @article{mao2025yume,
+   title={Yume: An Interactive World Generation Model},
+   author={Mao, Xiaofeng and Lin, Shaoheng and Li, Zhen and Li, Chuanhao and Peng, Wenshuo and He, Tong and Pang, Jiangmiao and Chi, Mingmin and Qiao, Yu and Zhang, Kaipeng},
+   journal={arXiv preprint arXiv:2507.17744},
+   year={2025}
+ }
+ @article{mao2025yume15,
+   title={Yume-1.5: A Text-Controlled Interactive World Generation Model},
+   author={Mao, Xiaofeng and Li, Zhen and Li, Chuanhao and Xu, Xiaojie and Ying, Kaining and He, Tong and Pang, Jiangmiao and Qiao, Yu and Zhang, Kaipeng},
+   journal={arXiv preprint arXiv:2512.22096},
+   year={2025}
+ }
+ ```
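The `sample_jpg.sh` invocation added in this diff can also be driven programmatically. The sketch below is a hypothetical convenience wrapper, not part of the YUME codebase: only the script path and the `--jpg_dir`/`--caption_path` flags come from the README above, and the default paths are the placeholder values from the example.

```python
import shlex

def build_inference_cmd(jpg_dir: str, caption_path: str) -> list[str]:
    """Assemble the image-to-video inference command line shown in the README.

    Hypothetical helper; assumes it runs from the root of a cloned
    YUME checkout where scripts/inference/sample_jpg.sh exists.
    """
    return [
        "bash",
        "scripts/inference/sample_jpg.sh",
        f"--jpg_dir={jpg_dir}",
        f"--caption_path={caption_path}",
    ]

if __name__ == "__main__":
    cmd = build_inference_cmd("./jpg", "./caption.txt")
    # Print the shell-quoted command; to actually run it inside a checkout:
    # subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```

Keeping the command as an argument list (rather than a single string) avoids shell-quoting issues if the image directory or caption path contains spaces.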