Diffusers
Safetensors
happynear committed on
Commit
a2e3bb6
·
verified ·
1 Parent(s): 22aef6b

Update README.md

Files changed (1)
  1. README.md +117 -3
README.md CHANGED
---
license: apache-2.0
---
# Dreamer-MC: A Real-Time Autoregressive World Model for Infinite Video Generation

<div align="center">

<br>

[![License](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
[![Python](https://img.shields.io/badge/Python-3.10%2B-brightgreen)](https://www.python.org/)
[![PyTorch](https://img.shields.io/badge/PyTorch-2.8%2B-orange)](https://pytorch.org/)
[![Blog](https://img.shields.io/badge/Project-Blog-blue)](https://findlamp.github.io/dreamer-mc.github.io/)
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model%20Zoo-yellow)](https://huggingface.co/)

[**📖 Introduction**](#-introduction) | [**🏰 Model Zoo**](#-model-zoo) | [**🛠️ Installation**](#-installation) | [**💻 Quick Start**](#-quick-start)

</div>

---

## 📖 Introduction

This repository contains the **inference code** for the Minecraft autoregressive world model.
The project is an **open-source reproduction of the [DreamerV4](https://danijar.com/project/dreamerv4/) architecture**, tailored for high-fidelity simulation of the Minecraft environment. The model uses a **MAE (Masked Autoencoder)** for efficient video compression and a **DiT (Diffusion Transformer)** that autoregressively predicts future game frames in latent space, conditioned on frame history and action inputs.
The codebase is streamlined for **deployment and generation**, supporting long-context inference and real-time interaction.

### Key Features
* **Inference only**: A lightweight codebase focused on generation, stripped of training logic.
* **Long-context support**: Can load long-context models that recall events from up to 12 seconds earlier.
* **Fast inference backend**: Built-in optimized pipeline for high-performance, real-time next-frame prediction.
* **Infinite generation**: Generates indefinitely without image-quality degradation during long rollouts.
* **Complex interaction**: Supports a variety of in-game interactions, such as eating food, collecting water, and using weapons.

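The autoregressive loop described above can be sketched as follows. This is a minimal illustration, not the repository's API: `encode`/`decode` are stand-ins for the MAE tokenizer, `predict_next` stands in for the DiT dynamics model, and the 20 FPS context length is an assumption used only to size the history window.

```python
from collections import deque

# Hypothetical latent-space rollout: a fixed-size deque holds the latent
# history (e.g. 12 s of context at an assumed 20 FPS -> 240 frames), and
# each step predicts the next latent from the context plus an action.
CONTEXT_FRAMES = 240

def encode(frame):                  # MAE encoder stand-in: frame -> latent
    return ("z", frame)

def decode(latent):                 # MAE decoder stand-in: latent -> frame
    return latent[1]

def predict_next(history, action):  # DiT stand-in: context + action -> next latent
    return ("z", (decode(history[-1]), action))

def rollout(first_frame, actions):
    history = deque([encode(first_frame)], maxlen=CONTEXT_FRAMES)
    frames = []
    for action in actions:
        z = predict_next(history, action)  # next latent from context window
        history.append(z)                  # slide the window; old latents drop off
        frames.append(decode(z))           # decode only for display/recording
    return frames
```

Because the deque has a fixed `maxlen`, the loop runs indefinitely with constant memory, which is what makes infinite generation possible in this design.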

## 🏰 Model Zoo

Please download the pre-trained weights and place them in the `checkpoints/` directory before running the code.

| Model Name | Params | VRAM | Description |
| :--- | :---: | :---: | :--- |
| **MAE-Tokenizer** | 430M | >2 GB | Handles video encoding and decoding. |
| **Dynamic Model** | 1.7B | 9 GB | Generates the next frame from history and actions. |

> 🔗 **Download**: [HuggingFace Collection](https://huggingface.co/your-username/minecraft-world-model)

## 🛠️ Installation

We recommend using Python 3.10+ and CUDA 12.1+.

```bash
# 1. Clone the repository
git clone https://github.com/IamCreateAI/Dreamerv4-MC.git
cd Dreamerv4-MC

# 2. Create a virtual environment
conda create -n dreamer python=3.12 -y
conda activate dreamer

# 3. Install PyTorch (adjust index-url for your CUDA version)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# 4. Install dependencies
pip install -r requirements.txt
MAX_JOBS=4 pip install flash-attn --no-build-isolation
pip install -e .
```

## 💻 Quick Start

```bash
python ui/inference_ui.py --dynamic_path=/path/to/dynamic_model \
    --tokenizer_path=/path/to/tokenizer/ \
    --record_video_output_path=output/
```

## 🎮 Controls

| Key | Action |
| :--- | :--- |
| **W / A / S / D** | Move |
| **Space** | Jump |
| **Left Click** | Attack / Destroy |
| **Right Click** | Place / Use Item |
| **E** | Open/Close Inventory (Simulation) |
| **1 - 9** | Select Hotbar Slot |
| **R** | Start/Stop Video Recording |
| **V** | Refresh into a New Scene |
| **Left Shift** | Sneak |
| **Left Ctrl** | Sprint |

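The key bindings above have to be translated into per-frame action inputs for the dynamics model. A hedged sketch of how a UI might do this; the action-field names are assumptions, not the repository's actual schema:

```python
# Hypothetical key -> action-field mapping for the bindings listed above.
KEYMAP = {
    "w": "forward", "a": "left", "s": "back", "d": "right",
    "space": "jump", "shift": "sneak", "ctrl": "sprint",
}

def keys_to_action(pressed, hotbar_slot=1):
    """Build a per-frame action dict from the set of currently pressed keys."""
    action = {name: False for name in KEYMAP.values()}
    for key in pressed:
        if key in KEYMAP:
            action[KEYMAP[key]] = True
    action["hotbar"] = hotbar_slot  # keys 1-9 select the hotbar slot
    return action
```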

## 📜 Citation

If you use this codebase in your research, please consider citing us:

```bibtex
@article{gao2026dreamermc,
  title  = {Dreamer-MC: A Real-Time Autoregressive World Model for Infinite Video Generation},
  author = {Ming Gao and Yan Yan and ShengQu Xi and Yu Duan and ShengQian Li and Feng Wang},
  year   = {2026},
  url    = {https://findlamp.github.io/dreamer-mc.github.io/}
}
```

as well as the original Dreamer 4 paper:

```bibtex
@misc{Hafner2025TrainingAgents,
  title         = {Training Agents Inside of Scalable World Models},
  author        = {Danijar Hafner and Wilson Yan and Timothy Lillicrap},
  year          = {2025},
  eprint        = {2509.24527},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI},
  url           = {https://arxiv.org/abs/2509.24527}
}
```

## 📚 References

This project builds upon the following foundational works:

* **MaeTok**: [Masked Autoencoders Are Effective Tokenizers for Diffusion Models](https://arxiv.org/abs/2502.03444) (Chen et al., ICML 2025)
* **DreamerV4**: [Training Agents Inside of Scalable World Models](https://danijar.com/project/dreamerv4/) (Hafner et al., 2025)
* **CausVid**: [From Slow Bidirectional to Fast Autoregressive Video Diffusion Models](https://arxiv.org/abs/2412.07772) (Yin et al., CVPR 2025)