---
license: apache-2.0
---
# Dreamer-MC: A Real-Time Autoregressive World Model for Infinite Video Generation

<div align="center">

<br>

[](LICENSE)
[](https://www.python.org/)
[](https://pytorch.org/)
[](https://findlamp.github.io/dreamer-mc.github.io/)
[](https://huggingface.co/)

[**📖 Introduction**](#-introduction) | [**🐰 Model Zoo**](#-model-zoo) | [**🛠️ Installation**](#-installation) | [**💻 Quick Start**](#-quick-start)

</div>

---
## 📖 Introduction

This repository contains the **inference code** for the Minecraft Autoregressive World Model.

This project is an **open-source reproduction of the [DreamerV4](https://danijar.com/project/dreamerv4/) architecture**, tailored for high-fidelity simulation in the Minecraft environment. The model uses an **MAE (Masked Autoencoder)** for efficient video compression and a **DiT (Diffusion Transformer)** that autoregressively predicts future game frames in latent space, conditioned on frame history and action inputs.

The codebase is streamlined for **deployment and generation**, supporting long-context inference and real-time interaction.

### Key Features

* **Inference Only**: Lightweight codebase focused on generation, stripped of training logic.
* **Long-Context Support**: Loads long-context models that can recall events from up to 12 seconds prior.
* **Fast Inference Backend**: Built-in optimized pipeline for high-performance, real-time next-frame prediction.
* **Infinite Generation**: Rolls out indefinitely without image-quality degradation.
* **Complex Interaction**: Supports a variety of interactions within the Minecraft world, such as eating food, collecting water, and using weapons.
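The autoregressive generation loop described above can be sketched as follows. This is a minimal toy illustration only: `encode`, `decode`, and `predict_next_latent` are hypothetical stand-ins for the MAE tokenizer and the DiT dynamics model, which operate on image tensors and latent grids in the actual codebase.

```python
from collections import deque

# Toy stand-ins for the MAE tokenizer and DiT dynamics model (hypothetical;
# real latents are tensors, and prediction runs a diffusion transformer).
def encode(frame):
    return frame * 2

def decode(latent):
    return latent // 2

def predict_next_latent(history, action):
    return history[-1] + action

def rollout(first_frame, actions, context_len=4):
    """Autoregressive rollout: each predicted latent is appended to a
    bounded history window and fed back in for the next prediction."""
    history = deque([encode(first_frame)], maxlen=context_len)
    frames = []
    for action in actions:
        nxt = predict_next_latent(list(history), action)
        history.append(nxt)          # prediction becomes part of the context
        frames.append(decode(nxt))
    return frames
```

The bounded `deque` mirrors the long-context window: old latents fall out of the history once the context length is exceeded.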
## 🐰 Model Zoo

Please download the pre-trained weights and place them in the `checkpoints/` directory before running the code.

| Model Name | Params | VRAM Req. | Description |
| :--- | :---: | :---: | :--- |
| **MAE-Tokenizer** | 430M | >2 GB | Handles video encoding and decoding. |
| **Dynamic Model** | 1.7B | 9 GB | Generates the next frame from history and actions. |

> 🔗 **Download**: [HuggingFace Collection](https://huggingface.co/your-username/minecraft-world-model)
## 🛠️ Installation

We recommend Python 3.10+ and CUDA 12.1+.

```bash
# 1. Clone the repository
git clone https://github.com/IamCreateAI/Dreamerv4-MC.git
cd Dreamerv4-MC

# 2. Create a virtual environment
conda create -n dreamer python=3.12 -y
conda activate dreamer

# 3. Install PyTorch (adjust the index URL for your CUDA version)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# 4. Install dependencies
pip install -r requirements.txt
MAX_JOBS=4 pip install flash-attn --no-build-isolation
pip install -e .
```
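After installing, a quick sanity check along these lines can confirm the environment before launching the UI (a sketch; the package names checked are simply the ones installed above):

```python
import importlib.util
import sys

def check_env(min_python=(3, 10), packages=("torch", "torchvision", "flash_attn")):
    """Return (python_ok, missing_packages) for the prerequisites above."""
    python_ok = sys.version_info[:2] >= min_python
    # find_spec returns None for top-level packages that are not installed
    missing = [p for p in packages if importlib.util.find_spec(p) is None]
    return python_ok, missing

if __name__ == "__main__":
    ok, missing = check_env()
    print(f"Python OK: {ok}; missing packages: {missing or 'none'}")
```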
## 💻 Quick Start

```bash
python ui/inference_ui.py --dynamic_path=/path/to/dynamic_model \
    --tokenizer_path=/path/to/tokenizer/ \
    --record_video_output_path=output/
```
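The flags shown above suggest an `argparse` interface roughly like the following (a sketch of the command line only; the actual `ui/inference_ui.py` may define additional options and defaults):

```python
import argparse

def build_parser():
    # Mirrors the flags shown in the Quick Start command above (assumed interface).
    parser = argparse.ArgumentParser(description="Dreamer-MC inference UI")
    parser.add_argument("--dynamic_path", required=True,
                        help="Path to the Dynamic Model weights")
    parser.add_argument("--tokenizer_path", required=True,
                        help="Path to the MAE tokenizer weights")
    parser.add_argument("--record_video_output_path", default="output/",
                        help="Directory for recorded rollout videos")
    return parser
```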
## 🎮 Controls

| Key | Action |
| :--- | :--- |
| **W / A / S / D** | Move |
| **Space** | Jump |
| **Left Click** | Attack / Destroy |
| **Right Click** | Place / Use Item |
| **E** | Open/Close Inventory (Simulation) |
| **1 - 9** | Select Hotbar Slot |
| **R** | Start/Stop Video Recording |
| **V** | Refresh into a New Scene |
| **Left Shift** | Sneak |
| **Left Ctrl** | Sprint |
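For orientation, the movement bindings above could translate into a binary action vector roughly as follows. This is a hypothetical encoding for illustration; the real action format is defined inside the inference UI and is not shown here.

```python
# Hypothetical key-to-action names; the codebase's actual encoding may differ.
KEY_BINDINGS = {
    "w": "forward", "a": "left", "s": "back", "d": "right",
    "space": "jump", "left_shift": "sneak", "left_ctrl": "sprint",
}

def keys_to_action(pressed_keys):
    """Turn the set of currently pressed keys into a 0/1 action dict."""
    return {action: int(key in pressed_keys)
            for key, action in KEY_BINDINGS.items()}
```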
## 📝 Citation

If you use this codebase in your research, please consider citing us:

```bibtex
@article{gao2026dreamermc,
  title  = {Dreamer-MC: A Real-Time Autoregressive World Model for Infinite Video Generation},
  author = {Ming Gao and Yan Yan and ShengQu Xi and Yu Duan and ShengQian Li and Feng Wang},
  year   = {2026},
  url    = {https://findlamp.github.io/dreamer-mc.github.io/}
}
```

as well as the original Dreamer 4 paper:

```bibtex
@misc{hafner2025trainingagents,
  title         = {Training Agents Inside of Scalable World Models},
  author        = {Danijar Hafner and Wilson Yan and Timothy Lillicrap},
  year          = {2025},
  eprint        = {2509.24527},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI},
  url           = {https://arxiv.org/abs/2509.24527}
}
```
## 📚 References

This project is built upon the following foundational works:

* **MaeTok**: [Masked Autoencoders Are Effective Tokenizers for Diffusion Models](https://arxiv.org/abs/2502.03444) (Chen et al., ICML 2025)
* **DreamerV4**: [Training Agents Inside of Scalable World Models](https://danijar.com/project/dreamerv4/) (Hafner et al., 2025)
* **CausVid**: [From Slow Bidirectional to Fast Autoregressive Video Diffusion Models](https://arxiv.org/abs/2412.07772) (Yin et al., CVPR 2025)
|