---
license: apache-2.0
---
# Dreamer-MC: A Real-Time Autoregressive World Model for Infinite Video Generation
<div align="center">
<br>
[License](LICENSE)
[Python](https://www.python.org/)
[PyTorch](https://pytorch.org/)
[Project Page](https://findlamp.github.io/dreamer-mc.github.io/)
[Hugging Face](https://huggingface.co/)
[**Introduction**](#introduction) | [**Model Zoo**](#model-zoo) | [**Installation**](#installation) | [**Quick Start**](#quick-start)
</div>
---
## Introduction
This repository contains the **Inference Code** for the Minecraft Autoregressive World Model.
This project serves as an **open-source reproduction of the [DreamerV4](https://danijar.com/project/dreamerv4/) architecture**, tailored specifically for high-fidelity simulation in the Minecraft environment. Our model utilizes a **MAE (Masked Autoencoder)** for efficient video compression and a **DiT (Diffusion Transformer)** architecture to autoregressively predict future game frames based on history and action inputs in the latent space.
This codebase is streamlined for **deployment and generation**, supporting long-context inference and real-time interaction.
### Key Features
* **Inference Only**: Lightweight codebase focused on generation, stripped of complex training logic.
* **Long Context Support**: Capable of loading long-context model checkpoints that recall events from 12 seconds prior.
* **Fast Inference Backend**: Built-in optimized inference pipeline designed for high-performance, real-time next-frame prediction.
* **Infinite Generation**: Supports infinite generation without image quality degradation during long-term rollouts.
* **Complex Interaction**: Supports a variety of interactions within the Minecraft world, such as eating food, collecting water, using weapons, etc.
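To make the generation loop concrete, here is a minimal sketch of the autoregressive rollout described above: each step conditions the dynamics model on a sliding window of recent latents plus the current action, then feeds the predicted latent back as history. All names (`encode`, `predict_next_latent`, `decode`, `rollout`) are illustrative placeholders, not this repository's actual API, and the scalar arithmetic stands in for real tensor operations.

```python
from collections import deque

# Illustrative stubs -- in the real pipeline the MAE tokenizer encodes/decodes
# frames and the DiT dynamics model predicts the next latent.
def encode(frame):                        # MAE tokenizer: frame -> latent
    return frame * 0.5

def predict_next_latent(context, action): # DiT: (latent history, action) -> latent
    return sum(context) / len(context) + action

def decode(latent):                       # MAE tokenizer: latent -> frame
    return latent * 2.0

def rollout(first_frame, actions, context_len=4):
    """Autoregressively generate one frame per action, conditioning the
    dynamics model on a sliding window of the most recent latents."""
    context = deque([encode(first_frame)], maxlen=context_len)
    frames = []
    for action in actions:
        latent = predict_next_latent(list(context), action)
        context.append(latent)            # feed the prediction back as history
        frames.append(decode(latent))
    return frames

frames = rollout(first_frame=1.0, actions=[0.0, 1.0, 0.0])
print(len(frames))  # one generated frame per action
```

Because only the latent window is kept, memory stays constant during rollout, which is what makes indefinitely long generation feasible.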
## Model Zoo
Please download the pre-trained weights and place them in the `checkpoints/` directory before running the code.
| Model Name | Params | VRAM Req | Description |
| :--- | :---: | :---: | :--- |
| **MAE-Tokenizer** | 430M | >2GB | Handles video encoding and decoding. |
| **Dynamic Model** | 1.7B | 9GB | Generates the next frame based on history and action. |
> **Download**: [HuggingFace Collection](https://huggingface.co/your-username/minecraft-world-model)
## Installation
We recommend using Python 3.10+ and CUDA 12.1+.
```bash
# 1. Clone the repository
git clone https://github.com/IamCreateAI/Dreamerv4-MC.git
cd Dreamerv4-MC
# 2. Create a virtual environment
conda create -n dreamer python=3.12 -y
conda activate dreamer
# 3. Install PyTorch (Adjust index-url for your CUDA version)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
# 4. Install dependencies
pip install -r requirements.txt
MAX_JOBS=4 pip install flash-attn --no-build-isolation
pip install -e .
```
## Quick Start
```bash
python ui/inference_ui.py --dynamic_path=/path/to/dynamic_model \
--tokenizer_path=/path/to/tokenizer/ \
--record_video_output_path=output/
```
## Controls
| Key | Action |
| :--- | :--- |
| **W / A / S / D** | Move |
| **Space** | Jump |
| **Left Click** | Attack / Destroy |
| **Right Click** | Place / Use Item |
| **E** | Open/Close Inventory (Simulation) |
| **1 - 9** | Select Hotbar Slot |
| **R** | Start/Stop Video Recording |
| **V** | Refresh to a New Scene |
| **Left Shift** | Sneak |
| **Left Ctrl** | Sprint |
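The bindings above can be pictured as a simple lookup table that translates pressed keys into the action inputs the model conditions on. The mapping and action names below are illustrative only, not the identifiers used in this codebase.

```python
# Hypothetical key -> action mapping mirroring the controls table.
KEYMAP = {
    "w": "move_forward", "a": "move_left", "s": "move_back", "d": "move_right",
    "space": "jump",
    "mouse_left": "attack",
    "mouse_right": "use_item",
    "e": "toggle_inventory",
    "r": "toggle_recording",
    "v": "new_scene",
    "shift": "sneak",
    "ctrl": "sprint",
}
# Hotbar slots 1-9 select inventory items.
KEYMAP.update({str(i): f"hotbar_{i}" for i in range(1, 10)})

def keys_to_actions(pressed):
    """Translate currently pressed keys into action tokens,
    silently ignoring keys the simulator does not recognize."""
    return [KEYMAP[k] for k in pressed if k in KEYMAP]

print(keys_to_actions(["w", "space", "f"]))  # ['move_forward', 'jump']
```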
## Citation
If you use this codebase in your research, please consider citing us as:
```bibtex
@article{gao2026dreamermc,
  title  = {Dreamer-MC: A Real-Time Autoregressive World Model for Infinite Video Generation},
  author = {Ming Gao and Yan Yan and ShengQu Xi and Yu Duan and ShengQian Li and Feng Wang},
  year   = {2026},
  url    = {https://findlamp.github.io/dreamer-mc.github.io/}
}
```
as well as the original Dreamer 4 paper:
```bibtex
@misc{Hafner2025TrainingAgents,
title={Training Agents Inside of Scalable World Models},
author={Danijar Hafner and Wilson Yan and Timothy Lillicrap},
year={2025},
eprint={2509.24527},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2509.24527},
}
```
## References
This project is built upon the following foundational works:
* **MaeTok**: [Masked Autoencoders Are Effective Tokenizers for Diffusion Models](https://arxiv.org/abs/2502.03444) (Chen et al., ICML 2025)
* **DreamerV4**: [Training Agents Inside of Scalable World Models](https://danijar.com/project/dreamerv4/) (Hafner et al., 2025)
* **CausVid**: [From Slow Bidirectional to Fast Autoregressive Video Diffusion Models](https://arxiv.org/abs/2412.07772) (Yin et al., CVPR 2025)