brycebywang committed on
Commit 11d96b0 · verified · 1 Parent(s): d535b99

Update README.md

Files changed (1)
  1. README.md +39 -104
README.md CHANGED
@@ -7,24 +7,30 @@
7
  <a href="https://github.com/SkyworkAI/Matrix-Game">
8
  <img src="https://img.shields.io/badge/GitHub-100000?style=flat&logo=github&logoColor=white" alt="GitHub">
9
  </a>
10
- <a href="#todo">
11
  <img src="https://img.shields.io/badge/arXiv-Report-b31b1b?style=flat&logo=arxiv&logoColor=white" alt="arXiv">
12
  </a>
 
 
13
  </div>
14
 
15
  ## 📝 Overview
16
- **Matrix-Game** is a 17B-parameter Diffusion Transformer for generating high-resolution, physics-consistent videos in interactive game environments. Trained on large-scale data from Minecraft and Unreal Engine, it understands game physics like collisions, destruction, and item placement. Matrix-Game supports real-time, action-conditioned generation, adapting video content dynamically to user input.
 
 
17
 
18
- You can find more visualizations on our [website](#).
 
 
19
 
20
  ## 🔥 Latest Updates
21
 
22
- * [2025-05] 🎉 Initial release of Matrix-Game
23
 
24
  ## 🚀 Performance Comparison
25
  ### GameWorld Score Benchmark Comparison
26
 
27
- | Model | Image Quality ↑ | Aesthetic ↑ | Temporal Cons. ↑ | Motion Smooth. ↑ | Keyboard Acc. ↑ | Mouse Acc. ↑ | 3D Cons. ↑ |
28
  |-----------|------------------|-------------|-------------------|-------------------|------------------|---------------|-------------|
29
  | Oasis | 0.65 | 0.48 | 0.94 | **0.98** | 0.77 | 0.56 | 0.56 |
30
  | MineWorld | 0.69 | 0.47 | 0.95 | **0.98** | 0.86 | 0.64 | 0.51 |
@@ -33,128 +39,57 @@ You can find more visualizations on our [website](#).
33
  **Metric Descriptions**:
34
 
35
  - **Image Quality** / **Aesthetic**: Visual fidelity and perceptual appeal of generated frames
36
- - **Temporal Cons.** / **Motion Smooth.**: Temporal coherence and smoothness between frames
37
- - **Keyboard Acc.** / **Mouse Acc.**: Accuracy in following user control signals
38
- - **3D Cons.**: Geometric stability and physical plausibility over time
 
 
39
 
40
  ### Human Evaluation
41
- <table>
42
- <thead>
43
- <tr>
44
- <th>Group</th>
45
- <th>Method</th>
46
- <th>Overall Quality (%)</th>
47
- <th>Controllability (%)</th>
48
- <th>Visual Quality (%)</th>
49
- <th>Temporal Consistency (%)</th>
50
- </tr>
51
- </thead>
52
- <tbody>
53
- <tr>
54
- <td rowspan="3">Group A</td>
55
- <td>Oasis</td>
56
- <td>0.16</td>
57
- <td>0.33</td>
58
- <td>0.00</td>
59
- <td>0.16</td>
60
- </tr>
61
- <tr>
62
- <td>MineWorld</td>
63
- <td>3.78</td>
64
- <td>5.58</td>
65
- <td>1.32</td>
66
- <td>13.82</td>
67
- </tr>
68
- <tr>
69
- <td><strong>Ours</strong></td>
70
- <td><strong>96.05</strong></td>
71
- <td><strong>94.09</strong></td>
72
- <td><strong>98.68</strong></td>
73
- <td><strong>86.02</strong></td>
74
- </tr>
75
- <tr>
76
- <td rowspan="3">Group B</td>
77
- <td>Oasis</td>
78
- <td>0.66</td>
79
- <td>0.82</td>
80
- <td>0.75</td>
81
- <td>0.66</td>
82
- </tr>
83
- <tr>
84
- <td>MineWorld</td>
85
- <td>2.79</td>
86
- <td>5.76</td>
87
- <td>1.48</td>
88
- <td>6.25</td>
89
- </tr>
90
- <tr>
91
- <td><strong>Ours</strong></td>
92
- <td><strong>96.55</strong></td>
93
- <td><strong>93.42</strong></td>
94
- <td><strong>97.77</strong></td>
95
- <td><strong>93.09</strong></td>
96
- </tr>
97
- <tr>
98
- <td rowspan="3">Average</td>
99
- <td>Oasis</td>
100
- <td>0.41</td>
101
- <td>0.58</td>
102
- <td>0.38</td>
103
- <td>0.41</td>
104
- </tr>
105
- <tr>
106
- <td>MineWorld</td>
107
- <td>3.29</td>
108
- <td>5.67</td>
109
- <td>1.40</td>
110
- <td>10.04</td>
111
- </tr>
112
- <tr>
113
- <td><strong>Ours</strong></td>
114
- <td><strong>96.30</strong></td>
115
- <td><strong>93.76</strong></td>
116
- <td><strong>98.23</strong></td>
117
- <td><strong>89.56</strong></td>
118
- </tr>
119
- </tbody>
120
- </table>
121
 
122
  > Double-blind human evaluation by two independent groups across four key dimensions: **Overall Quality**, **Controllability**, **Visual Quality**, and **Temporal Consistency**.
123
  > Scores represent the percentage of pairwise comparisons in which each method was preferred. Matrix-Game consistently outperforms prior models across all metrics and both groups.
124
 
125
 
126
- ## 🛠️ Installation
127
 
128
- 1. Clone the repository:
129
- ```bash
130
  git clone https://github.com/SkyworkAI/Matrix-Game.git
131
  cd Matrix-Game
132
- ```
133
 
134
- 2. Install dependencies:
135
- ```bash
136
  pip install -r requirements.txt
137
- ```
138
 
139
- ## 🚀 Quick Start
 
140
 
141
- ```bash
142
  bash run_inference.sh
143
  ```
144
 
145
- ## 🤝 Contributing
146
-
147
- We welcome contributions! Please see our [contributing guidelines](CONTRIBUTING.md) for more details.
148
-
149
  ## ⭐ Acknowledgements
150
 
151
  We would like to express our gratitude to:
152
 
153
  - [Diffusers](https://github.com/huggingface/diffusers) for their excellent diffusion model framework
154
  - [HunyuanVideo](https://github.com/Tencent/HunyuanVideo) for their strong base model
 
 
 
 
155
 
156
  We are grateful to the broader research community for their open exploration and contributions to the field of interactive world generation.
157
 
158
- ## 📄 License
159
-
160
- This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

7
  <a href="https://github.com/SkyworkAI/Matrix-Game">
8
  <img src="https://img.shields.io/badge/GitHub-100000?style=flat&logo=github&logoColor=white" alt="GitHub">
9
  </a>
10
+ <a href="https://github.com/SkyworkAI/Matrix-Game/blob/main/assets/report.pdf">
11
  <img src="https://img.shields.io/badge/arXiv-Report-b31b1b?style=flat&logo=arxiv&logoColor=white" alt="arXiv">
12
  </a>
13
+
14
+
15
  </div>
16
 
17
  ## 📝 Overview
18
+ **Matrix-Game** is a 17B-parameter interactive world foundation model for controllable game world generation.
19
+
20
+ ## ✨ Key Features
21
 
22
+ - 🎯 **Feature 1**: **Interactive Generation.** A diffusion-based image-to-world model that generates high-quality videos conditioned on keyboard and mouse inputs, enabling fine-grained control and dynamic scene evolution.
23
+ - 🚀 **Feature 2**: **GameWorld Score.** A comprehensive benchmark for evaluating Minecraft world models across four key dimensions, including visual quality, temporal quality, action controllability, and physical rule understanding.
24
+ - 💡 **Feature 3**: **Matrix-Game Dataset** A large-scale Minecraft dataset with fine-grained action annotations, supporting scalable training for interactive and physically grounded world modeling.
25
 
26
  ## 🔥 Latest Updates
27
 
28
+ * [2025-05] 🎉 Initial release of Matrix-Game Model
29
 
30
  ## 🚀 Performance Comparison
31
  ### GameWorld Score Benchmark Comparison
32
 
33
+ | Model | Image Quality ↑ | Aesthetic Quality ↑ | Temporal Cons. ↑ | Motion Smooth. ↑ | Keyboard Acc. ↑ | Mouse Acc. ↑ | 3D Cons. ↑ |
34
  |-----------|------------------|-------------|-------------------|-------------------|------------------|---------------|-------------|
35
  | Oasis | 0.65 | 0.48 | 0.94 | **0.98** | 0.77 | 0.56 | 0.56 |
36
  | MineWorld | 0.69 | 0.47 | 0.95 | **0.98** | 0.86 | 0.64 | 0.51 |
 
39
  **Metric Descriptions**:
40
 
41
  - **Image Quality** / **Aesthetic**: Visual fidelity and perceptual appeal of generated frames
42
+ - **Temporal Consistency** / **Motion Smoothness**: Temporal coherence and smoothness between frames
43
+ - **Keyboard Accuracy** / **Mouse Accuracy**: Accuracy in following user control signals
44
+ - **3D Consistency**: Geometric stability and physical plausibility over time
45
+
46
+ Please check our [GameWorld](https://github.com/SkyworkAI/Matrix-Game/tree/main/GameWorldScore) benchmark for detailed implementation.
47
 
48
  ### Human Evaluation
49
+
50
+ ![Human Win Rate](https://raw.githubusercontent.com/SkyworkAI/Matrix-Game/main/assets/imgs/human_win_rate.png)
51
 
52
  > Double-blind human evaluation by two independent groups across four key dimensions: **Overall Quality**, **Controllability**, **Visual Quality**, and **Temporal Consistency**.
53
  > Scores represent the percentage of pairwise comparisons in which each method was preferred. Matrix-Game consistently outperforms prior models across all metrics and both groups.
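
The note above defines each score as the percentage of comparisons in which a method was preferred. A minimal sketch of that calculation follows, assuming each comparison is recorded as the name of the preferred method; the function name `preference_rates` and the vote data are hypothetical and are not taken from the Matrix-Game evaluation code.

```python
from collections import Counter


def preference_rates(votes):
    """Return, per method, the percentage of comparisons in which it was preferred.

    `votes` holds one entry per comparison: the name of the preferred method.
    This input format is an assumption made for illustration.
    """
    if not votes:
        raise ValueError("at least one comparison is required")
    counts = Counter(votes)
    total = len(votes)
    # Convert raw win counts to percentages, rounded to two decimals.
    return {method: round(100.0 * wins / total, 2) for method, wins in counts.items()}


# Hypothetical votes, not real evaluation data.
votes = ["Matrix-Game"] * 8 + ["MineWorld"] * 2
rates = preference_rates(votes)
print(rates)
```

Computed this way, the percentages for all methods in one group and one dimension sum to roughly 100, up to rounding.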
54
 
55
 
56
+ ## 🚀 Quick Start
57
 
58
+ ```
59
+ # clone the repository:
60
  git clone https://github.com/SkyworkAI/Matrix-Game.git
61
  cd Matrix-Game
 
62
 
63
+ # install dependencies:
 
64
  pip install -r requirements.txt
 
65
 
66
+ # install apex and FlashAttention-3
67
+ # Our project also depends on [apex](https://github.com/NVIDIA/apex) and [FlashAttention-3](https://github.com/Dao-AILab/flash-attention)
68
 
69
+ # inference
70
  bash run_inference.sh
71
  ```
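
Because apex and FlashAttention-3 are installed separately from `requirements.txt`, it may help to confirm they are importable before running inference. The sketch below uses only the Python standard library; the import names `apex` and `flash_attn` are assumptions and may differ depending on how the packages were built.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for apex and FlashAttention; adjust them to your install.
required = ["apex", "flash_attn"]
print("Missing modules:", missing_modules(required))
```

An empty list means both modules were found. It does not verify that the installed builds match your CUDA and PyTorch versions.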
72
 
 
 
 
 
73
  ## ⭐ Acknowledgements
74
 
75
  We would like to express our gratitude to:
76
 
77
  - [Diffusers](https://github.com/huggingface/diffusers) for their excellent diffusion model framework
78
  - [HunyuanVideo](https://github.com/Tencent/HunyuanVideo) for their strong base model
79
+ - [MineDojo](https://minedojo.org/knowledge_base) for their Minecraft video dataset
80
+ - [MineRL](https://github.com/minerllabs/minerl) for their excellent gym framework
81
+ - [Video-Pre-Training](https://github.com/openai/Video-Pre-Training) for their accurate Inverse Dynamics Model
82
+ - [GameFactory](https://github.com/KwaiVGI/GameFactory) for their idea of action control module
83
 
84
  We are grateful to the broader research community for their open exploration and contributions to the field of interactive world generation.
85
 
86
+ ## 📎 Citation
87
+ If you find this project useful, please cite our paper:
88
+ ```bibtex
89
+ @article{zhang2025matrixgame,
90
+ title = {Matrix-Game: Interactive World Foundation Model},
91
+ author = {Yifan Zhang and Chunli Peng and Boyang Wang and Puyi Wang and Qingcheng Zhu and Zedong Gao and Eric Li and Yang Liu and Yahui Zhou},
92
+ journal = {arXiv},
93
+ year = {2025}
94
+ }
95
+ ```