---
license: mit
language:
- en
pipeline_tag: image-to-video
library_name: diffusers
---
Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model
Overview
Matrix-Game-2.0 (1.8B) is an interactive world model that generates long videos on the fly via few-step auto-regressive diffusion.
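To make "few-step auto-regressive diffusion" concrete, here is a toy sketch of the control flow only. It is not the Matrix-Game-2.0 implementation: `denoise_step`, `generate_stream`, the list-of-floats frame representation, and the default of 4 denoising steps are all hypothetical stand-ins chosen for illustration. Each new frame starts from noise, is refined in a small fixed number of steps conditioned on the previous frame, and is yielded immediately so a consumer can display it while later frames are still being generated.

```python
import random
from typing import Iterator, List

Frame = List[float]  # toy stand-in for one latent video frame

FPS = 25                      # frame rate quoted in this README
FRAME_BUDGET_MS = 1000 / FPS  # 40 ms per frame to keep up in real time


def denoise_step(noisy: Frame, context: Frame, step: int, num_steps: int) -> Frame:
    """Hypothetical denoiser: blend the noisy frame toward the previous frame."""
    weight = (step + 1) / (num_steps + 1)
    return [(1 - weight) * n + weight * c for n, c in zip(noisy, context)]


def generate_stream(
    first_frame: Frame,
    num_frames: int,
    num_steps: int = 4,  # assumption: the README only says "few-step"
    seed: int = 0,
) -> Iterator[Frame]:
    """Yield frames one at a time, each conditioned on the frame before it."""
    rng = random.Random(seed)
    context = first_frame
    for _ in range(num_frames):
        # Start every frame from Gaussian noise.
        frame = [rng.gauss(0.0, 1.0) for _ in context]
        # Refine it in a small, fixed number of steps.
        for step in range(num_steps):
            frame = denoise_step(frame, context, step, num_steps)
        yield frame      # stream the frame out immediately
        context = frame  # auto-regressive: the next frame sees this one


if __name__ == "__main__":
    for index, frame in enumerate(generate_stream([0.0, 0.0, 0.0], num_frames=3)):
        print(index, [round(value, 3) for value in frame])
```

The generator form is the point of the sketch: the caller receives each frame as soon as it is ready, which is what streaming at 25 FPS requires, and the per-frame work must fit inside the 40 ms budget.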
Key Features
- Feature 1: Real-Time Distillation. Efficient few-step diffusion for streaming video synthesis at 25 FPS, producing minute-level, high-fidelity videos across complex environments.
- Feature 2: Precise Action Injection. A mouse/keyboard-to-frame module that embeds user inputs as direct interactions, enabling frame-level control and dynamic response in generated videos.
- Feature 3: Massive Interactive Data Pipeline. A scalable production system for Unreal Engine and GTA5 that generates ~1350 hours of high-quality interactive video data, covering diverse scenes with frame-level realism.
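As an illustration of frame-level action injection, the sketch below turns keyboard and mouse input into one conditioning vector per frame. This is an assumption-laden example, not the model's actual module: the `KEYS` set, the `Action` dataclass, and the vector layout (multi-hot keys followed by mouse deltas) are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Hypothetical key set; the real model may use a different vocabulary.
KEYS: Tuple[str, ...] = ("W", "A", "S", "D")


@dataclass(frozen=True)
class Action:
    """User input captured for a single frame."""
    keys: FrozenSet[str] = frozenset()
    mouse_dx: float = 0.0
    mouse_dy: float = 0.0


def encode_action(action: Action) -> List[float]:
    """Multi-hot keyboard state followed by the mouse movement (dx, dy)."""
    keyboard = [1.0 if key in action.keys else 0.0 for key in KEYS]
    return keyboard + [action.mouse_dx, action.mouse_dy]


def actions_per_frame(actions: List[Action], num_frames: int) -> List[List[float]]:
    """Return exactly one action vector per frame, padding with no-ops."""
    noop = Action()
    return [
        encode_action(actions[i] if i < len(actions) else noop)
        for i in range(num_frames)
    ]


if __name__ == "__main__":
    inputs = [Action(frozenset({"W", "D"}), mouse_dx=0.5, mouse_dy=-0.25)]
    for vector in actions_per_frame(inputs, num_frames=3):
        print(vector)
```

Keeping a one-to-one mapping between frames and action vectors is what makes frame-level control possible: every generated frame can be conditioned on the input that was active at that instant.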
Latest Updates
- [2025-08] Initial release of the Matrix-Game-2.0 model
Performance Comparison
GameWorld Score Benchmark Comparison
| Model | Image Quality ↑ | Aesthetic Quality ↑ | Temporal Cons. ↑ | Motion Smooth. ↑ | Keyboard Acc. ↑ | Mouse Acc. ↑ | Object Cons. | Scenario Cons. |
|---|---|---|---|---|---|---|---|---|
| Oasis | 0.27 | 0.27 | 0.82 | 0.99 | 0.73 | 0.56 | 0.18 | 0.84 |
| Ours | 0.61 | 0.50 | 0.94 | 0.98 | 0.91 | 0.95 | 0.64 | 0.80 |
Metric Descriptions:
- Image Quality / Aesthetic Quality: Visual fidelity and perceptual appeal of generated frames
- Temporal Consistency / Motion Smoothness: Temporal coherence and smoothness between frames
- Keyboard Accuracy / Mouse Accuracy: Accuracy in following user control signals
- Object Consistency: Geometric stability and consistency of objects over time
- Scenario Consistency: Consistency of the overall scene over time

See our GameWorld benchmark for implementation details.
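For a quick read of the comparison, the snippet below computes the per-metric difference between the two rows of the table above. The numbers are copied from the table; the helper name `score_deltas` is ours and is not part of the benchmark code.

```python
from typing import Dict

METRICS = (
    "Image Quality", "Aesthetic Quality", "Temporal Cons.", "Motion Smooth.",
    "Keyboard Acc.", "Mouse Acc.", "Object Cons.", "Scenario Cons.",
)
OASIS = (0.27, 0.27, 0.82, 0.99, 0.73, 0.56, 0.18, 0.84)
OURS = (0.61, 0.50, 0.94, 0.98, 0.91, 0.95, 0.64, 0.80)


def score_deltas() -> Dict[str, float]:
    """Difference (Ours minus Oasis) per metric, rounded to two decimals."""
    return {
        metric: round(ours - oasis, 2)
        for metric, oasis, ours in zip(METRICS, OASIS, OURS)
    }


if __name__ == "__main__":
    for metric, delta in score_deltas().items():
        print(f"{metric:<18} {delta:+.2f}")
```

On these numbers, Matrix-Game-2.0 scores higher on six of the eight metrics, with the largest gaps in Object Cons. (+0.46) and Mouse Acc. (+0.39); Oasis is slightly ahead on Motion Smooth. (0.99 vs. 0.98) and Scenario Cons. (0.84 vs. 0.80).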
Quick Start
# clone the repository:
git clone xxx
cd Matrix-Game-2.0
# install dependencies:
pip install -r requirements.txt
# inference
bash xxx.sh
Acknowledgements
We would like to express our gratitude to:
- Diffusers for their excellent diffusion model framework
- SkyReels-V2 for their strong base model
- Self-Forcing for their excellent work
- MineRL for their excellent gym framework
- Video-Pre-Training for their accurate Inverse Dynamics Model
- GameFactory for the idea behind the action control module
We are grateful to the broader research community for their open exploration and contributions to the field of interactive world generation.
Citation
If you find this project useful, please cite our paper:
xxx