license: mit
language:
- en
pipeline_tag: image-to-video
library_name: diffusers
base_model:
- Skywork/SkyReels-V2-I2V-1.3B-540P
Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model
Overview
Matrix-Game-2.0 (1.8B) is an interactive world model that generates long videos on the fly via few-step auto-regressive diffusion.
Key Features
- Feature 1: Real-Time Distillation. Efficient few-step diffusion for streaming video synthesis at 25 FPS, producing minute-level, high-fidelity videos across complex environments.
- Feature 2: Precise Action Injection. A mouse/keyboard-to-frame module that embeds user inputs as direct interactions, enabling frame-level control and dynamic response in generated videos.
- Feature 3: Massive Interactive Data Pipeline. A scalable production system for Unreal Engine and GTA5 that generates about 1350 hours of high-quality interactive video data, covering diverse scenes with frame-level realism.
Latest Updates
- [2025-08] Initial release of the Matrix-Game-2.0 model
Model Overview
Matrix-Game-2.0 (1.8B) is derived from Wan. With the text branch removed and action modules added, the model predicts the next frames from visual content and the corresponding actions alone.
Performance Comparison
GameWorld Score Benchmark Comparison
| Model | Image Quality ↑ | Aesthetic Quality ↑ | Temporal Cons. ↑ | Motion Smooth. ↑ | Keyboard Acc. ↑ | Mouse Acc. ↑ | Object Cons. | Scenario Cons. |
|---|---|---|---|---|---|---|---|---|
| Oasis | 0.27 | 0.27 | 0.82 | 0.99 | 0.73 | 0.56 | 0.18 | 0.84 |
| Ours | 0.61 | 0.50 | 0.94 | 0.98 | 0.91 | 0.95 | 0.64 | 0.80 |
Metric Descriptions:
- Image Quality / Aesthetic Quality: Visual fidelity and perceptual appeal of generated frames
- Temporal Consistency / Motion Smoothness: Temporal coherence and smoothness between frames
- Keyboard Accuracy / Mouse Accuracy: Accuracy in following user control signals
- Object Consistency: Geometric stability and consistency of objects over time
- Scenario Consistency: Scenario consistency over time
See our GameWorld benchmark for implementation details.
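To read the comparison at a glance, the per-metric difference between the two rows can be computed directly from the table. The snippet below is a minimal sketch: the scores are copied from the table above, while the `scores` variable and the `metric_deltas` function are hypothetical names introduced only for this illustration, not part of the GameWorld benchmark.

```shell
#!/bin/sh
# Scores copied from the comparison table: metric,Oasis,Ours
scores='Image Quality,0.27,0.61
Aesthetic Quality,0.27,0.50
Temporal Cons.,0.82,0.94
Motion Smooth.,0.99,0.98
Keyboard Acc.,0.73,0.91
Mouse Acc.,0.56,0.95
Object Cons.,0.18,0.64
Scenario Cons.,0.84,0.80'

# Hypothetical helper: prints "metric  signed difference" (Ours minus Oasis).
metric_deltas() {
  printf '%s\n' "$scores" | awk -F, '{ printf "%-18s %+.2f\n", $1, $3 - $2 }'
}

metric_deltas
```

Read this way, Matrix-Game-2.0 scores higher on six of the eight metrics and slightly lower on Motion Smoothness and Scenario Consistency.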
Quick Start
# clone the repository:
git clone https://github.com/SkyworkAI/Matrix-Game-2.0
cd Matrix-Game-2.0
# install dependencies:
pip install -r requirements.txt
python setup.py develop
# inference
python inference.py \
--config_path configs/inference_yaml/{your-config}.yaml \
--checkpoint_path {path-to-the-checkpoint} \
--img_path {path-to-the-input-image} \
--output_folder outputs \
--num_output_frames 150 \
--seed 42 \
--pretrained_model_path {path-to-the-vae-folder}
# inference streaming
python inference_streaming.py \
--config_path configs/inference_yaml/{your-config}.yaml \
--checkpoint_path {path-to-the-checkpoint} \
--output_folder outputs \
--seed 42 \
--pretrained_model_path {path-to-the-vae-folder}
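When running the non-streaming command repeatedly with different inputs, it can help to assemble it from arguments and review it before execution. The following is a minimal sketch, not part of the repository: `build_inference_cmd` is a hypothetical helper, the example paths are made up, and only the flags shown in the commands above are used. It prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the repository): prints the
# inference.py command so it can be reviewed before running.
# Usage: build_inference_cmd CONFIG CHECKPOINT IMAGE VAE_DIR [NUM_FRAMES] [SEED]
build_inference_cmd() {
  if [ "$#" -lt 4 ]; then
    echo "usage: build_inference_cmd CONFIG CHECKPOINT IMAGE VAE_DIR [NUM_FRAMES] [SEED]" >&2
    return 1
  fi
  # Defaults (150 frames, seed 42, output folder "outputs") mirror the
  # values used in the Quick Start command.
  printf "python inference.py"
  printf " --config_path '%s'" "$1"
  printf " --checkpoint_path '%s'" "$2"
  printf " --img_path '%s'" "$3"
  printf " --output_folder outputs"
  printf " --num_output_frames %s" "${5:-150}"
  printf " --seed %s" "${6:-42}"
  printf " --pretrained_model_path '%s'\n" "$4"
}

# Example with made-up paths; replace them with real ones.
build_inference_cmd configs/inference_yaml/example.yaml ckpt/model.ckpt demo.png ckpt/vae
```

The printed line can be pasted into a terminal once the paths have been checked. Paths containing single quotes are not handled by this sketch.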
Acknowledgements
We would like to express our gratitude to:
- Diffusers for their excellent diffusion model framework
- SkyReels-V2 for their strong base model
- Self-Forcing for their excellent work
- MineRL for their excellent gym framework
- Video-Pre-Training for their accurate Inverse Dynamics Model
- GameFactory for their idea of action control module
We are grateful to the broader research community for their open exploration and contributions to the field of interactive world generation.
Citation
If you find this project useful, please cite our paper:
xxx
