Update README.md
README.md
<a href="https://github.com/SkyworkAI/Matrix-Game">
<img src="https://img.shields.io/badge/GitHub-100000?style=flat&logo=github&logoColor=white" alt="GitHub">
</a>
<a href="https://github.com/SkyworkAI/Matrix-Game/blob/main/Matrix-Game-3/assets/pdf/report.pdf">
<img src="https://img.shields.io/badge/arXiv-Report-b31b1b?style=flat&logo=arxiv&logoColor=white" alt="report">
</a>
<a href="https://matrix-game-v2.github.io/">
<img src="https://img.shields.io/badge/Project%20Page-grey?style=flat&logo=huggingface&color=FFA500" alt="Project Page">
</a>

- 🚀 **Feature 1**: **Upgraded Data Engine**: Combines Unreal Engine-based synthetic data, large-scale automated AAA game data, and real-world video augmentation to generate high-quality Video–Pose–Action–Prompt data.
- 🖱️ **Feature 2**: **Long-horizon Memory & Consistency**: Uses prediction residuals and frame re-injection for self-correction, while camera-aware memory ensures long-term spatiotemporal consistency.
- 🎬 **Feature 3**: **Real-Time Interactivity & Open Access**: Uses a multi-segment autoregressive distillation strategy based on Distribution Matching Distillation (DMD), combined with model quantization and VAE decoder distillation, to support real-time generation at 40 FPS and 720p resolution with a 5B model while maintaining stable memory consistency over minute-long sequences.
- 👍 **Feature 4**: **Scale Up 28B-MoE Model**: Scaling up to a 2×14B MoE model further improves generation quality, dynamics, and generalization.

## 🔥 Latest Updates

* [2026-03] 🎉 Initial release of Matrix-Game-3.0 Model

## 🚀 Quick Start

### Installation

Create a conda environment and install dependencies:

```sh
conda create -n matrix-game-3.0 python=3.12 -y
conda activate matrix-game-3.0
git clone https://github.com/SkyworkAI/Matrix-Game-3.0.git
cd Matrix-Game-3.0
pip install -r requirements.txt
```

The project also depends on [FlashAttention](https://github.com/Dao-AILab/flash-attention); install it by following the instructions in its repository.

### Model Download

```sh
pip install "huggingface_hub[cli]"
huggingface-cli download Matrix-Game-3.0 --local-dir Matrix-Game-3.0
```

### Inference

Before running inference, you need to prepare:

- Input image
- Text prompt
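
Because a multi-GPU run can take a while to start, it may help to check these inputs up front. The sketch below is a hypothetical helper (`mg3_preflight` is not part of the repository); it assumes the checkpoint directory created in the download step and only verifies that the directory is non-empty, the image file is readable, and the prompt is not empty.

```sh
# Hypothetical helper, not part of the Matrix-Game-3.0 repository.
# Checks the inputs a generation run needs before launching torchrun.
# Usage: mg3_preflight CKPT_DIR IMAGE PROMPT
mg3_preflight() {
  local ckpt_dir="$1" image="$2" prompt="$3"
  # The checkpoint directory must exist and must not be empty.
  if [ ! -d "$ckpt_dir" ] || [ -z "$(ls -A "$ckpt_dir")" ]; then
    echo "checkpoint directory missing or empty: $ckpt_dir" >&2
    return 1
  fi
  # The input image must be a readable regular file.
  if [ ! -f "$image" ] || [ ! -r "$image" ]; then
    echo "input image not found or unreadable: $image" >&2
    return 1
  fi
  # The text prompt must be non-empty.
  if [ -z "$prompt" ]; then
    echo "text prompt is empty" >&2
    return 1
  fi
  return 0
}
```

For example, `mg3_preflight Matrix-Game-3.0 demo_images/000/image.png "a vintage gas station ..." && torchrun ...` starts generation only if all three checks pass.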

After downloading the pretrained models, set `NUM_GPUS` to the number of GPUs to use and run the following command to generate an interactive video with random actions:

``` sh
torchrun --nproc_per_node=$NUM_GPUS generate.py --size "704*1280" --dit_fsdp --t5_fsdp --ckpt_dir Matrix-Game-3.0 --fa_version 3 --use_int8 --num_iterations 12 --num_inference_steps 3 --image demo_images/000/image.png --prompt "a vintage gas station with a classic car parked under a canopy, set against a desert landscape." --save_name test --seed 42 --compile_vae --lightvae_pruning_rate 0.5 --vae_type mg_lightvae --output_dir ./output
# "--num_iterations" sets the number of iterations to generate. The total number of generated frames is 57 + (num_iterations - 1) * 40, e.g. 497 frames for --num_iterations 12.
```
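
The frame-count formula in the comment above can be evaluated with plain shell arithmetic. The two helpers below are hypothetical (not part of the repository): `mg3_frames` applies the formula, and `mg3_iterations_for` inverts it to find the smallest `--num_iterations` that yields at least a target number of frames.

```sh
# Hypothetical helpers, not part of the Matrix-Game-3.0 repository.
# Total frames for a given --num_iterations: 57 + (num_iterations - 1) * 40
mg3_frames() {
  echo $(( 57 + ($1 - 1) * 40 ))
}

# Smallest --num_iterations whose output has at least the requested frame count.
mg3_iterations_for() {
  if [ "$1" -le 57 ]; then
    # The first iteration alone already produces 57 frames.
    echo 1
  else
    # Ceiling division of the extra frames by the 40 frames added per iteration.
    echo $(( 1 + ($1 - 57 + 39) / 40 ))
  fi
}
```

`mg3_frames 12` prints `497`, matching the example command, and `mg3_iterations_for 498` prints `13`, since 12 iterations fall one frame short.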

Tips:

- To use the base model, pass `--use_base_model --num_inference_steps 50`.
- To generate interactive videos from your own input actions, pass `--interactive`.
- With multiple GPUs, you can pass `--use_async_vae --async_vae_warmup_iters 1` to speed up inference (see [`test.sh`](test.sh)).
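
The model-variant and GPU-count tips above can be folded into one set of extra flags. The sketch below is a hypothetical helper (`mg3_extra_flags` is not part of the repository); it only combines flags that the README itself lists, using `--num_inference_steps 3` from the example command as the default and leaving `--interactive` out.

```sh
# Hypothetical helper, not part of the Matrix-Game-3.0 repository.
# Usage: mg3_extra_flags MODE NUM_GPUS   (MODE is "base" or "default")
mg3_extra_flags() {
  local mode="$1" num_gpus="$2" flags
  if [ "$mode" = "base" ]; then
    # Base model: 50 denoising steps, as given in the tips.
    flags="--use_base_model --num_inference_steps 50"
  else
    # Default model: 3 steps, as in the example command.
    flags="--num_inference_steps 3"
  fi
  if [ "$num_gpus" -gt 1 ]; then
    # The tips suggest asynchronous VAE decoding when several GPUs are available.
    flags="$flags --use_async_vae --async_vae_warmup_iters 1"
  fi
  echo "$flags"
}
```

The output is meant to be spliced unquoted into the `torchrun` command, e.g. `$(mg3_extra_flags base "$NUM_GPUS")`, in place of the `--num_inference_steps 3` flag used in the example.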

## ⭐ Acknowledgements

- [Diffusers](https://github.com/huggingface/diffusers) for their excellent diffusion model framework
- [Self-Forcing](https://github.com/guandeh17/Self-Forcing) for their excellent work
- [GameFactory](https://github.com/KwaiVGI/GameFactory) for the idea behind the action control module
- [LightX2V](https://github.com/ModelTC/lightx2v) for their excellent quantization framework
- [Wan2.2](https://github.com/Wan-Video/Wan2.2) for their strong base model
- [lingbot-world](https://github.com/Robbyant/lingbot-world) for their context parallel framework

## 📜 License

This project is licensed under the Apache License, Version 2.0; see [LICENSE.txt](LICENSE.txt).

## 📖 Citation

If you find this work useful for your research, please cite our paper:

```
```