Commit f144a15 · verified · committed by liuuzexiang
1 Parent(s): 1403f6a

Update README.md

Files changed (1)
  1. README.md +6 -0
README.md CHANGED
@@ -25,6 +25,12 @@ library_name: diffusers
  ## 📝 Overview
  **Matrix-Game-3.0** is an open-sourced, memory-augmented interactive world model designed for 720p real-time long-form video generation.
 
+ ## Framework Overview
+ Our framework unifies three stages into an end-to-end pipeline:
+ - Data Engine — an industrial-scale infinite data engine integrating Unreal Engine synthetic scenes, large-scale automated AAA game collection, and real-world video augmentation to produce high-quality Video-Pose-Action-Prompt quadruplets at scale;
+ - Model Training — a memory-augmented Diffusion Transformer (DiT) with an error buffer that learns action-conditioned generation with memory-enhanced long-horizon consistency;
+ - Inference Deployment — few-step sampling, INT8 quantization, and model distillation achieving 720p@40FPS real-time generation with a 5B model.
+
  ## ✨ Key Features
  - 🚀 **Feature 1**: **Upgraded Data Engine**: Combines Unreal Engine-based synthetic data, large-scale automated AAA game data, and real-world video augmentation to generate high-quality Video–Pose–Action–Prompt data.
  - 🖱️ **Feature 2**: **Long-horizon Memory & Consistency**: Uses prediction residuals and frame re-injection for self-correction, while camera-aware memory ensures long-term spatiotemporal consistency.
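The Inference Deployment bullet added in this commit lists INT8 quantization as one of the speed-ups. The README does not say which INT8 scheme Matrix-Game-3.0 uses. The code below is therefore only a generic sketch that assumes symmetric per-tensor quantization of a weight vector. The helper names `quantize_int8` and `dequantize_int8` are hypothetical and do not belong to Matrix-Game-3.0 or diffusers.

```python
# Generic sketch of symmetric per-tensor INT8 weight quantization.
# Assumption: this is NOT the Matrix-Game-3.0 scheme, which the README does
# not describe (per-tensor vs per-channel, weights vs activations, calibration).
from typing import List, Tuple

INT8_MAX = 127  # symmetric range [-127, 127]; -128 is left unused


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map float weights to int8 codes that share a single scale factor."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Guard against division by zero when every weight is 0.
    scale = max_abs / INT8_MAX if max_abs > 0 else 1.0
    # Round to the nearest code, then clamp into the symmetric int8 range.
    codes = [max(-INT8_MAX, min(INT8_MAX, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int8(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from int8 codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    w = [0.5, -1.27, 0.01, 1.0]
    q, s = quantize_int8(w)
    w_hat = dequantize_int8(q, s)
    # Rounding error per element is at most scale / 2.
    max_err = max(abs(a - b) for a, b in zip(w, w_hat))
    print(q, s)
    print(f"max reconstruction error: {max_err:.4f}")
```

Fixing the zero-point at 0 keeps integer matrix multiplication simple. Production pipelines often use per-channel scales and calibrated activation ranges to reduce error, so a real deployment may differ substantially from this sketch.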