# Matrix-Game 2.0 WebSocket Server
## Project Overview
Matrix-Game 2.0 is a real-time interactive game world generation system that uses advanced generative video models to create explorable environments. This repository contains a WebSocket server wrapper that enables web-based interaction with the Matrix-Game 2.0 models.
## Architecture
### Core Components
1. **api_server.py** - WebSocket server handling client connections and game sessions
2. **api_engine.py** - Matrix-Game 2.0 model inference engine
3. **api_utils.py** - Utility functions for image processing and visualization
4. **client/** - Web-based client interface for testing
### Model Components
- **WAN Diffusion Model** - Core generative model (14B parameters)
- **VAE Encoder/Decoder** - For latent space encoding/decoding
- **Streaming Pipeline** - Real-time frame generation
- **Condition Processing** - Keyboard and mouse input handling
## Key Features
- Real-time video generation based on user inputs
- Multiple game modes: Universal, GTA Drive, Temple Run
- WebSocket-based streaming for low-latency interaction
- Fallback mode for demo without GPU
- Support for multiple concurrent sessions
## Resolution and Performance
- Standard resolution: 352x640
- Target FPS: 16
- Streaming generation: 5 frames per batch
- Reduced latency through latent-space operations
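
The figures above imply a timing budget that is easy to work out. The following is a minimal arithmetic sketch, not code from the repository: the constant and function names are illustrative, and the raw-bandwidth figure assumes uncompressed 8-bit RGB frames, which the server may not actually send.

```python
# Timing and bandwidth arithmetic derived from the listed figures.
# All names here are illustrative, not taken from the repository.
TARGET_FPS = 16          # target frame rate
FRAMES_PER_BATCH = 5     # frames produced per streaming step
PIXELS_PER_FRAME = 352 * 640  # same product whichever side is the width


def frame_interval_ms(fps: int = TARGET_FPS) -> float:
    """Time between displayed frames at the target frame rate."""
    return 1000.0 / fps


def batch_budget_ms(fps: int = TARGET_FPS, frames: int = FRAMES_PER_BATCH) -> float:
    """Playback time covered by one batch; generation must finish
    within this window for the stream to keep up in real time."""
    return frames * frame_interval_ms(fps)


def raw_rgb_bytes_per_second(fps: int = TARGET_FPS) -> int:
    """Bandwidth if frames were sent as uncompressed 8-bit RGB (assumption)."""
    return PIXELS_PER_FRAME * 3 * fps


if __name__ == "__main__":
    print(f"frame interval: {frame_interval_ms():.1f} ms")
    print(f"batch budget:   {batch_budget_ms():.1f} ms")
    print(f"raw RGB rate:   {raw_rgb_bytes_per_second() / 1e6:.1f} MB/s")
```

In other words, each 5-frame batch has to be generated in roughly 312.5 ms to sustain 16 FPS, and the raw pixel rate (about 10.8 MB/s) suggests why frames would typically be compressed before being sent over the WebSocket.
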
## Game Modes
1. **Universal** - General exploration with full camera and movement control
2. **GTA Drive** - Driving simulation mode
3. **Temple Run** - Runner game mode with limited controls
## Input Controls
### Keyboard Controls
- W/S/A/D - Movement (forward/back/left/right)
- Space - Jump
- Shift/Ctrl - Attack/Action
### Mouse Controls
- X/Y coordinates normalized to [-1, 1]
- Camera rotation and view control
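
As a rough illustration of the input handling described above, the sketch below maps a pixel position to the [-1, 1] range and packs the pressed keys into a fixed-order vector. It assumes the client reports absolute pixel coordinates within the viewport; the key ordering, function names, and vector layout are hypothetical and may differ from what `api_engine.py` expects.

```python
# Hypothetical client-side preprocessing; names and layout are assumptions.
KEYS = ("w", "s", "a", "d", "space", "shift", "ctrl")  # illustrative ordering


def _clamp(value: float, low: float = -1.0, high: float = 1.0) -> float:
    return max(low, min(high, value))


def normalize_mouse(px: float, py: float, width: int, height: int) -> tuple[float, float]:
    """Map pixel coordinates in [0, width] x [0, height] to [-1, 1] x [-1, 1].

    The viewport centre maps to (0, 0); out-of-range positions are clamped.
    """
    nx = 2.0 * px / width - 1.0
    ny = 2.0 * py / height - 1.0
    return _clamp(nx), _clamp(ny)


def encode_keys(pressed: set[str]) -> list[int]:
    """Return a 0/1 flag per entry in KEYS, in KEYS order."""
    return [1 if key in pressed else 0 for key in KEYS]


if __name__ == "__main__":
    print(normalize_mouse(320, 176, 640, 352))
    print(encode_keys({"w", "space"}))
```

Clamping keeps the values inside the documented range even if the pointer leaves the viewport.
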
## Model Loading
The system automatically downloads models from Hugging Face (Skywork/Matrix-Game-2.0) if not present locally. Models include:
- Wan2.1_VAE.pth - VAE model weights
- Generator checkpoint files
- Configuration files for different modes
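
The download-if-missing behaviour can be approximated with `snapshot_download` from the `huggingface_hub` package. This is a sketch under assumptions, not the repository's actual loader: the function name `ensure_models` is hypothetical, and it assumes `Wan2.1_VAE.pth` sits at the top level of the local model directory and can serve as the "already downloaded" marker.

```python
from pathlib import Path

REPO_ID = "Skywork/Matrix-Game-2.0"   # repository named in this README
VAE_FILENAME = "Wan2.1_VAE.pth"       # assumed to sit at the directory root


def ensure_models(local_dir: str) -> Path:
    """Return the model directory, downloading the repo only if needed.

    Hypothetical helper: the presence of the VAE weights is used as a
    simple marker that the models are already available locally.
    """
    root = Path(local_dir)
    if (root / VAE_FILENAME).exists():
        return root

    # Imported lazily so the check above works without the package installed.
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id=REPO_ID, local_dir=str(root))
    return root
```

Calling `ensure_models("./models")` at server start-up would then be a no-op on later runs; the directory name is only an example.
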
## Deployment
### Docker Deployment
```bash
docker build -t matrix-game-2 .
docker run -p 8080:8080 --gpus all matrix-game-2
```
### Local Development
```bash
pip install -r requirements.txt
python api_server.py --host 0.0.0.0 --port 8080
```
## Environment Variables
- `PORT` - Server port (default: 8080)
- `SPACE_ID` - Hugging Face Space ID (for HF deployment)
- `CUDA_VISIBLE_DEVICES` - GPU selection
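
A server might read these variables along the following lines. This is a minimal sketch with hypothetical function names; how `api_server.py` actually parses its environment is not shown in this README.

```python
import os
from typing import Mapping, Optional


def server_port(env: Optional[Mapping[str, str]] = None) -> int:
    """Read PORT, falling back to the documented default of 8080."""
    env = os.environ if env is None else env
    return int(env.get("PORT", "8080"))


def running_on_hf_space(env: Optional[Mapping[str, str]] = None) -> bool:
    """Treat a non-empty SPACE_ID as a sign of a Hugging Face Space deployment."""
    env = os.environ if env is None else env
    return bool(env.get("SPACE_ID"))


if __name__ == "__main__":
    # CUDA_VISIBLE_DEVICES is read by the CUDA runtime itself, so it must be
    # set before the first CUDA initialization to have any effect.
    print(server_port(), running_on_hf_space())
```

Passing a mapping explicitly, for example `server_port({"PORT": "9000"})`, makes the helpers easy to test without touching the real environment.
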
## Testing
Access the web client at `http://localhost:8080/` after starting the server.
## Known Limitations
- Requires an NVIDIA GPU with 24 GB+ of VRAM for the full model
- Initial model loading takes 2-3 minutes
## Updates from V1
- New model architecture (WAN-based instead of DiT-based)
- Streaming pipeline for better real-time performance
- Improved condition handling for different game modes
- Better memory efficiency through tiling
- Simplified API structure