Matrix-Game 2.0 WebSocket Server
Project Overview
Matrix-Game 2.0 is a real-time interactive world generation system: a generative video model produces an explorable environment frame by frame in response to keyboard and mouse input. This repository contains a WebSocket server that wraps the Matrix-Game 2.0 models so they can be driven from a web browser.
Architecture
Core Components
- api_server.py - WebSocket server handling client connections and game sessions
- api_engine.py - Matrix-Game 2.0 model inference engine
- api_utils.py - Utility functions for image processing and visualization
- client/ - Web-based client interface for testing
Model Components
- WAN Diffusion Model - Core generative model (14B parameters)
- VAE Encoder/Decoder - For latent space encoding/decoding
- Streaming Pipeline - Real-time frame generation
- Condition Processing - Keyboard and mouse input handling
Key Features
- Real-time video generation based on user inputs
- Multiple game modes: Universal, GTA Drive, Temple Run
- WebSocket-based streaming for low-latency interaction
- Fallback demo mode for machines without a GPU
- Support for multiple concurrent sessions
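The fallback mode implies some check for a usable GPU at startup. The server's actual detection logic is not shown here, so the following is only a minimal sketch of one way to do it. The function names `cuda_available` and `select_mode` and the mode labels `"full"` and `"fallback"` are hypothetical; `torch.cuda.is_available()` is the standard PyTorch call.

```python
from typing import Callable


def cuda_available() -> bool:
    """Return True only if PyTorch is installed and reports a CUDA device."""
    try:
        # Imported lazily so the fallback path also works without PyTorch.
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


def select_mode(probe: Callable[[], bool] = cuda_available) -> str:
    """Pick a run mode. The labels are illustrative, not the server's own."""
    return "full" if probe() else "fallback"


if __name__ == "__main__":
    print(f"Selected mode: {select_mode()}")
```

Passing the probe as a parameter keeps the decision testable on machines with or without a GPU.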
Resolution and Performance
- Standard resolution: 352x640
- Target FPS: 16
- Streaming generation: 5 frames per batch
- Reduced latency through latent-space operations
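The figures above fix a time budget for the model. The short calculation below works it out, assuming "352x640" means height x width and that frames are uncompressed 8-bit RGB before any encoding (both are assumptions, not statements about the server's wire format).

```python
# Figures taken from the list above.
TARGET_FPS = 16
FRAMES_PER_BATCH = 5
HEIGHT, WIDTH = 352, 640  # assumed to be height x width


def batch_budget_seconds(frames: int = FRAMES_PER_BATCH, fps: int = TARGET_FPS) -> float:
    """Wall-clock time one batch may take if playback is to stay at `fps`."""
    return frames / fps


def raw_rgb_bytes_per_second(
    width: int = WIDTH, height: int = HEIGHT, fps: int = TARGET_FPS
) -> int:
    """Uncompressed 8-bit RGB data rate at the target frame rate."""
    return width * height * 3 * fps


if __name__ == "__main__":
    print(f"Batch budget: {batch_budget_seconds():.4f} s")
    print(f"Raw RGB rate: {raw_rgb_bytes_per_second() / 1e6:.2f} MB/s")
```

Each 5-frame batch therefore has to be generated and delivered in about 0.31 seconds to sustain 16 FPS, and raw frames would amount to roughly 10.8 MB/s per session, which is why some form of frame compression is usually applied before streaming.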
Game Modes
- Universal - General exploration with full camera and movement control
- GTA Drive - Driving simulation mode
- Temple Run - Runner game mode with limited controls
Input Controls
Keyboard Controls
- W/S/A/D - Movement (forward/back/left/right)
- Space - Jump
- Shift/Ctrl - Attack/Action
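As an illustration of how these keys could be turned into a model condition, the sketch below encodes the set of pressed keys as a multi-hot vector. The action order, the key names, and the split of Shift to "attack" and Ctrl to "action" are all assumptions; the encoding the engine actually expects is not described in this document and may differ per game mode.

```python
from typing import Iterable, List

# Hypothetical action order for the condition vector.
ACTIONS = ["forward", "back", "left", "right", "jump", "attack", "action"]

# Key names are lowercased before lookup. The Shift/Ctrl split is an assumption.
KEY_TO_ACTION = {
    "w": "forward",
    "s": "back",
    "a": "left",
    "d": "right",
    "space": "jump",
    "shift": "attack",
    "ctrl": "action",
}


def encode_keys(pressed: Iterable[str]) -> List[int]:
    """Return a multi-hot vector with a 1 for every action whose key is held."""
    active = {KEY_TO_ACTION[k.lower()] for k in pressed if k.lower() in KEY_TO_ACTION}
    return [1 if name in active else 0 for name in ACTIONS]


if __name__ == "__main__":
    print(encode_keys({"W", "Space"}))
```

Unknown keys are ignored rather than rejected, so a client can forward raw key events without filtering them first.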
Mouse Controls
- X/Y coordinates normalized to [-1, 1]
- Camera rotation and view control
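A client has to convert pixel coordinates into the [-1, 1] range before sending them. The sketch below shows one way to do so, assuming the origin is the top-left corner of the viewport and that the viewport centre maps to (0, 0). The function name `normalize_mouse` is hypothetical, and the sign convention the model expects for each axis is not stated in this document.

```python
from typing import Tuple


def normalize_mouse(px: float, py: float, width: int, height: int) -> Tuple[float, float]:
    """Map pixel coordinates to [-1, 1] on both axes, clamping values outside the viewport."""
    if width <= 0 or height <= 0:
        raise ValueError("viewport dimensions must be positive")

    def clamp(value: float) -> float:
        return max(-1.0, min(1.0, value))

    x = 2.0 * px / width - 1.0
    y = 2.0 * py / height - 1.0
    return clamp(x), clamp(y)


if __name__ == "__main__":
    # Centre of a 640x352 viewport.
    print(normalize_mouse(320, 176, 640, 352))
```

Clamping guards against pointer positions reported slightly outside the canvas, which browsers can produce during drags.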
Model Loading
If the models are not present locally, the system downloads them automatically from Hugging Face (Skywork/Matrix-Game-2.0). The downloaded files include:
- Wan2.1_VAE.pth - VAE model weights
- Generator checkpoint files
- Configuration files for different modes
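The download-if-missing behaviour can be approximated as follows. This is a sketch rather than the server's own loader: the function name `ensure_models`, the default directory `models`, and the use of `Wan2.1_VAE.pth` as the marker for a complete download are assumptions. `snapshot_download` is the `huggingface_hub` function for fetching a whole repository.

```python
from pathlib import Path

REPO_ID = "Skywork/Matrix-Game-2.0"
MARKER_FILE = "Wan2.1_VAE.pth"  # assumed to indicate that weights are present


def ensure_models(model_dir: Path = Path("models")) -> Path:
    """Return `model_dir`, downloading the repository first if the VAE weights are missing."""
    model_dir = Path(model_dir)
    if model_dir.is_dir() and any(model_dir.rglob(MARKER_FILE)):
        return model_dir  # weights already present, nothing to download

    # Imported lazily so machines with local weights do not need the package.
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id=REPO_ID, local_dir=str(model_dir))
    return model_dir


if __name__ == "__main__":
    print(f"Models available in: {ensure_models()}")
```

Checking for a single marker file is cheap but does not detect a partially downloaded checkpoint; a stricter check would verify every expected file.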
Deployment
Docker Deployment
docker build -t matrix-game-2 .
docker run -p 8080:8080 --gpus all matrix-game-2
Local Development
pip install -r requirements.txt
python api_server.py --host 0.0.0.0 --port 8080
Environment Variables
- PORT - Server port (default: 8080)
- SPACE_ID - Hugging Face Space ID (for HF deployment)
- CUDA_VISIBLE_DEVICES - GPU selection
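For reference, these variables could be read in Python along the following lines. The `ServerSettings` class and `load_settings` function are hypothetical names used for illustration; only the variable names and the default port come from this document.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ServerSettings:
    port: int
    space_id: Optional[str]
    cuda_visible_devices: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> ServerSettings:
    """Read the server settings, applying the documented default port of 8080."""
    return ServerSettings(
        port=int(env.get("PORT", "8080")),
        # An empty SPACE_ID is treated the same as an unset one.
        space_id=env.get("SPACE_ID") or None,
        cuda_visible_devices=env.get("CUDA_VISIBLE_DEVICES"),
    )


if __name__ == "__main__":
    print(load_settings())
```

Note that CUDA_VISIBLE_DEVICES is interpreted by the CUDA runtime itself, so it must be set before any CUDA-using library initializes; the sketch only reads it for logging or diagnostics.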
Testing
Access the web client at http://localhost:8080/ after starting the server.
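A quick scripted check that the web client is being served can complement the manual test. The sketch below uses only the Python standard library and assumes the server answers a plain HTTP GET on `/` with status 200; the function name `client_is_reachable` is hypothetical, and the check does not exercise the WebSocket endpoint.

```python
import urllib.error
import urllib.request


def client_is_reachable(url: str = "http://localhost:8080/", timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET on `url` succeeds with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("Web client reachable:", client_is_reachable())
```

Because model loading takes a few minutes, a startup script may want to call this in a loop with a delay rather than only once.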
Known Limitations
- Running the full model requires an NVIDIA GPU with at least 24 GB of VRAM
- Initial model loading takes 2-3 minutes
Updates from V1
- New model architecture (WAN-based instead of DiT-based)
- Streaming pipeline for better real-time performance
- Improved condition handling for different game modes
- Better memory efficiency through tiling
- Simplified API structure