anntnikita committed
Commit 41fb5d9 · verified · 1 Parent(s): d43bea6

Add Matrix Game app, requirements, and updated README

Files changed (3)
  1. README.md +74 -14
  2. app.py +248 -0
  3. requirements.txt +10 -0
README.md CHANGED
@@ -1,14 +1,74 @@
- ---
- title: Matrix Game Demo
- emoji:
- colorFrom: yellow
- colorTo: blue
- sdk: gradio
- sdk_version: 5.42.0
- app_file: app.py
- pinned: false
- license: mit
- short_description: Interactive demo for Matrix Game 2.0 model.
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Matrix Game 2.0 Interactive Demo
+
+ This folder contains a minimal but **fully‑working** example for running and interacting with the [Matrix‑Game 2.0](https://huggingface.co/Skywork/Matrix-Game-2.0) model. The goal of this demo is to expose the core mechanics of the model—turning a single input image and a sequence of user actions into a short video—behind a simple web interface.
+
+ > **❗️ Hardware requirements**
+ >
+ > Matrix‑Game 2.0 is a very large model (over 1.8B parameters) and was designed to run on datacenter GPUs like the NVIDIA A100 or H100. You can technically run it on a consumer GPU or even a CPU, but inference will be extremely slow and may run out of memory. For the best experience, launch this demo on a machine with at least 24 GiB of GPU VRAM. The code will gracefully fall back to CPU execution if no GPU is available, but expect generation to take minutes per frame on a CPU.
+
+ ## Setup
+
+ 1. Create a fresh Python environment (Python 3.10+ is recommended):
+
+    ```bash
+    python -m venv .venv
+    source .venv/bin/activate
+    ```
+
+ 2. Install the dependencies listed in `requirements.txt`:
+
+    ```bash
+    pip install --upgrade pip
+    pip install -r requirements.txt
+    ```
+
+ 3. Log in to the Hugging Face Hub using your own access token. You can either export it as an environment variable or pass it directly to the application. To export it, replace `YOUR_HF_TOKEN` with a valid token generated from your Hugging Face account:
+
+    ```bash
+    export HF_TOKEN="YOUR_HF_TOKEN"
+    ```
+
+    Alternatively, pass the token to the Hugging Face CLI directly:
+
+    ```bash
+    huggingface-cli login --token YOUR_HF_TOKEN
+    ```
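+
+    If you prefer not to keep the token in your shell environment, you can also pass it in code; the `MatrixGame` wrapper defined in `app.py` (see below) accepts an `hf_token` constructor argument. A minimal sketch:
+
+    ```python
+    from app import MatrixGame
+
+    # Pass the token explicitly instead of relying on the HF_TOKEN variable.
+    # Note: instantiating the wrapper downloads the model weights.
+    game = MatrixGame(hf_token="YOUR_HF_TOKEN")
+    ```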
+
+ 4. Launch the interactive demo:
+
+    ```bash
+    python app.py
+    ```
+
+ The first time you run the script it will download several gigabytes of model weights from Hugging Face. Subsequent runs will reuse the cached files.
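+
+ If you would rather pre-fetch the weights before the first launch, a short sketch using `huggingface_hub` should work; the repo id below matches the `MODEL_ID` constant in `app.py`:
+
+ ```python
+ from huggingface_hub import snapshot_download
+
+ # Downloads the model repo into the local Hugging Face cache (or reuses it).
+ snapshot_download(repo_id="Skywork/Matrix-Game-2.0")
+ ```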
+
+ ## Usage
+
+ Once the Gradio interface starts, open the local URL printed in the terminal (by default http://127.0.0.1:7860) and follow these steps:
+
+ 1. **Select an input image** – this image acts as the first frame of the generated video. The model expects images with a 16∶9 aspect ratio. You can upload any photo (for example, a screenshot from a game or a view you want to explore).
+
+ 2. **Choose the number of frames** you want to generate. The demo's slider allows between 4 and 32 frames (roughly two seconds of video at the 15 fps the demo writes out). Longer videos require more memory and compute.
+
+ 3. **Click “Generate Video”**. The model will synthesize a sequence of frames conditioned on your chosen image. You can watch the result directly in the browser or download the MP4 file to view it offline.
+
+ ### Action control
+
+ Matrix‑Game 2.0 normally accepts keyboard and mouse actions at each time step to steer the camera within the scene. The simplified interface provided here does not expose those controls directly—primarily because real‑time interaction requires high‑frequency communication with the model that cannot be reliably handled in a browser without significant latency.
+
+ However, the `generate_frames` method of the underlying `MatrixGame` class is a natural place to add optional `mouse` and `keyboard` tensors (representing camera and movement commands), as sketched below. Feel free to modify the UI to add your own custom controls if you would like to experiment with full action‑conditioned generation.
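+
+ A minimal sketch of what that experiment might look like. The tensor shapes and the idea that the loaded pipeline accepts `mouse` and `keyboard` keyword arguments are assumptions for illustration; consult the official Matrix‑Game repository for the real action format:
+
+ ```python
+ import torch
+ from PIL import Image
+
+ from app import MatrixGame
+
+ game = MatrixGame()  # reads HF_TOKEN from the environment
+ first_frame = Image.open("scene.png")  # any 16:9 image (hypothetical file name)
+
+ num_frames = 16
+ # Hypothetical action tensors: a mouse delta (dx, dy) and a 4-way movement
+ # one-hot (W, A, S, D) for each generated frame.
+ mouse = torch.zeros(num_frames, 2)
+ keyboard = torch.zeros(num_frames, 4)
+ keyboard[:, 0] = 1.0  # hold "W" to move forward for the whole clip
+
+ # Today generate_frames ignores actions; extending it would mean forwarding
+ # the tensors into the pipeline call, e.g.
+ #   self.pipeline(image, num_frames=num_frames, mouse=mouse, keyboard=keyboard)
+ frames = game.generate_frames(first_frame, num_frames=num_frames)
+ ```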
+
+ ## Project structure
+
+ ```
+ matrix_game_app/
+ ├── README.md          – this file
+ ├── requirements.txt   – minimal Python dependencies
+ └── app.py             – entry point for the interactive demo
+ ```
+
+ ## Notes
+
+ This project is intentionally lightweight. It does not attempt to replicate the full training or inference pipeline from the official Matrix‑Game repository. Instead, it leverages the `diffusers` integration for the model to provide a quick way to run inference. If you need the full streaming inference pipeline with mouse/keyboard injection (as available in the original repo), please clone [SkyworkAI/Matrix‑Game](https://github.com/SkyworkAI/Matrix-Game) and follow the instructions in its `README.md`.
+
+ Finally, please be aware that the model weights are released under the MIT license. Make sure you adhere to the license terms when redistributing or using the model.
app.py ADDED
@@ -0,0 +1,248 @@
+ """
+ app.py
+ ======
+
+ This script exposes a simple web interface for the Matrix‑Game 2.0 model via
+ Gradio. Given an initial image, the model produces a short video that
+ continues the scene forward in time. The code uses the diffusers library to
+ download and load the model from Hugging Face. It automatically selects CPU
+ or GPU based on availability.
+
+ To run this script you must have installed the dependencies in
+ `requirements.txt` and logged in to the Hugging Face Hub using your access
+ token. You can set the token at runtime via the `HF_TOKEN` environment
+ variable or by passing it into the constructor of the `MatrixGame` class.
+
+ Note: generating videos with Matrix‑Game 2.0 is computationally intensive and
+ requires a machine with significant memory. On a CPU the generation may be
+ very slow. For best results use a GPU with at least 24 GiB of VRAM.
+ """
+
+ from __future__ import annotations
+
+ import os
+ import tempfile
+ from typing import List, Optional
+
+ import numpy as np
+ from PIL import Image
+
+ import torch
+
+ from huggingface_hub import login
+
+ try:
+     # A dedicated video auto-pipeline is not shipped by every diffusers
+     # release, so guard the import; the generic loader below is the fallback.
+     from diffusers import AutoPipelineForVideo  # type: ignore
+ except Exception:
+     AutoPipelineForVideo = None  # type: ignore
+
+ try:
+     # DiffusionPipeline is the generic loader available in all recent
+     # diffusers versions; it picks the pipeline class from the model config.
+     from diffusers import DiffusionPipeline
+ except Exception:
+     DiffusionPipeline = None  # type: ignore
+
+ try:
+     import gradio as gr
+ except Exception:
+     gr = None  # type: ignore
+
+ try:
+     from moviepy.editor import ImageSequenceClip  # moviepy 1.x layout
+ except Exception:
+     try:
+         from moviepy import ImageSequenceClip  # moviepy 2.x removed `editor`
+     except Exception:
+         ImageSequenceClip = None  # type: ignore
+
+
+ class MatrixGame:
+     """Wrapper around the Matrix‑Game 2.0 model.
+
+     This class handles logging in to Hugging Face, downloading the model,
+     selecting the appropriate device and performing video generation. It
+     currently supports the universal mode, which uses the base distilled
+     model weights. Real‑time interactive control with mouse and keyboard
+     inputs is possible but not exposed through the Gradio UI.
+     """
+
+     MODEL_ID: str = "Skywork/Matrix-Game-2.0"
+
+     def __init__(self, hf_token: Optional[str] = None, *, mode: str = "universal"):
+         self.mode = mode
+         self.hf_token = hf_token or os.environ.get("HF_TOKEN")
+         if not self.hf_token:
+             raise ValueError(
+                 "A Hugging Face token must be provided either via the HF_TOKEN "
+                 "environment variable or the hf_token argument."
+             )
+         # Authenticate with Hugging Face. This call is idempotent; if you're
+         # already logged in it does nothing.
+         login(token=self.hf_token, add_to_git_credential=False)
+
+         # Select the compute device: GPU if available, otherwise CPU.
+         self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+         # Use a lower-precision dtype on GPU to save memory.
+         if self.device.type == "cuda":
+             self.dtype = torch.float16
+         else:
+             self.dtype = torch.float32
+
+         # Load the pipeline. Try `AutoPipelineForVideo` first if the installed
+         # diffusers version provides it; otherwise fall back to the generic
+         # `DiffusionPipeline` loader, which resolves the concrete pipeline
+         # class from the model's configuration.
+         pipeline = None
+         if AutoPipelineForVideo is not None:
+             try:
+                 pipeline = AutoPipelineForVideo.from_pretrained(
+                     self.MODEL_ID,
+                     torch_dtype=self.dtype,
+                     variant="fp16" if self.dtype == torch.float16 else None,
+                     token=self.hf_token,
+                 )
+             except Exception as e:
+                 print(f"AutoPipelineForVideo failed to load: {e}")
+
+         if pipeline is None and DiffusionPipeline is not None:
+             try:
+                 pipeline = DiffusionPipeline.from_pretrained(
+                     self.MODEL_ID,
+                     torch_dtype=self.dtype,
+                     variant="fp16" if self.dtype == torch.float16 else None,
+                     token=self.hf_token,
+                 )
+             except Exception as e:
+                 print(f"DiffusionPipeline failed to load: {e}")
+
+         if pipeline is None:
+             raise RuntimeError(
+                 "Could not load a video pipeline for Matrix‑Game 2.0. Please "
+                 "ensure diffusers is up to date (>=0.33) and that your token "
+                 "has access to the model repository."
+             )
+
+         self.pipeline = pipeline.to(self.device)
+
+     def generate_frames(self, image: Image.Image, num_frames: int = 8) -> List[Image.Image]:
+         """Generate a sequence of frames given an initial image.
+
+         Args:
+             image: A PIL.Image that will act as the first frame of the video.
+             num_frames: The number of frames to generate (including the input).
+
+         Returns:
+             A list of PIL.Image objects representing the generated video frames.
+         """
+         # Normalize the input image. The diffusers pipelines handle resizing
+         # internally, but explicitly converting to RGB ensures consistent
+         # results.
+         if not isinstance(image, Image.Image):
+             raise ValueError("Input must be a PIL.Image")
+
+         image = image.convert("RGB")
+         # Some pipelines support passing `num_frames` directly to control the
+         # video length; others ignore the argument and use a default (the
+         # Matrix‑Game model natively produces 16 frames per call), so the
+         # result is truncated to the requested length below. Autocast is only
+         # enabled on CUDA because CPU autocast does not support float32.
+         with torch.autocast(
+             self.device.type, dtype=self.dtype, enabled=self.device.type == "cuda"
+         ):
+             result = self.pipeline(image, num_frames=num_frames)
+
+         # The result usually carries the frames in a `frames` attribute; some
+         # diffusers versions return a dict with a "frames" key instead, and
+         # some return a batch (a list of per-video frame lists).
+         frames = getattr(result, "frames", None)
+         if frames is None and isinstance(result, dict):
+             frames = result.get("frames")
+         if frames is None:
+             raise RuntimeError("Unexpected output format from the pipeline")
+         if frames and isinstance(frames[0], (list, tuple)):
+             # Batched output: take the first (and only) video in the batch.
+             frames = frames[0]
+         # Limit to the requested number of frames if more were produced.
+         return list(frames)[:num_frames]
+
+     def frames_to_video(self, frames: List[Image.Image], fps: int = 15) -> str:
+         """Convert a list of frames into a temporary MP4 file.
+
+         Args:
+             frames: A list of PIL images.
+             fps: Frames per second for the output video.
+
+         Returns:
+             The file path to the generated MP4 video.
+         """
+         if ImageSequenceClip is None:
+             raise ImportError(
+                 "moviepy is required to assemble videos. Please install it with "
+                 "`pip install moviepy` or use an alternative method."
+             )
+         # Convert the PIL images to uint8 numpy arrays, which is what
+         # ImageSequenceClip expects.
+         arrays = [np.array(frame) for frame in frames]
+         clip = ImageSequenceClip(arrays, fps=fps)
+         # Write to a temporary file. `logger=None` silences the progress bar;
+         # the deprecated `verbose` argument is omitted so this also works with
+         # moviepy 2.x.
+         tmp_dir = tempfile.mkdtemp(prefix="matrix_game_")
+         video_path = os.path.join(tmp_dir, "output.mp4")
+         clip.write_videofile(video_path, codec="libx264", audio=False, logger=None)
+         return video_path
+
+
+ def launch_interface():
+     """Launch a Gradio interface for Matrix‑Game 2.0."""
+     if gr is None:
+         raise ImportError(
+             "Gradio is not installed. Please install it with `pip install gradio`."
+         )
+     # Instantiate the model wrapper once; this downloads the weights
+     # automatically on first use. We read the token from the environment. If
+     # you prefer you can hard-code the token here, but be mindful of security
+     # best practices.
+     hf_token = os.environ.get("HF_TOKEN")
+     if not hf_token:
+         raise RuntimeError(
+             "Please set the HF_TOKEN environment variable to your Hugging Face "
+             "access token before launching the interface."
+         )
+     matrix_game = MatrixGame(hf_token=hf_token)
+
+     def generate_fn(image: Image.Image, num_frames: int) -> str:
+         """Callback invoked by Gradio to generate a video file."""
+         frames = matrix_game.generate_frames(image, num_frames=int(num_frames))
+         video_path = matrix_game.frames_to_video(frames, fps=15)
+         return video_path
+
+     with gr.Blocks() as demo:
+         gr.Markdown(
+             """
+             # Matrix‑Game 2.0 Demo
+
+             Upload an image and choose how many frames to generate. The model
+             will synthesize a short video that extends the scene forward in
+             time. Note that generation may take several minutes on machines
+             without high‑end GPUs.
+             """
+         )
+         with gr.Row():
+             with gr.Column():
+                 image_input = gr.Image(type="pil", label="Initial Frame")
+                 num_frames = gr.Slider(
+                     minimum=4,
+                     maximum=32,
+                     step=1,
+                     value=16,
+                     label="Number of Frames",
+                     info="Total frames in the generated video (including the initial frame)",
+                 )
+                 generate_btn = gr.Button("Generate Video")
+             with gr.Column():
+                 video_output = gr.Video(label="Generated Video", interactive=False)
+
+         generate_btn.click(
+             fn=generate_fn,
+             inputs=[image_input, num_frames],
+             outputs=video_output,
+         )
+
+     demo.launch()
+
+
+ if __name__ == "__main__":
+     launch_interface()
requirements.txt ADDED
@@ -0,0 +1,10 @@
+ torch>=2.1
+ diffusers>=0.33.0
+ huggingface_hub>=0.20.0
+ gradio>=4.0
+ numpy>=1.21
+ Pillow>=9.2
+ moviepy>=1.0
+ omegaconf>=2.3
+ einops>=0.7
+ safetensors>=0.3