Spaces:

wliu283
/

RealWonder


Wei Liu committed on Feb 26

Commit

fc36e06

0 Parent(s):

init huggingface deployment

This view is limited to 50 files because it contains too many changes.

Files changed (50)

.gitattributes +10 -0
README.md +243 -0
app.py +786 -0
case_handlers/__init__.py +6 -0
case_handlers/base.py +149 -0
case_handlers/lamp.py +25 -0
case_handlers/persimmon.py +51 -0
case_handlers/santa_cloth.py +101 -0
case_handlers/tree.py +104 -0
config.py +60 -0
demo_data/.gitkeep +0 -0
demo_data/lamp/bg_points.pt +3 -0
demo_data/lamp/camera.pt +3 -0
demo_data/lamp/config.yaml +52 -0
demo_data/lamp/fg_masks/mask_00.png +3 -0
demo_data/lamp/fg_meshes/mesh_00.obj +3 -0
demo_data/lamp/fg_pcs/pc_00.pt +3 -0
demo_data/lamp/first_frame.png +3 -0
demo_data/lamp/inpainted_bg.png +3 -0
demo_data/lamp/sim_tmp/fg_mesh_00.obj +3 -0
demo_data/lamp/sim_tmp/flow_image.gif +3 -0
demo_data/lamp/sim_tmp/frames/frame_0001.png +3 -0
demo_data/lamp/sim_tmp/frames/frame_0002.png +3 -0
demo_data/lamp/sim_tmp/frames/frame_0003.png +3 -0
demo_data/lamp/sim_tmp/frames/frame_0004.png +3 -0
demo_data/lamp/sim_tmp/frames/frame_0005.png +3 -0
demo_data/lamp/sim_tmp/frames/frame_0006.png +3 -0
demo_data/lamp/sim_tmp/frames/frame_0007.png +3 -0
demo_data/lamp/sim_tmp/frames/frame_0008.png +3 -0
demo_data/lamp/sim_tmp/frames/frame_0009.png +3 -0
demo_data/lamp/sim_tmp/frames/frame_0010.png +3 -0
demo_data/lamp/sim_tmp/frames/frame_0011.png +3 -0
demo_data/lamp/sim_tmp/frames/frame_0012.png +3 -0
demo_data/lamp/sim_tmp/frames/frame_0013.png +3 -0
demo_data/lamp/sim_tmp/frames/frame_0014.png +3 -0
demo_data/lamp/sim_tmp/frames/frame_0015.png +3 -0
demo_data/lamp/sim_tmp/frames/frame_0016.png +3 -0
demo_data/lamp/sim_tmp/frames/frame_0017.png +3 -0
demo_data/lamp/sim_tmp/frames/frame_0018.png +3 -0
demo_data/lamp/sim_tmp/frames/frame_0019.png +3 -0
demo_data/lamp/sim_tmp/frames/frame_0020.png +3 -0
demo_data/lamp/sim_tmp/frames/frame_0021.png +3 -0
demo_data/lamp/sim_tmp/frames/frame_0022.png +3 -0
demo_data/lamp/sim_tmp/frames/frame_0023.png +3 -0
demo_data/lamp/sim_tmp/frames/frame_0024.png +3 -0
demo_data/lamp/sim_tmp/frames/frame_0025.png +3 -0
demo_data/lamp/sim_tmp/frames/frame_0026.png +3 -0
demo_data/lamp/sim_tmp/frames/frame_0027.png +3 -0
demo_data/lamp/sim_tmp/frames/frame_0028.png +3 -0
demo_data/lamp/sim_tmp/frames/frame_0029.png +3 -0

.gitattributes ADDED

	@@ -0,0 +1,10 @@

+*.npy filter=lfs diff=lfs merge=lfs -text
+*.obj filter=lfs diff=lfs merge=lfs -text
+*.gif filter=lfs diff=lfs merge=lfs -text
+*.mp4 filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
+*.jpg filter=lfs diff=lfs merge=lfs -text
+*.jpeg filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
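
Reviewer note: the patterns above send every binary artifact through Git LFS, which is consistent with the `+3 -0` entries in the file list (an LFS pointer file has three lines: version, oid, size). The sketch below is a rough way to predict which paths are affected. It assumes basename matching with `fnmatch`, which only approximates gitattributes semantics, and `is_lfs_tracked` is a hypothetical helper, not part of the repository; `git check-attr filter -- <path>` remains the authoritative check.

```python
from fnmatch import fnmatch

# Patterns copied from the .gitattributes hunk above.
LFS_PATTERNS = [
    "*.npy", "*.obj", "*.gif", "*.mp4", "*.png",
    "*.jpg", "*.jpeg", "*.pt", "*.pth", "*.safetensors",
]


def is_lfs_tracked(path: str) -> bool:
    """Approximate check: match the file's basename against each pattern.

    Assumption: patterns without a slash match the basename at any depth,
    which is how gitattributes treats these particular entries.
    """
    name = path.rsplit("/", 1)[-1]
    return any(fnmatch(name, pattern) for pattern in LFS_PATTERNS)


if __name__ == "__main__":
    for p in ["demo_data/lamp/camera.pt", "demo_data/lamp/config.yaml", "app.py"]:
        print(p, is_lfs_tracked(p))
```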

README.md ADDED

	@@ -0,0 +1,243 @@

+# RealWonder Interactive Demo
+Interactive web demo for physics-guided video generation. Given a single image and a user-selected force direction, the system:
+1. Runs a real-time physics simulation (Genesis)
+2. Warps structured noise to follow the simulated motion
+3. Generates a realistic video using a causal diffusion model with SDEdit
+## Prerequisites
+- A GPU with at least 40 GB VRAM (tested on H100 80 GB / 140 GB)
+- Python 3.10
+- PyTorch 2.1 + CUDA 12.1 (pre-installed in the environment)
+- All packages listed in `requirements.txt`
+- A model checkpoint (see your team's checkpoint storage)
+- Preprocessed demo data placed in `demo_data/<case_name>/`
+## Setup
+### 1. Install dependencies
+```bash
+pip install -r requirements.txt
+```
+### 2. Install pytorch3d
+pytorch3d is not on standard PyPI. Build it from source, or install the pre-built wheel that matches your Python, PyTorch, and CUDA versions:
+```bash
+# Option A: Build from source (slow)
+pip install "git+https://github.com/facebookresearch/pytorch3d.git"
+# Option B: Pre-built wheel (fast, recommended)
+# Find the matching wheel at https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/
+# Example for PyTorch 2.1 + CUDA 12.1 + Python 3.10:
+pip install --no-index --find-links \
+    https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py310_cu121_pyt210/ \
+    pytorch3d
+```
+### 3. Add demo data
+Place preprocessed demo data under `demo_data/`. Each case is a subdirectory:
+```
+demo_data/
+  lamp/
+    config.yaml          # case config (num_output_frames, denoising_step_list, ...)
+    first_frame.png      # 480x832 first frame image
+    fg_meshes/
+      mesh_00.obj        # foreground object mesh(es)
+    fg_pcs/
+      pc_00.pt           # foreground point cloud(s) (PyTorch tensors)
+    bg_points.pt         # background point cloud
+    camera.pt            # camera intrinsics K, extrinsics R/T, focal_length
+    fg_masks/
+      mask_00.png        # foreground object mask(s) (optional, for UI)
+```
+**Supported cases:** `lamp`, `persimmon`, `santa_cloth`, `tree`
+The `config.yaml` for each case must contain at minimum:
+```yaml
+example_name: "lamp"           # must match a registered case name
+material_type: ["rigid"]       # physics material(s)
+num_output_frames: 21          # number of latent frames to generate (must be divisible by 3)
+denoising_step_list: [800, 600, 400, 200]
+vgen_prompt: "A lamp swinging."
+dt: 0.02
+substeps: 10
+frame_steps: 1
+alpha_threshold: 0.5
+```
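
Reviewer note: the minimum-config rules above (required keys, a registered case name, and `num_output_frames` divisible by 3) can be checked before launching the server. This is a minimal sketch on an already-parsed dict; `validate_case_config` and the constant names are hypothetical and do not come from the repository's `config.py`.

```python
# Keys listed in the README's minimum config.yaml example.
REQUIRED_KEYS = {
    "example_name", "material_type", "num_output_frames",
    "denoising_step_list", "vgen_prompt", "dt", "substeps",
    "frame_steps", "alpha_threshold",
}
# Case names listed under "Supported cases".
SUPPORTED_CASES = {"lamp", "persimmon", "santa_cloth", "tree"}
FRAMES_PER_BLOCK = 3  # latent frames per block, per the README


def validate_case_config(cfg: dict) -> list:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    missing = sorted(REQUIRED_KEYS - set(cfg))
    if missing:
        problems.append("missing keys: " + ", ".join(missing))
    name = cfg.get("example_name")
    if name is not None and name not in SUPPORTED_CASES:
        problems.append(f"unknown case name: {name!r}")
    n = cfg.get("num_output_frames")
    if isinstance(n, int) and n % FRAMES_PER_BLOCK != 0:
        problems.append(
            f"num_output_frames={n} is not divisible by {FRAMES_PER_BLOCK}"
        )
    return problems


# The README's lamp example, written as a dict.
lamp_cfg = {
    "example_name": "lamp",
    "material_type": ["rigid"],
    "num_output_frames": 21,
    "denoising_step_list": [800, 600, 400, 200],
    "vgen_prompt": "A lamp swinging.",
    "dt": 0.02,
    "substeps": 10,
    "frame_steps": 1,
    "alpha_threshold": 0.5,
}

if __name__ == "__main__":
    print(validate_case_config(lamp_cfg))
```

In practice the dict would come from loading `config.yaml` with the project's own config loader; the sketch takes a plain dict so it stays dependency-free.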
+## Running the Demo
+```bash
+cd huggingface/
+python app.py \
+    --demo_data demo_data/lamp \
+    --checkpoint_path /path/to/checkpoint.pt \
+    --port 5000 \
+    --no_debug \
+    --no_gpu_log
+```
+Open `http://localhost:5000` in a browser. Choose a force direction, optionally edit the text prompt, then click **Start**.
+### CLI Arguments
+| Argument | Default | Description |
+|---|---|---|
+| `--demo_data` | *(required)* | Path to demo data directory, e.g. `demo_data/lamp` |
+| `--checkpoint_path` | *(required)* | Path to model `.pt` checkpoint |
+| `--host` | `0.0.0.0` | Server bind address |
+| `--port` | `5000` | Server port |
+| `--use_ema` | off | Load EMA weights from checkpoint |
+| `--seed` | `42` | Random seed |
+| `--no_gpu_log` | off | Disable GPU memory logging |
+| `--no_debug` | off | Force disable debug output (overrides `config.yaml`) |
+| `--taehv` | off | Use TAEHV tiny VAE decoder (faster, slightly lower quality) |
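
Reviewer note: the argument table above maps naturally onto `argparse`. The parser below is a sketch reconstructed from the table only; the part of `app.py` that parses arguments is not visible in this view, so `build_parser` is hypothetical, and the `int` types for `--port` and `--seed` and the `store_true` handling of the "off" flags are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the README's CLI table (assumed layout)."""
    p = argparse.ArgumentParser(description="RealWonder interactive demo (sketch)")
    p.add_argument("--demo_data", required=True,
                   help="Path to demo data directory, e.g. demo_data/lamp")
    p.add_argument("--checkpoint_path", required=True,
                   help="Path to model .pt checkpoint")
    p.add_argument("--host", default="0.0.0.0", help="Server bind address")
    p.add_argument("--port", type=int, default=5000, help="Server port")
    p.add_argument("--seed", type=int, default=42, help="Random seed")
    # Flags documented as "off" by default are modeled as boolean switches.
    p.add_argument("--use_ema", action="store_true")
    p.add_argument("--no_gpu_log", action="store_true")
    p.add_argument("--no_debug", action="store_true")
    p.add_argument("--taehv", action="store_true")
    return p


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--demo_data", "demo_data/lamp", "--checkpoint_path", "/path/to/checkpoint.pt"]
    )
    print(args)
```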
+## Architecture
+### Startup (one-time, before first user request)
+1. Load and initialize the video generator (model + weights → GPU)
+2. Build the Genesis physics scene from the demo data meshes
+3. Pre-compute first-frame VAE + CLIP encoding, allocate KV cache, encode default prompt
+4. Warm up all CUDA kernels with dummy passes (~30s, eliminates JIT latency)
+After startup, each **Start** click only triggers lightweight per-request preparation (~0.1s text re-encoding if the prompt changed).
+### 4-Stage Streaming Pipeline
+Each generation runs a concurrent 4-stage pipeline. While the diffusion model denoises block N, noise warping processes block N+1, and simulation produces block N+2:
+```
+Stage 1a (thread)    Stage 1b (thread)    Stage 2 (thread)    Stage 3 (main)       Stage 4 (thread)
+Genesis physics  →   SVR render +     →   Noise warping   →   VAE encode +     →   Frame streaming
+(per sim step)       optical flow         (structured noise)  diffusion (SDEdit)    JPEG → browser
+                     (per pixel frame)    (per block)          (per block)          FPS-paced
+```
+All heavy GPU work (VAE encode + diffusion) runs in Stage 3 (main thread) to avoid GPU contention.
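
Reviewer note: the staged pipeline described above is a chain of bounded queues with `None` sentinels. The sketch below reduces it to a producer, one transform stage, and a consumer so the back-pressure and shutdown behaviour are easy to see. It is not the project's code: `run_pipeline` and its helpers are hypothetical, the stages just square integers, and a `threading.Event` stands in for the `stop_requested` global used in `app.py`.

```python
import threading
from queue import Queue, Full, Empty


def run_pipeline(num_items, stop_event, maxsize=2):
    """Run producer -> transformer -> consumer over bounded queues."""
    q1 = Queue(maxsize=maxsize)  # producer -> transformer
    q2 = Queue(maxsize=maxsize)  # transformer -> consumer
    results = []

    def put_unless_stopped(q, item):
        # Timed put so a full queue cannot block past a stop request.
        while not stop_event.is_set():
            try:
                q.put(item, timeout=0.1)
                return True
            except Full:
                pass
        return False

    def get_unless_stopped(q):
        # Timed get; returns None on stop, same as the end-of-stream sentinel.
        while not stop_event.is_set():
            try:
                return q.get(timeout=0.1)
            except Empty:
                pass
        return None

    def producer():
        try:
            for i in range(num_items):
                if not put_unless_stopped(q1, i):
                    break
        finally:
            put_unless_stopped(q1, None)  # sentinel: no more items

    def transformer():
        try:
            while True:
                item = get_unless_stopped(q1)
                if item is None:
                    break
                if not put_unless_stopped(q2, item * item):
                    break
        finally:
            put_unless_stopped(q2, None)  # forward the sentinel downstream

    threads = [
        threading.Thread(target=producer, daemon=True),
        threading.Thread(target=transformer, daemon=True),
    ]
    for t in threads:
        t.start()
    while True:  # consumer runs in the calling thread
        item = get_unless_stopped(q2)
        if item is None:
            break
        results.append(item)
    for t in threads:
        t.join(timeout=5)
    return results


if __name__ == "__main__":
    print(run_pipeline(5, threading.Event()))
```

With `maxsize=2`, an upstream stage can run at most two items ahead of its consumer, which bounds memory use while still letting the stages overlap.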
+### Key Parameters (`config.py`)
+| Parameter | Value | Description |
+|---|---|---|
+| Resolution | 480 × 832 | Pixel output size |
+| Latent size | 60 × 104 × 16 | After VAE encoding |
+| Frames per block | 3 latent / 12 pixel | Causal generation unit |
+| Default total | 21 latent / 81 pixel | 7 blocks × 3 frames |
+| Temporal factor | 4 | VAE temporal downsampling |
+| Playback FPS | 8 | Browser streaming rate |
+| Noise channels | 32 | Structured + SDE noise |
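
Reviewer note: the numbers in the table above are linked by simple arithmetic, worked through below. The constant names follow the imports in `app.py`, but the values are taken from the README table, and `FRAMES_FIRST_BLOCK_PIXEL = 9` is an assumption (a causal VAE whose first latent frame covers one pixel frame) chosen because it reproduces the stated 81-frame total. The spatial factor of 8 follows from 480 / 60. `block_layout` is a hypothetical helper.

```python
FRAMES_PER_BLOCK = 3          # latent frames per block (README table)
TEMPORAL_FACTOR = 4           # VAE temporal downsampling (README table)
SPATIAL_FACTOR = 8            # 480 / 60 and 832 / 104
DEFAULT_HEIGHT, DEFAULT_WIDTH = 480, 832

FRAMES_PER_BLOCK_PIXEL = FRAMES_PER_BLOCK * TEMPORAL_FACTOR  # 12
# Assumption: the first latent frame maps to a single pixel frame.
FRAMES_FIRST_BLOCK_PIXEL = 1 + (FRAMES_PER_BLOCK - 1) * TEMPORAL_FACTOR  # 9


def block_layout(num_output_frames: int):
    """Return (num_blocks, total_pixel_frames) for a latent frame count."""
    if num_output_frames % FRAMES_PER_BLOCK != 0:
        raise ValueError("num_output_frames must be divisible by 3")
    num_blocks = num_output_frames // FRAMES_PER_BLOCK
    total_pixel = FRAMES_FIRST_BLOCK_PIXEL + (num_blocks - 1) * FRAMES_PER_BLOCK_PIXEL
    return num_blocks, total_pixel


if __name__ == "__main__":
    print(block_layout(21))
    print(DEFAULT_HEIGHT // SPATIAL_FACTOR, DEFAULT_WIDTH // SPATIAL_FACTOR)
```

For the default of 21 latent frames this gives 7 blocks and 9 + 6 × 12 = 81 pixel frames, matching the "Default total" row.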
+## File Structure
+```
+huggingface/
+├── app.py                       # Flask + SocketIO web server (entry point)
+├── config.py                    # Pipeline constants
+├── simulation_engine.py         # Genesis simulation wrapper (InteractiveSimulator)
+├── noise_warper_stream.py       # Incremental noise warping (StreamingNoiseWarper)
+├── video_generator.py           # Block-by-block diffusion (StreamingVideoGenerator)
+├── gpu_profiler.py              # GPU memory logging utility
+├── taehv.py                     # Tiny AutoEncoder for fast VAE decoding (optional)
+├── requirements.txt             # pip dependencies
+│
+├── vidgen/                      # Internal video generation model library (bundled)
+├── wan/                         # Internal model modules — WanModel, VAE, tokenizers (bundled)
+│
+├── case_handlers/               # Per-case UI config and force application (web demo)
+│   ├── base.py                  # DemoCaseHandler base class + registry
+│   ├── lamp.py
+│   ├── persimmon.py
+│   ├── santa_cloth.py
+│   └── tree.py
+│
+├── simulation/
+│   ├── utils.py                 # Coordinate transforms, resize, save utilities
+│   ├── case_simulation/         # Per-case Genesis physics handlers
+│   │   ├── case_handler.py      # CaseHandler ABC + registry
+│   │   ├── lamp.py
+│   │   ├── persimmon.py
+│   │   ├── santa_cloth.py
+│   │   └── tree.py
+│   └── image23D/
+│       └── noise_warp/
+│           └── noise_warp.py    # NoiseWarper (particle-swarm noise warping)
+│
+├── templates/
+│   └── index.html               # Web UI
+├── static/
+│   ├── app.js                   # SocketIO client
+│   └── style.css
+│
+└── demo_data/                   # Preprocessed cases (add your data here)
+    └── <case_name>/
+        ├── config.yaml
+        ├── first_frame.png
+        ├── fg_meshes/
+        ├── fg_pcs/
+        ├── bg_points.pt
+        ├── camera.pt
+        └── fg_masks/
+```
+## Package Dependencies
+### Standard library
+`abc`, `argparse`, `base64`, `collections`, `glob`, `io`, `math`, `os`, `pathlib`, `queue`, `sys`, `threading`, `time`, `traceback`, `typing`, `urllib`
+### PyPI (installed via `requirements.txt`)
+| Package | PyPI name | Purpose |
+|---|---|---|
+| PyTorch | `torch` | Core ML framework |
+| TorchVision | `torchvision` | Video save, image transforms |
+| NumPy | `numpy` | Array operations |
+| Pillow | `Pillow` | Image I/O |
+| Flask | `flask` | Web server |
+| Flask-SocketIO | `flask-socketio` | Real-time frame streaming |
+| OpenCV | `opencv-python` | Flow resize, HSV colormap |
+| Einops | `einops` | Tensor reshaping |
+| OmegaConf | `omegaconf` | Config loading |
+| PEFT | `peft` | LoRA / parameter-efficient fine-tuning |
+| Safetensors | `safetensors` | Checkpoint loading |
+| Diffusers | `diffusers` | Scheduler utilities |
+| Transformers | `transformers` | CLIP text encoder, tokenizer |
+| ftfy | `ftfy` | Text normalization for CLIP |
+| EasyDict | `easydict` | Attribute-access dicts |
+| SciPy | `scipy` | Rotation utilities |
+| ImageIO | `imageio` | GIF saving |
+| Trimesh | `trimesh` | Mesh loading/export |
+| Matplotlib | `matplotlib` | Optical flow debug viz |
+| tqdm | `tqdm` | TAEHV progress bars |
+| PyYAML | `PyYAML` | Case config parsing |
+| rp | `rp` | Noise warp image utilities |
+| Genesis | `genesis-world` | Physics simulation |
+### Manual installs
+| Package | Notes |
+|---|---|
+| `pytorch3d` | Requires wheel matching CUDA/PyTorch version. See [install guide](https://github.com/facebookresearch/pytorch3d/blob/main/INSTALL.md). |
+| `gstaichi` | Bundled with `genesis-world`. |
+## Debug Mode
+Set `debug: true` in `demo_data/<case>/config.yaml` to save intermediate outputs to `demo_data/<case>/sim_tmp/`:
+- `gs_frames/` — Genesis camera renders (per sim step)
+- `frames/` — SVR point-cloud renders (per pixel frame)
+- `masks/` — Foreground and mesh masks
+- `optical_flow/` — Optical flow HSV visualizations
+- `noises.npy` / `noise_video.mp4` — Warped noise (latent resolution)
+Pass `--no_debug` on the command line to force-disable all debug saves regardless of `config.yaml`.
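
Reviewer note: the debug path in `app.py` saves `noises.npy` at latent resolution by area-downsampling the pixel-resolution noise and multiplying by `480 // 60 = 8`. Presumably the rescaling is there to keep the noise at unit variance: averaging 8 × 8 = 64 independent unit-variance samples shrinks the standard deviation to 1/8, and multiplying by 8 restores it. The pure-Python sketch below checks that reasoning on a small grid; `area_downsample` is a hypothetical stand-in for `F.interpolate(..., mode="area")` on an evenly divisible size, and the input is assumed to be i.i.d. Gaussian, which warped noise only approximates.

```python
import random
import statistics


def area_downsample(img, factor):
    """Average non-overlapping factor x factor blocks of a 2D list."""
    h, w = len(img), len(img[0])
    out = []
    for by in range(0, h, factor):
        row = []
        for bx in range(0, w, factor):
            block = [
                img[y][x]
                for y in range(by, by + factor)
                for x in range(bx, bx + factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


rng = random.Random(0)
factor = 8
# Small 160 x 160 grid of unit-variance Gaussian noise (illustrative size).
noise = [[rng.gauss(0.0, 1.0) for _ in range(160)] for _ in range(160)]

averaged = [v for row in area_downsample(noise, factor) for v in row]
rescaled = [v * factor for v in averaged]

std_avg = statistics.pstdev(averaged)       # close to 1/8
std_rescaled = statistics.pstdev(rescaled)  # close to 1

if __name__ == "__main__":
    print(f"std after averaging: {std_avg:.3f}")
    print(f"std after rescaling: {std_rescaled:.3f}")
```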

app.py ADDED

	@@ -0,0 +1,786 @@

+"""Flask + SocketIO server for the RealWonder interactive demo.
+Usage:
+    python app.py \
+        --demo_data demo_data/lamp \
+        --checkpoint_path /path/to/model.pt \
+        --port 5000
+The specified --demo_data case is fully initialized at startup (Genesis scene,
+video generator, noise warper). When a client connects, the UI shows the scene
+preview and lets the user choose force direction, edit prompt, and click Start.
+"""
+import os
+os.environ['SETUPTOOLS_USE_DISTUTILS'] = 'stdlib'
+import argparse
+import base64
+import io
+import threading
+from pathlib import Path
+from queue import Queue, Full as QueueFull, Empty as QueueEmpty
+import numpy as np
+import torch
+import torch.nn.functional as F
+from PIL import Image
+from flask import Flask, render_template
+from flask_socketio import SocketIO, emit
+from config import (
+    FRAMES_PER_BLOCK, FRAMES_PER_BLOCK_PIXEL, FRAMES_FIRST_BLOCK_PIXEL,
+    FPS, LATENT_H, LATENT_W, LATENT_C,
+    DEFAULT_HEIGHT, DEFAULT_WIDTH, TEMPORAL_FACTOR,
+    load_case_sdedit_config,
+)
+from simulation_engine import InteractiveSimulator
+from noise_warper_stream import StreamingNoiseWarper
+from video_generator import StreamingVideoGenerator
+from case_handlers.base import get_demo_case_handler
+import case_handlers  # trigger registration
+from gpu_profiler import log_gpu, set_gpu_logging
+from simulation.utils import resize_and_crop_pil
+app = Flask(__name__)
+app.config["SECRET_KEY"] = "realwonder-demo"
+socketio = SocketIO(app, cors_allowed_origins="*", async_mode="threading")
+# Global state — all initialized at startup before the server accepts connections
+simulator = None
+noise_warper = None
+generator = None
+demo_case_handler = None  # Per-case UI/force handler
+preview_b64 = None       # Base64 scene preview rendered once at startup
+default_prompt = ""       # Prompt from case config
+case_name = ""            # Name of the loaded case
+num_blocks = None         # Computed from case config at startup
+is_generating = False
+stop_requested = False
+@app.route("/")
+def index():
+    return render_template("index.html")
+@socketio.on("connect")
+def on_connect():
+    """When a client connects, send the pre-rendered scene preview and config."""
+    print("Client connected")
+    if simulator is not None and preview_b64 is not None:
+        ui_config = demo_case_handler.get_ui_config() if demo_case_handler else {}
+        ui_config["allow_change_force"] = simulator.config.get("allow_change_force", False)
+        emit("ready", {
+            "case_name": case_name,
+            "preview": preview_b64,
+            "prompt": default_prompt,
+            "ui_config": ui_config,
+        })
+    else:
+        emit("error", {"message": "Server not fully initialized. Check startup logs."})
+@socketio.on("start_generation")
+def on_start_generation(data):
+    """User chose direction + prompt and clicked Start."""
+    global is_generating, stop_requested
+    if simulator is None:
+        emit("error", {"message": "Simulator not initialized"})
+        return
+    if generator is None or not generator.is_setup:
+        emit("error", {"message": "Video generator not initialized"})
+        return
+    if is_generating:
+        emit("error", {"message": "Generation already in progress"})
+        return
+    prompt = data.get("prompt", default_prompt or "A video of physical simulation")
+    ui_forces = data.get("forces", [])
+    # Convert UI direction strings to 3D vectors and store on handler
+    force_configs = demo_case_handler.get_force_config_from_ui(ui_forces)
+    demo_case_handler.set_forces(force_configs)
+    # Configure simulation state from the main thread (required for cases
+    # like santa_cloth where taichi field writes need the creating thread's
+    # CUDA context).
+    demo_case_handler.configure_simulation(simulator)
+    emit("status", {"message": "Forces configured. Starting generation..."})
+    stop_requested = False
+    socketio.start_background_task(generation_loop, prompt)
+@socketio.on("stop_generation")
+def on_stop_generation():
+    global stop_requested
+    stop_requested = True
+@socketio.on("update_forces")
+def on_update_forces(data):
+    """User changed force direction/strength mid-generation.
+    Updates the demo handler's wind parameters (plain Python attrs).
+    The simulation thread's apply_forces() reads these every step,
+    so changes take effect immediately — no CUDA or taichi involved.
+    Only works when allow_change_force is enabled in the case config.
+    """
+    if demo_case_handler is None or simulator is None:
+        return
+    if not simulator.config.get("allow_change_force", False):
+        return
+    ui_forces = data.get("forces", [])
+    force_configs = demo_case_handler.get_force_config_from_ui(ui_forces)
+    demo_case_handler.set_forces(force_configs)
+    # Update derived wind params (direction vector, strength scalar)
+    demo_case_handler.configure_simulation(simulator)
+@socketio.on("reset")
+def on_reset():
+    global is_generating, stop_requested
+    stop_requested = True
+    if simulator is not None:
+        simulator.reset()
+    if noise_warper is not None:
+        noise_warper.reset()
+    if generator is not None:
+        generator.reset()
+    is_generating = False
+    socketio.emit("status", {"message": "Reset complete"})
+    # Re-send the preview so user can start again
+    if preview_b64 is not None:
+        ui_config = demo_case_handler.get_ui_config() if demo_case_handler else {}
+        ui_config["allow_change_force"] = simulator.config.get("allow_change_force", False) if simulator else False
+        socketio.emit("ready", {
+            "case_name": case_name,
+            "preview": preview_b64,
+            "prompt": default_prompt,
+            "ui_config": ui_config,
+        })
+def generation_loop(prompt):
+    """Main generation loop with a 4-stage streaming pipeline.
+    Stage 1a (thread): Physics: steps Genesis and emits point clouds per pixel frame
+    Stage 1b (thread): Rendering: SVR render + optical flow, grouped into blocks
+    Stage 2 (thread):  Noise warping: warps noise using optical flow (lightweight)
+    Stage 3 (main):    VAE encoding + mask building + diffusion denoising
+    Stage 4 (thread):  Frame streaming: sends JPEG frames to the browser at FPS rate
+    The stages run concurrently: while VGen denoises block N, noise warping
+    handles block N+1, and simulation produces block N+2. All heavy GPU work
+    (VAE encode + diffusion) is consolidated in Stage 3 to avoid GPU memory
+    contention.
+    """
+    global is_generating, stop_requested
+    is_generating = True
+    torch.set_grad_enabled(False)  # thread-local: must set in this thread too
+    try:
+        socketio.emit("status", {"message": "Preparing video generator..."})
+        # Reset noise warper before sim threads start.
+        noise_warper.reset()
+        frame_steps = simulator.frame_steps
+        # --- 4-Stage Pipeline Queues ---
+        physics_queue = Queue(maxsize=2)  # Stage 1a → Stage 1b (per pixel frame)
+        sim_queue = Queue(maxsize=2)      # Stage 1b → Stage 2  (per block)
+        ready_queue = Queue(maxsize=3)    # Stage 2  → Stage 3
+        is_debug = simulator.config.get("debug", False)
+        all_sim_frames = [] if is_debug else None
+        # --- Stage 1a: Physics producer ---
+        # Runs Genesis physics steps and puts per-frame point clouds into
+        # physics_queue.  Does NOT touch the SVR renderer, so it can run
+        # ahead of Stage 1b by up to physics_queue.maxsize frames.
+        def physics_producer():
+            import time
+            try:
+                for block_idx in range(num_blocks):
+                    if stop_requested:
+                        break
+                    n_pixel = FRAMES_FIRST_BLOCK_PIXEL if block_idx == 0 else FRAMES_PER_BLOCK_PIXEL
+                    for pf_idx in range(n_pixel):
+                        if stop_requested:
+                            break
+                        t0 = time.perf_counter()
+                        last_i = frame_steps - 1
+                        for i in range(frame_steps):
+                            updated_points = simulator.step(extract_points=(i == last_i))
+                        t_step = time.perf_counter() - t0
+                        # Capture frame_id here: render thread may be behind
+                        frame_id = simulator.step_count
+                        item = (block_idx, n_pixel, pf_idx,
+                                updated_points, frame_id, t_step)
+                        # Timed put so stop_requested is checked if render stops consuming
+                        while not stop_requested:
+                            try:
+                                physics_queue.put(item, timeout=0.5)
+                                break
+                            except QueueFull:
+                                pass
+            except Exception as e:
+                import traceback
+                traceback.print_exc()
+            finally:
+                # Best-effort sentinel — render exits via stop_requested if queue stays full
+                for _ in range(20):  # up to 10 s
+                    try:
+                        physics_queue.put(None, timeout=0.5)
+                        break
+                    except QueueFull:
+                        pass
+        # --- Stage 1b: Render + flow producer ---
+        # Reads point clouds from physics_queue, runs SVR render + optical
+        # flow + resize, accumulates per-block results, then forwards complete
+        # blocks to sim_queue (same interface as the old simulation_producer).
+        def render_flow_producer():
+            import time
+            try:
+                current_block = -1
+                flows, sim_frames, fg_masks, mesh_masks = [], [], [], []
+                t_block_start = time.perf_counter()
+                t_step_total = t_render_total = t_resize_total = 0.0
+                while not stop_requested:
+                    try:
+                        item = physics_queue.get(timeout=0.5)
+                    except QueueEmpty:
+                        continue
+                    if item is None:
+                        break
+                    block_idx, n_pixel, pf_idx, updated_points, frame_id, t_step = item
+                    if block_idx != current_block:
+                        current_block = block_idx
+                        flows, sim_frames, fg_masks, mesh_masks = [], [], [], []
+                        t_block_start = time.perf_counter()
+                        t_step_total = t_render_total = t_resize_total = 0.0
+                    t0 = time.perf_counter()
+                    frame_pil, flow_2hw, fg_mask, mesh_mask = (
+                        simulator.render_and_flow(updated_points, frame_id=frame_id)
+                    )
+                    t1 = time.perf_counter()
+                    frame_pil = resize_and_crop_pil(frame_pil, start_y=simulator.crop_start)
+                    t2 = time.perf_counter()
+                    sim_frames.append(frame_pil)
+                    flows.append(flow_2hw)
+                    fg_masks.append(fg_mask)
+                    mesh_masks.append(mesh_mask)
+                    t_step_total   += t_step
+                    t_render_total += t1 - t0
+                    t_resize_total += t2 - t1
+                    if len(sim_frames) == n_pixel:
+                        t_queue_start = time.perf_counter()
+                        if all_sim_frames is not None:
+                            all_sim_frames.extend(sim_frames)
+                        sim_queue.put((block_idx, flows, sim_frames, fg_masks, mesh_masks))
+                        t_queue_end = time.perf_counter()
+                        print(f"[TIMING] sim block {block_idx}: "
+                              f"physics step = {t_step_total:.3f}s, "
+                              f"render+flow = {t_render_total:.3f}s, "
+                              f"resize = {t_resize_total:.3f}s, "
+                              f"queue put = {t_queue_end - t_queue_start:.3f}s, "
+                              f"total = {t_queue_end - t_block_start:.3f}s "
+                              f"({n_pixel} frames)")
+            except Exception as e:
+                import traceback
+                traceback.print_exc()
+            finally:
+                sim_queue.put(None)  # Sentinel
+        # --- Stage 2: Noise Warping (lightweight, mostly CPU) ---
+        def noise_warp_stage():
+            import time
+            try:
+                while not stop_requested:
+                    t_wait_start = time.perf_counter()
+                    item = sim_queue.get()
+                    t_wait_end = time.perf_counter()
+                    if item is None:
+                        break
+                    block_idx, flows, sim_frames, fg_masks, mesh_masks = item
+                    # Warp noise incrementally using optical flow
+                    t0 = time.perf_counter()
+                    for flow in flows:
+                        noise_warper.warp_step(flow)
+                    t1 = time.perf_counter()
+                    structured_noise, sde_noise = noise_warper.get_block_noise(block_idx)
+                    t2 = time.perf_counter()
+                    ready_queue.put((
+                        block_idx,
+                        structured_noise,
+                        sde_noise,
+                        sim_frames, fg_masks, mesh_masks,
+                    ))
+                    t3 = time.perf_counter()
+                    print(f"[TIMING] warp block {block_idx}: "
+                          f"queue wait = {t_wait_end - t_wait_start:.3f}s, "
+                          f"warp steps = {t1 - t0:.3f}s, "
+                          f"get_block_noise = {t2 - t1:.3f}s, "
+                          f"queue put = {t3 - t2:.3f}s, "
+                          f"total = {t3 - t_wait_end:.3f}s")
+            except Exception as e:
+                import traceback
+                traceback.print_exc()
+            finally:
+                ready_queue.put(None)  # Sentinel
+        # Start stages 1a, 1b, and 2 BEFORE prepare_generation so the
+        # simulation pipeline (physics → render → warp) runs in parallel
+        # with text encoding.  By the time prepare_generation() returns,
+        # ready_queue may already contain block 0, eliminating the startup gap.
+        physics_thread = threading.Thread(target=physics_producer, daemon=True)
+        render_thread = threading.Thread(target=render_flow_producer, daemon=True)
+        warp_thread = threading.Thread(target=noise_warp_stage, daemon=True)
+        physics_thread.start()
+        render_thread.start()
+        warp_thread.start()
+        # Text encoding (+ conditional dict setup) runs while sim pipeline
+        # is already producing frames.
+        generator.prepare_generation(prompt)
+        # --- Stage 3: VAE Encode + Mask Build + Diffusion ---
+        # --- Stage 4: Frame streaming (separate thread, runs concurrently) ---
+        import time
+        stream_queue = Queue(maxsize=2)  # Stage 3 → Stage 4
+        def frame_streamer():
+            """Stream frames to browser at FPS rate, decoupled from GPU work."""
+            try:
+                while not stop_requested:
+                    item = stream_queue.get()
+                    if item is None:
+                        break
+                    pixel_frames, blk_idx = item
+                    for frame in pixel_frames:
+                        if stop_requested:
+                            break
+                        b64 = base64.b64encode(_encode_jpeg(frame)).decode("ascii")
+                        socketio.emit("frame", {"data": b64, "block": blk_idx})
+                        socketio.sleep(1.0 / FPS)
+            except Exception as e:
+                import traceback
+                traceback.print_exc()
+        stream_thread = threading.Thread(target=frame_streamer, daemon=True)
+        stream_thread.start()
+        t_block_end = time.perf_counter()
+        while not stop_requested:
+            t_wait_start = time.perf_counter()
+            item = ready_queue.get()
+            t_wait_end = time.perf_counter()
+            if item is None:
+                break
+            (block_idx, structured_noise, sde_noise,
+             sim_frames, fg_masks, mesh_masks) = item
+            print(f"[TIMING] block {block_idx}: queue wait = {t_wait_end - t_wait_start:.3f}s, "
+                  f"gap since prev block end = {t_wait_end - t_block_end:.3f}s")
+            socketio.emit("status", {
+                "message": f"Block {block_idx + 1}/{num_blocks} — Generating...",
+                "block": block_idx,
+                "total_blocks": num_blocks,
+            })
+            # 1. Encode simulation frames to latent (GPU)
+            t0 = time.perf_counter()
+            log_gpu(f"stage3 block {block_idx}: before VAE encode")
+            sim_frames_tensor = _frames_to_tensor(sim_frames)
+            sim_latent = generator.pipeline.encode_vae.cached_encode_to_latent(
+                sim_frames_tensor.to(device=generator.device, dtype=torch.bfloat16),
+                is_first=(block_idx == 0),
+            )
+            if sim_latent.shape[1] > FRAMES_PER_BLOCK:
+                sim_latent = sim_latent[:, :FRAMES_PER_BLOCK]
+            elif sim_latent.shape[1] < FRAMES_PER_BLOCK:
+                pad = FRAMES_PER_BLOCK - sim_latent.shape[1]
+                sim_latent = torch.cat(
+                    [sim_latent, sim_latent[:, -1:].repeat(1, pad, 1, 1, 1)], dim=1,
+                )
+            t1 = time.perf_counter()
+            log_gpu(f"stage3 block {block_idx}: after VAE encode")
+            # 2. Build masks
+            sim_mask = _downsample_masks(fg_masks, FRAMES_PER_BLOCK, crop_start=simulator.crop_start, device=generator.device)
+            sim_franka_mask = _downsample_masks(mesh_masks, FRAMES_PER_BLOCK, crop_start=simulator.crop_start, device=generator.device)
+            t2 = time.perf_counter()
+            log_gpu(f"stage3 block {block_idx}: after mask build")
+            # 3. Diffusion denoising
+            pixel_frames = generator.generate_block(
+                block_idx=block_idx,
+                structured_noise=structured_noise,
+                sim_latent=sim_latent,
+                sde_noise=sde_noise,
+                sim_mask=sim_mask,
+                sim_franka_mask=sim_franka_mask,
+            )
+            t3 = time.perf_counter()
+            # Hand off frames to the streaming thread (blocks only while stream_queue is full)
+            stream_queue.put((pixel_frames, block_idx))
+            print(f"[TIMING] block {block_idx}: VAE encode = {t1 - t0:.3f}s, "
+                  f"mask build = {t2 - t1:.3f}s, diffusion = {t3 - t2:.3f}s, "
+                  f"total = {t3 - t_wait_end:.3f}s")
+            t_block_end = t3
+        stream_queue.put(None)  # Sentinel
+        physics_thread.join(timeout=10)
+        render_thread.join(timeout=10)
+        warp_thread.join(timeout=10)
+        stream_thread.join(timeout=30)
+        # Save debug outputs only if debug mode is on
+        if simulator.config.get("debug", False):
+            if noise_warper.noise_buffer:
+                noise_stack = torch.stack(noise_warper.noise_buffer, dim=0)  # (T, C, H, W)
+                downscale_factor = DEFAULT_HEIGHT // LATENT_H  # 480 // 60 = 8
+                noise_latent = F.interpolate(
+                    noise_stack, size=(LATENT_H, LATENT_W), mode="area",
+                ) * downscale_factor  # (T, 32, 60, 104)
+                numpy_noises = noise_latent.cpu().permute(0, 2, 3, 1).numpy().astype(np.float16)  # (T, H, W, C)
+                debug_dir = Path(simulator.config.get("output_folder", "/tmp/demo_debug"))
+                debug_dir.mkdir(parents=True, exist_ok=True)
+                noises_path = debug_dir / "noises.npy"
+                np.save(noises_path, numpy_noises)
+                noise_vis = np.clip(numpy_noises[:, :, :, :3].astype(np.float32) / 4 + 0.5, 0, 1)
+                noise_vis = (noise_vis * 255).astype(np.uint8)
+                noise_video_tensor = torch.from_numpy(noise_vis)  # (T, H, W, 3) uint8
+                from torchvision.io import write_video
+                noise_mp4_path = str(debug_dir / "noise_video.mp4")
+                write_video(noise_mp4_path, noise_video_tensor, fps=30, video_codec="libx264")
+                print(f"Noise saved to: {noises_path}  video: {noise_mp4_path}")
+            simulator.save_debug_outputs(sim_frames=all_sim_frames)
+        socketio.emit("generation_complete", {})
+        socketio.emit("status", {"message": "Generation complete"})
+    except Exception as e:
+        socketio.emit("error", {"message": f"Generation error: {str(e)}"})
+        import traceback
+        traceback.print_exc()
+    finally:
+        is_generating = False
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+def _find_first_frame():
+    """Locate the first-frame image for video generation."""
+    case_path = simulator.demo_data_path
+    candidate = case_path / "first_frame.png"
+    if candidate.exists():
+        return str(candidate)
+    input_path = Path(simulator.config.get("data_path", "")) / "input.png"
+    if input_path.exists():
+        return str(input_path)
+    return str(candidate)  # fallback, may error later with clear message
+def _frames_to_tensor(frames_pil):
+    """Convert list of PIL frames (already 480x832) to tensor [1, C, T, H, W] in [-1, 1]."""
+    arrays = []
+    for f in frames_pil:
+        arr = np.array(f.convert("RGB"))
+        arr = arr.astype(np.float32) / 127.5 - 1.0
+        arrays.append(torch.from_numpy(arr))
+    tensor = torch.stack(arrays, dim=0).permute(3, 0, 1, 2).contiguous()
+    return tensor.unsqueeze(0)
+def _downsample_masks(masks, target_frames, crop_start=176, device="cuda"):
+    """Downsample per-frame masks to a bool tensor [1, target_frames, LATENT_H, LATENT_W]; returns None if no masks are given."""
+    if not masks or all(m is None for m in masks):
+        return None
+    processed = []
+    for m in masks:
+        if m is None:
+            processed.append(torch.zeros(1, 1, LATENT_H, LATENT_W, device=device))
+            continue
+        if isinstance(m, torch.Tensor):
+            m = m.to(device=device)
+            if m.dim() == 3:
+                m = m.squeeze(-1)
+            m_832 = F.interpolate(
+                m.float().unsqueeze(0).unsqueeze(0),
+                size=(832, 832), mode="bilinear", align_corners=False,
+            )
+            m_cropped = m_832[:, :, crop_start:crop_start + DEFAULT_HEIGHT, :]
+            m_latent = F.interpolate(
+                m_cropped, size=(LATENT_H, LATENT_W),
+                mode="bilinear", align_corners=False,
+            )
+            processed.append(m_latent)
+        else:
+            processed.append(torch.zeros(1, 1, LATENT_H, LATENT_W, device=device))
+    stacked = torch.cat(processed, dim=0)
+    T = stacked.shape[0]
+    time_averaged = []
+    for i in range(0, T, TEMPORAL_FACTOR):
+        group = stacked[i:i + TEMPORAL_FACTOR]
+        time_averaged.append(group.mean(dim=0, keepdim=True))
+    stacked = torch.cat(time_averaged, dim=0)
+    if stacked.shape[0] > target_frames:
+        stacked = stacked[:target_frames]
+    elif stacked.shape[0] < target_frames:
+        pad = target_frames - stacked.shape[0]
+        stacked = torch.cat(
+            [stacked, stacked[-1:].repeat(pad, 1, 1, 1)], dim=0,
+        )
+    result = stacked.squeeze(1).unsqueeze(0)
+    return (result > 0.5).bool()
+def _encode_jpeg(frame_np, quality=85):
+    img = Image.fromarray(frame_np)
+    buf = io.BytesIO()
+    img.save(buf, format="JPEG", quality=quality)
+    return buf.getvalue()
+def _encode_pil_b64(pil_img, fmt="JPEG", quality=85):
+    buf = io.BytesIO()
+    pil_img.save(buf, format=fmt, quality=quality)
+    return base64.b64encode(buf.getvalue()).decode("ascii")
+# ---------------------------------------------------------------------------
+# Pipeline warmup — compile CUDA kernels before first user request
+# ---------------------------------------------------------------------------
+def _warmup_pipeline():
+    """Run dummy passes through each pipeline stage to trigger CUDA JIT.
+    Without this, the first user-facing generation pays ~24s of kernel
+    compilation across simulation render, noise warping, and diffusion.
+    """
+    import time
+    print("[4/5] Warming up CUDA kernels (one-time cost)...")
+    torch.set_grad_enabled(False)
+    # 1. Warm up simulation render + optical flow
+    t0 = time.perf_counter()
+    for _pass in range(2):
+        for _ in range(simulator.frame_steps):
+            updated_points = simulator.step()
+        simulator.render_and_flow(updated_points)
+    # Reset simulation state (scene.reset restores to built state)
+    simulator.scene.reset()
+    simulator.case_handler.fix_particles()  # re-pin after reset
+    simulator.step_count = 0
+    simulator.svr.previous_frame_data = None
+    simulator.svr.optical_flow = np.array([])
+    simulator.svr._last_optical_flow = None
+    simulator.svr._prev_fg_frags_idx = None
+    simulator.svr._prev_fg_frags_dists = None
+    # Keep cache_bg — background render is reusable
+    t1 = time.perf_counter()
+    print(f"      Sim + render warmup: {t1 - t0:.1f}s")
+    # 2. Warm up noise warper (grid_sample, meshgrid, interpolate kernels)
+    dummy_flow = np.zeros((2, 512, 512), dtype=np.float32)
+    noise_warper.warp_step(dummy_flow)
+    noise_warper.reset()
+    t2 = time.perf_counter()
+    print(f"      Noise warp warmup:   {t2 - t1:.1f}s")
+    # 3. Warm up VAE encode + diffusion (transformer attention kernels)
+    generator.prepare_generation(default_prompt)
+    # Dummy VAE encode
+    dummy_pixel = torch.zeros(
+        1, 3, FRAMES_FIRST_BLOCK_PIXEL, DEFAULT_HEIGHT, DEFAULT_WIDTH,
+        device=generator.device, dtype=torch.bfloat16,
+    )
+    sim_latent = generator.pipeline.encode_vae.cached_encode_to_latent(
+        dummy_pixel, is_first=True,
+    )
+    if sim_latent.shape[1] > FRAMES_PER_BLOCK:
+        sim_latent = sim_latent[:, :FRAMES_PER_BLOCK]
+    elif sim_latent.shape[1] < FRAMES_PER_BLOCK:
+        pad = FRAMES_PER_BLOCK - sim_latent.shape[1]
+        sim_latent = torch.cat(
+            [sim_latent, sim_latent[:, -1:].repeat(1, pad, 1, 1, 1)], dim=1,
+        )
+    # Dummy diffusion block
+    dummy_noise = torch.randn(
+        1, FRAMES_PER_BLOCK, LATENT_C, LATENT_H, LATENT_W,
+        device=generator.device, dtype=torch.bfloat16,
+    )
+    generator.generate_block(
+        block_idx=0,
+        structured_noise=dummy_noise,
+        sim_latent=sim_latent,
+    )
+    # Run two more dummy blocks to warm up the KV-cache-populated code
+    # paths (blocks 1+ are structurally different from block 0 because the
+    # self-attention KV cache is non-empty).  Without this, real generation
+    # blocks 0 and 1 hit slow cuDNN algorithm selection on first use, taking
+    # ~4s each instead of ~1s.  The crossattn_cache stays valid across these
+    # extra blocks (same prompt), so they run fast (~1s each).
+    for _blk in range(1, 3):
+        _dummy_latent = torch.zeros(
+            1, FRAMES_PER_BLOCK, LATENT_C, LATENT_H, LATENT_W,
+            device=generator.device, dtype=torch.bfloat16,
+        )
+        _dummy_noise = torch.randn_like(_dummy_latent)
+        generator.generate_block(
+            block_idx=_blk,
+            structured_noise=_dummy_noise,
+            sim_latent=_dummy_latent,
+        )
+    # Reset generator state (KV self-attention cache + VAE caches).
+    # crossattn_cache is intentionally preserved: it is text-conditioned
+    # and stays valid for the default prompt, so real generation blocks 0
+    # and 1 skip the expensive cold re-initialization.
+    generator.reset()
+    generator.pipeline.encode_vae.model.clear_cache()
+    t3 = time.perf_counter()
+    print(f"      VAE + diffusion warmup: {t3 - t2:.1f}s")
+    print(f"      Total warmup: {t3 - t0:.1f}s — first generation will be fast.")
+    log_gpu("after pipeline warmup")
+# ---------------------------------------------------------------------------
+# Startup
+# ---------------------------------------------------------------------------
+def main():
+    global simulator, noise_warper, generator, demo_case_handler
+    global preview_b64, default_prompt, case_name, num_blocks
+    parser = argparse.ArgumentParser(description="RealWonder Interactive Demo")
+    parser.add_argument("--demo_data", type=str, required=True,
+                        help="Path to demo data directory (e.g. demo_data/lamp)")
+    parser.add_argument("--checkpoint_path", type=str, required=True,
+                        help="Path to video generation model checkpoint")
+    parser.add_argument("--host", type=str, default="0.0.0.0")
+    parser.add_argument("--port", type=int, default=5000)
+    parser.add_argument("--use_ema", action="store_true")
+    parser.add_argument("--seed", type=int, default=42)
+    parser.add_argument("--no_gpu_log", action="store_true",
+                        help="Disable GPU memory logging")
+    parser.add_argument("--no_debug", action="store_true",
+                        help="Force disable debug outputs (overrides config.yaml)")
+    parser.add_argument("--taehv", action="store_true",
+                        help="Use TAEHV tiny VAE decoder (faster but slightly lower quality)")
+    args = parser.parse_args()
+    if args.no_gpu_log:
+        set_gpu_logging(False)
+    demo_data_path = Path(args.demo_data)
+    case_name = demo_data_path.name
+    if not demo_data_path.exists() or not (demo_data_path / "config.yaml").exists():
+        print(f"ERROR: {demo_data_path} does not exist or has no config.yaml")
+        return
+    # ---- Load case config and derive SDEdit parameters ----
+    import yaml
+    with open(demo_data_path / "config.yaml") as f:
+        case_config = yaml.safe_load(f)
+    sdedit_cfg = load_case_sdedit_config(case_config)
+    num_blocks = sdedit_cfg["num_blocks"]
+    print(f"Case SDEdit config: {sdedit_cfg}")
+    # ---- Step 1: Initialize video generator ----
+    print(f"[1/5] Initializing video generator from {args.checkpoint_path} ...")
+    log_gpu("before video generator init")
+    generator = StreamingVideoGenerator(
+        checkpoint_path=args.checkpoint_path,
+        num_pixel_frames=sdedit_cfg["num_pixel_frames"],
+        denoising_steps=sdedit_cfg["denoising_step_list"],
+        mask_dropin_step=sdedit_cfg["mask_dropin_step"],
+        franka_step=sdedit_cfg["franka_step"],
+        use_ema=args.use_ema,
+        seed=args.seed,
+        enable_taehv=args.taehv,
+    )
+    generator.setup()
+    log_gpu("after video generator setup")
+    print("      Video generator ready.")
+    # ---- Step 2: Initialize simulator (Genesis scene) ----
+    print(f"[2/5] Loading case '{case_name}' and building Genesis scene ...")
+    log_gpu("before simulator init")
+    # Per-case config overrides (e.g. disable built-in force fields for
+    # cases where the demo handler applies forces interactively).
+    config_overrides = {}
+    if case_name == "santa_cloth":
+        config_overrides["skip_force_fields"] = True
+    simulator = InteractiveSimulator(
+        str(demo_data_path), config_overrides=config_overrides,
+    )
+    if args.no_debug:
+        simulator.config["debug"] = False
+    log_gpu("after simulator init")
+    # Create per-case demo handler and attach to simulator
+    demo_case_handler = get_demo_case_handler(case_name, simulator.config)
+    demo_case_handler.set_object_masks(simulator.object_masks_b64)
+    simulator.set_demo_case_handler(demo_case_handler)
+    print(f"      Demo case handler: {type(demo_case_handler).__name__}")
+    noise_warper = StreamingNoiseWarper(crop_start=simulator.crop_start)
+    log_gpu("after noise warper init")
+    print("      Simulator and noise warper ready.")
+    # ---- Step 3: Pre-compute first frame encoding + KV cache + default prompt ----
+    print("[3/5] Pre-computing first frame encoding + KV cache + default prompt ...")
+    first_frame_path = _find_first_frame()
+    preview_pil = Image.open(first_frame_path).convert("RGB")
+    preview_b64 = _encode_pil_b64(preview_pil)
+    default_prompt = simulator.config.get("vgen_prompt", "A video of physical simulation")
+    generator.precompute_first_frame(first_frame_path, default_prompt=default_prompt)
+    log_gpu("after first frame pre-computation")
+    print(f"      First frame pre-computed from {first_frame_path}. All components initialized.")
+    # ---- Step 4: Warm up CUDA kernels ----
+    _warmup_pipeline()
+    # ---- Step 5: Start server ----
+    print(f"\nStarting server on {args.host}:{args.port}")
+    print(f"Open http://localhost:{args.port} in your browser.\n")
+    socketio.run(app, host=args.host, port=args.port, debug=False,
+                 allow_unsafe_werkzeug=True)
+if __name__ == "__main__":
+    main()
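The generation loop in app.py hands each block's frames to a streaming thread through `stream_queue`, then posts a `None` sentinel and joins the thread. The handoff can be sketched with the standard library alone. This is only an illustration under stated assumptions: `consume`, `received`, and the string payloads are hypothetical stand-ins, not the app's actual streaming code.

```python
import queue
import threading

stream_queue = queue.Queue()
received = []  # hypothetical sink standing in for the socket emit


def consume():
    """Drain the queue until the None sentinel arrives."""
    while True:
        item = stream_queue.get()
        if item is None:  # sentinel: the producer has finished
            break
        frames, block_idx = item
        received.append((block_idx, len(frames)))


stream_thread = threading.Thread(target=consume, daemon=True)
stream_thread.start()

# Producer side: one put per generated block, then the sentinel.
for block_idx in range(3):
    frames = [f"frame-{block_idx}-{i}" for i in range(4)]  # placeholder frames
    stream_queue.put((frames, block_idx))
stream_queue.put(None)

stream_thread.join(timeout=30)
print(received)
```

Because `put` on an unbounded `queue.Queue` does not block, the producer can move on to the next block while the consumer is still encoding or sending the previous one.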

case_handlers/__init__.py ADDED

	@@ -0,0 +1,6 @@

+"""Import case handlers to trigger registration."""
+from case_handlers.lamp import LampDemoHandler
+from case_handlers.persimmon import PersimmonDemoHandler
+from case_handlers.santa_cloth import SantaClothDemoHandler
+from case_handlers.tree import TreeDemoHandler

case_handlers/base.py ADDED

	@@ -0,0 +1,149 @@

+"""Base demo case handler with registry pattern.
+Provides a registry + decorator for per-case UI configuration and
+force application logic in the demo_web frontend.
+"""
+import numpy as np
+DEMO_CASE_REGISTRY = {}
+def register_demo_case(case_name: str):
+    """Decorator to register a DemoCaseHandler subclass."""
+    def decorator(cls):
+        if case_name in DEMO_CASE_REGISTRY:
+            raise ValueError(f"Demo case '{case_name}' already registered!")
+        DEMO_CASE_REGISTRY[case_name] = cls
+        return cls
+    return decorator
+class DemoCaseHandler:
+    """Base class for per-case UI config and force application in demo_web.
+    Subclasses override ``get_ui_config`` and optionally ``apply_forces``
+    to customise behaviour for specific cases.
+    """
+    # Per-object physics force multiplier applied on top of the UI strength
+    # slider.  Subclasses override this so the UI always shows a normalised
+    # range (bounded by ``max_strength``) while the actual force magnitude is case-appropriate.
+    # Either a single float (applied to all objects) or a list of floats
+    # (one per object).
+    force_scale = 1.0
+    def __init__(self, config):
+        self.config = config
+        self._forces = []  # list of {"obj_idx", "direction", "strength"}
+        self._object_masks_b64 = []  # per-object mask images as base64 PNGs
+    @property
+    def num_objects(self):
+        return len(self.config.get("material_type", []))
+    def set_object_masks(self, masks_b64_list):
+        """Store base64-encoded mask PNGs for each object."""
+        self._object_masks_b64 = list(masks_b64_list) if masks_b64_list else []
+    # -- UI configuration --------------------------------------------------
+    def get_ui_config(self):
+        """Return JSON-serialisable dict describing per-object controls.
+        Default: one control per object with left/right/none, strength 1.0.
+        Includes mask_b64 for each object if masks were set.
+        """
+        objects = []
+        for idx in range(self.num_objects):
+            obj = {
+                "idx": idx,
+                "label": f"Object {idx}",
+                "directions": ["left", "none", "right"],
+                "default_direction": "none",
+                "default_strength": 1.0,
+                "max_strength": 2.0,
+            }
+            if idx < len(self._object_masks_b64):
+                obj["mask_b64"] = self._object_masks_b64[idx]
+            objects.append(obj)
+        return {"num_objects": self.num_objects, "objects": objects}
+    # -- Force management --------------------------------------------------
+    def get_force_config_from_ui(self, ui_forces):
+        """Map UI force dicts to 3D vectors.
+        Args:
+            ui_forces: list of ``{"obj_idx", "direction", "strength"}``
+                       where direction is either a legacy string
+                       ("left"/"right"/"none") or a 3-element list [dx, dy, dz].
+        Returns:
+            list of ``{"obj_idx", "direction": [dx,dy,dz], "strength"}``.
+        """
+        legacy_map = {
+            "left": [-1.0, 0.0, 0.0],
+            "right": [1.0, 0.0, 0.0],
+            "none": [0.0, 0.0, 0.0],
+        }
+        result = []
+        for f in ui_forces:
+            d = f.get("direction", [0.0, 0.0, 0.0])
+            if isinstance(d, str):
+                vec = legacy_map.get(d, [0.0, 0.0, 0.0])
+            else:
+                vec = [float(v) for v in d]
+            result.append({
+                "obj_idx": int(f.get("obj_idx", 0)),
+                "direction": vec,
+                "strength": float(f.get("strength", 0.0)),
+            })
+        return result
+    def set_forces(self, forces):
+        """Store resolved force configs (output of ``get_force_config_from_ui``)."""
+        self._forces = list(forces)
+    def configure_simulation(self, simulator):
+        """Called from the main thread before the generation loop starts.
+        Override in subclasses that need to set simulation state requiring
+        the main thread's CUDA context (e.g. taichi field writes).
+        """
+        pass
+    def reset_forces(self):
+        self._forces = []
+    def apply_forces(self, simulator, step_count):
+        """Apply stored forces to the simulator's objects.
+        Default behaviour: apply a constant force every step to each rigid
+        object that has a non-zero direction.
+        """
+        for f in self._forces:
+            obj_idx = f["obj_idx"]
+            direction = np.array(f["direction"], dtype=np.float32)
+            strength = f["strength"]
+            norm = np.linalg.norm(direction)
+            if norm < 1e-6:
+                continue
+            direction = direction / norm
+            if isinstance(self.force_scale, (list, tuple)):
+                scale = self.force_scale[obj_idx] if obj_idx < len(self.force_scale) else 1.0
+            else:
+                scale = self.force_scale
+            force_magnitude = strength * scale
+            mt = simulator.material_type[obj_idx] if obj_idx < len(simulator.material_type) else "rigid"
+            if mt == "rigid":
+                simulator.objs[obj_idx].solver.apply_links_external_force(
+                    force=(direction * force_magnitude).reshape(1, 3),
+                    links_idx=[simulator.objs[obj_idx].idx],
+                )
+def get_demo_case_handler(case_name, config):
+    """Factory: return a handler for *case_name*, falling back to default."""
+    cls = DEMO_CASE_REGISTRY.get(case_name, DemoCaseHandler)
+    return cls(config)
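The registry, decorator, and factory in base.py can be exercised in isolation. The sketch below is a reduced, self-contained version of that pattern; `REGISTRY`, `register`, `BaseHandler`, `LampHandler`, and `get_handler` are shortened hypothetical names chosen for the example, not the module's real identifiers.

```python
REGISTRY = {}


def register(case_name):
    """Class decorator that records a handler class under case_name."""
    def decorator(cls):
        if case_name in REGISTRY:
            raise ValueError(f"Demo case '{case_name}' already registered!")
        REGISTRY[case_name] = cls
        return cls
    return decorator


class BaseHandler:
    force_scale = 1.0

    def __init__(self, config):
        self.config = config


@register("lamp")
class LampHandler(BaseHandler):
    force_scale = 2.5


def get_handler(case_name, config):
    """Unknown case names fall back to the base class."""
    return REGISTRY.get(case_name, BaseHandler)(config)


lamp = get_handler("lamp", {})
other = get_handler("unknown_case", {})
print(type(lamp).__name__, lamp.force_scale)
print(type(other).__name__, other.force_scale)
```

Registration happens when the decorated class body is executed, which is why case_handlers/__init__.py imports every handler module even though it never uses the imported names directly.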

case_handlers/lamp.py ADDED

	@@ -0,0 +1,25 @@

+"""Lamp demo case handler — single rigid object, constant force."""
+from case_handlers.base import DemoCaseHandler, register_demo_case
+@register_demo_case("lamp")
+class LampDemoHandler(DemoCaseHandler):
+    force_scale = 2.5
+    def get_ui_config(self):
+        objects = [
+            {
+                "idx": 0,
+                "label": "Lamp",
+                "directions": ["left", "none", "right"],
+                "default_direction": "none",
+                "default_strength": 1.0,
+                "max_strength": 2.0,
+            },
+        ]
+        for obj in objects:
+            if obj["idx"] < len(self._object_masks_b64):
+                obj["mask_b64"] = self._object_masks_b64[obj["idx"]]
+        return {"num_objects": len(objects), "objects": objects}

case_handlers/persimmon.py ADDED

	@@ -0,0 +1,51 @@

+"""Persimmon demo case handler — 3 rigid objects, force for first 5 steps only."""
+import numpy as np
+from case_handlers.base import DemoCaseHandler, register_demo_case
+@register_demo_case("persimmon")
+class PersimmonDemoHandler(DemoCaseHandler):
+    # Per-object force multiplier: top persimmon is lighter so needs less
+    # force to move the same distance.  [top, middle, bottom]
+    force_scale = [50.0, 200.0, 100.0]
+    def get_ui_config(self):
+        objects = [
+            {
+                "idx": 0,
+                "label": "Top Persimmon",
+                "directions": ["left", "none", "right"],
+                "default_direction": "none",
+                "default_strength": 1.0,
+                "max_strength": 2.0,
+            },
+            {
+                "idx": 1,
+                "label": "Middle Persimmon",
+                "directions": ["left", "none", "right"],
+                "default_direction": "none",
+                "default_strength": 1.0,
+                "max_strength": 2.0,
+            },
+            {
+                "idx": 2,
+                "label": "Bottom Persimmon",
+                "directions": ["left", "none", "right"],
+                "default_direction": "none",
+                "default_strength": 1.0,
+                "max_strength": 2.0,
+            },
+        ]
+        for obj in objects:
+            if obj["idx"] < len(self._object_masks_b64):
+                obj["mask_b64"] = self._object_masks_b64[obj["idx"]]
+        return {"num_objects": len(objects), "objects": objects}
+    def apply_forces(self, simulator, step_count):
+        """Only apply forces for the first 5 simulation steps (matching offline persimmon.py)."""
+        if step_count > 5:
+            return
+        super().apply_forces(simulator, step_count)
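To see how the per-object `force_scale` list and the step gate interact, here is a small sketch of the same arithmetic without a simulator. `resolve_scale` and `force_magnitude` are hypothetical helper names; the scale lookup and the `step_count > 5` gate are copied from the handlers above.

```python
def resolve_scale(force_scale, obj_idx):
    """Same lookup as the base handler: list entries per object, else a scalar."""
    if isinstance(force_scale, (list, tuple)):
        return force_scale[obj_idx] if obj_idx < len(force_scale) else 1.0
    return force_scale


def force_magnitude(force_scale, obj_idx, strength):
    """UI strength multiplied by the case-specific scale."""
    return strength * resolve_scale(force_scale, obj_idx)


persimmon_scale = [50.0, 200.0, 100.0]  # [top, middle, bottom]
magnitudes = [force_magnitude(persimmon_scale, i, 1.0) for i in range(3)]

# Same gate as PersimmonDemoHandler.apply_forces: skip once step_count > 5.
active_steps = [s for s in range(10) if not s > 5]
print(magnitudes, active_steps)
```

With this gate, forces are applied while `step_count` is at most 5. Whether that amounts to five or six steps depends on whether the simulator's counter starts at 1 or 0, which this diff does not show.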

case_handlers/santa_cloth.py ADDED

	@@ -0,0 +1,101 @@

+"""Santa cloth demo case handler — PBD cloth with controllable wind."""
+import numpy as np
+import torch
+from case_handlers.base import DemoCaseHandler, register_demo_case
+@register_demo_case("santa_cloth")
+class SantaClothDemoHandler(DemoCaseHandler):
+    force_scale = 1.0
+    def __init__(self, config):
+        super().__init__(config)
+        self._wind_direction = np.zeros(3, dtype=np.float32)
+        self._wind_strength = 0.0
+        self._wind_bounds = None  # (z_low, z_high)
+    def get_ui_config(self):
+        objects = [
+            {
+                "idx": 0,
+                "label": "Santa's Clothes",
+                "directions": ["left", "none", "right"],
+                "default_direction": "none",
+                "default_strength": 1.0,
+                "max_strength": 2.0,
+            },
+        ]
+        for obj in objects:
+            if obj["idx"] < len(self._object_masks_b64):
+                obj["mask_b64"] = self._object_masks_b64[obj["idx"]]
+        return {"num_objects": len(objects), "objects": objects}
+    def configure_simulation(self, simulator):
+        """Pre-compute wind parameters from stored forces (any thread)."""
+        for f in self._forces:
+            direction = np.array(f["direction"], dtype=np.float32)
+            strength = f["strength"]
+            norm = np.linalg.norm(direction)
+            if norm < 1e-6:
+                self._wind_direction = np.zeros(3, dtype=np.float32)
+                self._wind_strength = 0.0
+                continue
+            self._wind_direction = direction / norm
+            self._wind_strength = strength * self.force_scale
+        if self._wind_bounds is None and len(simulator.all_obj_info) > 0:
+            info = simulator.all_obj_info[0]
+            z_min = float(info["min"][2])
+            z_max = float(info["max"][2])
+            z_range = z_max - z_min
+            self._wind_bounds = (
+                z_min + z_range * 0.05,
+                z_min + z_range * 0.8,
+            )
+    def apply_forces(self, simulator, step_count):
+        """Apply wind to PBD cloth by modifying particle velocities."""
+        if self._wind_strength < 1e-6:
+            return
+        if self._wind_bounds is None:
+            return
+        wind_lowest, wind_highest = self._wind_bounds
+        dt = simulator.dt
+        for obj_idx, obj in enumerate(simulator.objs):
+            mt = simulator.material_type[obj_idx] if obj_idx < len(simulator.material_type) else "rigid"
+            if mt not in ("pbd_cloth", "pbd_elastic", "pbd_particle"):
+                continue
+            solver = obj.solver
+            state = solver.get_state(0)
+            if state is None:
+                continue
+            p_start = obj.particle_start
+            n_p = obj.n_particles
+            z = state.pos[0, p_start:p_start + n_p, 2]
+            is_free = state.free[0, p_start:p_start + n_p].bool()
+            in_zone = (z > wind_lowest) & (z < wind_highest)
+            mask = is_free & in_zone
+            if not mask.any():
+                continue
+            t = torch.zeros_like(z)
+            t[mask] = (z[mask] - wind_lowest) / (wind_highest - wind_lowest)
+            scaler = torch.zeros_like(z)
+            scaler[mask] = torch.exp(t[mask] ** 2)
+            wind_dir = torch.tensor(
+                self._wind_direction, dtype=z.dtype, device=z.device,
+            )
+            wind_delta = wind_dir.unsqueeze(0) * (self._wind_strength * scaler.unsqueeze(1) * dt)
+            state.vel[0, p_start:p_start + n_p, :] += wind_delta
+            solver.set_state(0, state)
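The wind model in this handler reduces to a height-dependent velocity increment: particles inside a band from 5% to 80% of the object's height receive `direction * strength * exp(t**2) * dt`, where `t` is the normalised height within the band. The scalar sketch below reproduces that formula for a single particle. The function names and the sample values (unit-height object, strength 2.0, dt 0.02) are illustrative assumptions, and the `free` particle mask is left out.

```python
import math


def wind_bounds(z_min, z_max):
    """Wind zone spans 5% to 80% of the object's height."""
    z_range = z_max - z_min
    return z_min + z_range * 0.05, z_min + z_range * 0.8


def wind_scaler(z, low, high):
    """exp(t**2) strictly inside (low, high), 0 outside the zone."""
    if not (low < z < high):
        return 0.0
    t = (z - low) / (high - low)
    return math.exp(t ** 2)


def velocity_delta(z, direction, strength, dt, low, high):
    """Velocity increment added to one particle in one step."""
    s = wind_scaler(z, low, high)
    return [d * strength * s * dt for d in direction]


low, high = wind_bounds(0.0, 1.0)
mid = (low + high) / 2
print(wind_scaler(low, low, high), wind_scaler(mid, low, high))
print(velocity_delta(mid, [1.0, 0.0, 0.0], 2.0, 0.02, low, high))
```

Since `t` runs from 0 to 1 across the band, the scaler grows from 1 to `e`, so particles near the top of the zone are pushed up to about 2.7 times harder than those just above the lower bound.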

case_handlers/tree.py ADDED

	@@ -0,0 +1,104 @@

+"""Tree demo case handler — MPM elastic with controllable wind."""
+import numpy as np
+import torch
+from case_handlers.base import DemoCaseHandler, register_demo_case
+@register_demo_case("tree")
+class TreeDemoHandler(DemoCaseHandler):
+    force_scale = 1.0
+    def __init__(self, config):
+        super().__init__(config)
+        self._wind_direction = np.zeros(3, dtype=np.float32)
+        self._wind_strength = 0.0
+        self._wind_bounds = None  # (z_low, z_high)
+    def get_ui_config(self):
+        objects = [
+            {
+                "idx": 0,
+                "label": "Tree",
+                "directions": ["left", "none", "right"],
+                "default_direction": "none",
+                "default_strength": 1.0,
+                "max_strength": 2.0,
+            },
+        ]
+        for obj in objects:
+            if obj["idx"] < len(self._object_masks_b64):
+                obj["mask_b64"] = self._object_masks_b64[obj["idx"]]
+        return {"num_objects": len(objects), "objects": objects}
+    def configure_simulation(self, simulator):
+        """Pre-compute wind parameters from stored forces (any thread)."""
+        for f in self._forces:
+            direction = np.array(f["direction"], dtype=np.float32)
+            strength = f["strength"]
+            norm = np.linalg.norm(direction)
+            if norm < 1e-6:
+                self._wind_direction = np.zeros(3, dtype=np.float32)
+                self._wind_strength = 0.0
+                continue
+            self._wind_direction = direction / norm
+            self._wind_strength = strength * self.force_scale
+        if self._wind_bounds is None and len(simulator.all_obj_info) > 0:
+            info = simulator.all_obj_info[0]
+            z_min = float(info["min"][2])
+            z_max = float(info["max"][2])
+            z_range = z_max - z_min
+            self._wind_bounds = (
+                z_min + z_range * 0.05,
+                z_min + z_range * 0.8,
+            )
+    def apply_forces(self, simulator, step_count):
+        """Apply wind to MPM particles by modifying particle velocities."""
+        if self._wind_strength < 1e-6:
+            return
+        if self._wind_bounds is None:
+            return
+        wind_lowest, wind_highest = self._wind_bounds
+        dt = simulator.dt
+        for obj_idx, obj in enumerate(simulator.objs):
+            mt = simulator.material_type[obj_idx] if obj_idx < len(simulator.material_type) else "rigid"
+            if not mt.startswith("mpm_"):
+                continue
+            solver = obj.solver
+            state = solver.get_state(0)
+            if state is None:
+                continue
+            p_start = obj.particle_start
+            n_p = obj.n_particles
+            z = state.pos[0, p_start:p_start + n_p, 2]
+            in_zone = (z > wind_lowest) & (z < wind_highest)
+            if hasattr(state, 'free'):
+                mask = state.free[0, p_start:p_start + n_p].bool() & in_zone
+            else:
+                mask = in_zone
+            if not mask.any():
+                continue
+            t = torch.zeros_like(z)
+            t[mask] = (z[mask] - wind_lowest) / (wind_highest - wind_lowest)
+            scaler = torch.zeros_like(z)
+            scaler[mask] = torch.exp(t[mask] ** 2)
+            wind_dir = torch.tensor(
+                self._wind_direction, dtype=z.dtype, device=z.device,
+            )
+            wind_delta = wind_dir.unsqueeze(0) * (self._wind_strength * scaler.unsqueeze(1) * dt)
+            state.vel[0, p_start:p_start + n_p, :] += wind_delta
+            solver.set_state(0, state)

config.py ADDED

	@@ -0,0 +1,60 @@

+"""Default configuration constants for the RealWonder interactive demo."""
+# Video dimensions
+DEFAULT_HEIGHT = 480
+DEFAULT_WIDTH = 832
+# Latent dimensions (after VAE encoding)
+LATENT_H = 60
+LATENT_W = 104
+LATENT_C = 16
+# VAE temporal downsampling factor
+TEMPORAL_FACTOR = 4
+# Causal generation blocks (model architecture constants)
+FRAMES_PER_BLOCK = 3  # latent frames per block
+FRAMES_PER_BLOCK_PIXEL = FRAMES_PER_BLOCK * TEMPORAL_FACTOR  # pixel frames per block
+FRAMES_FIRST_BLOCK_PIXEL = (FRAMES_PER_BLOCK - 1) * TEMPORAL_FACTOR + 1  # pixel frames for first block
+# Playback
+FPS = 8
+# Simulation parameters are read from each case's config.yaml at runtime
+# (dt, substeps, frame_steps) — see InteractiveSimulator.__init__
+# Noise warping
+NOISE_CHANNELS = 32
+# SDEdit
+EVAL_DEGRADATION = 0.5
+# Model defaults
+DEFAULT_LOCAL_ATTN_SIZE = 21
+DEFAULT_TIMESTEP_SHIFT = 5.0
+CONTEXT_NOISE = 0
+def load_case_sdedit_config(case_config: dict) -> dict:
+    """Extract SDEdit parameters from a case config.yaml dict.
+    Reads num_output_frames, denoising_step_list, mask_dropin_step and
+    franka_step from the case config and computes all derived frame/block counts.
+    Returns a dict with keys:
+        num_latent_frames, num_pixel_frames, num_blocks,
+        denoising_step_list, mask_dropin_step, franka_step
+    """
+    num_latent_frames = case_config["num_output_frames"]
+    assert num_latent_frames % FRAMES_PER_BLOCK == 0, (
+        f"num_output_frames ({num_latent_frames}) must be divisible by "
+        f"FRAMES_PER_BLOCK ({FRAMES_PER_BLOCK})"
+    )
+    return {
+        "num_latent_frames": num_latent_frames,
+        "num_pixel_frames": (num_latent_frames - 1) * TEMPORAL_FACTOR + 1,
+        "num_blocks": num_latent_frames // FRAMES_PER_BLOCK,
+        "denoising_step_list": case_config["denoising_step_list"],
+        "mask_dropin_step": case_config.get("mask_dropin_step", -1),
+        "franka_step": case_config.get("franka_step", -1),
+    }
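A worked example of the frame arithmetic in config.py may help. The sketch below re-implements only the derived counts, using a hypothetical helper `derive_counts`, and evaluates them for `num_output_frames: 21`, the value in the lamp case's config.yaml.

```python
TEMPORAL_FACTOR = 4
FRAMES_PER_BLOCK = 3


def derive_counts(num_latent_frames):
    """Derived pixel-frame and block counts for a given latent frame count."""
    if num_latent_frames % FRAMES_PER_BLOCK != 0:
        raise ValueError("num_output_frames must be divisible by FRAMES_PER_BLOCK")
    return {
        "num_pixel_frames": (num_latent_frames - 1) * TEMPORAL_FACTOR + 1,
        "num_blocks": num_latent_frames // FRAMES_PER_BLOCK,
    }


counts = derive_counts(21)

# Pixel frames per block: the first block is one temporal group short.
first_block = (FRAMES_PER_BLOCK - 1) * TEMPORAL_FACTOR + 1
later_block = FRAMES_PER_BLOCK * TEMPORAL_FACTOR
per_block_total = first_block + (counts["num_blocks"] - 1) * later_block
print(counts, first_block, later_block, per_block_total)
```

For 21 latent frames this gives 81 pixel frames in 7 blocks (9 for the first block, 12 for each of the remaining six), which agrees with `simulated_frames_num: 81` in the lamp config.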

demo_data/.gitkeep ADDED

File without changes

demo_data/lamp/bg_points.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1c1b8fd606ca468ed6f9f0a8eebc871949df4f50355cb198242d1548a5c0b245
+size 6292900

demo_data/lamp/camera.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:09017373389bc34d069d66ee6388670e04cd8f7e9c30b8a43e5adff02d062654
+size 1928

demo_data/lamp/config.yaml ADDED

	@@ -0,0 +1,52 @@

+device: cuda
+seed: 0
+example_name: lamp
+output_folder: demo_web/demo_data/lamp/recon_tmp
+data_path: cases/lamp
+segmenter: sam2
+all_object_points:
+- - - 250
+    - 207
+    - 1
+  - - 273
+    - 287
+    - 1
+all_object_masks_idx:
+- 1
+obj_kp_matching: true
+obj_kp:
+- - - 0.2
+    - 0.8
+  - - 0.1
+    - 0.9
+logging_level: details
+debug: true
+stitched_inpainting: false
+mesh_resize_factor: 1.0
+target_faces: 10000
+dt: 0.02
+substeps: 10
+simulated_frames_num: 81
+frame_steps: 1
+material_type:
+- rigid
+use_primitive: true
+remap_depth:
+- 1.0
+- 2.0
+rigid_rho: 1000
+rigid_friction: 0.01
+plane_friction: 0.01
+gravity: -1
+alpha_threshold: 0.8
+crop_start: 200
+fg_points_render_radius: 0.01
+num_output_frames: 21
+denoising_step_list:
+- 800
+- 500
+- 250
+mask_dropin_step: -1
+vgen_prompt: A square paper lantern is moving on river. Water surface ripples follow
+  the motion. Twilight, cinematic realism.
+fov_x_input: 27.449039459228516