Wei Liu committed
Commit fc36e06 · 0 Parent(s)

init huggingface deployment

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .gitattributes +10 -0
  2. README.md +243 -0
  3. app.py +786 -0
  4. case_handlers/__init__.py +6 -0
  5. case_handlers/base.py +149 -0
  6. case_handlers/lamp.py +25 -0
  7. case_handlers/persimmon.py +51 -0
  8. case_handlers/santa_cloth.py +101 -0
  9. case_handlers/tree.py +104 -0
  10. config.py +60 -0
  11. demo_data/.gitkeep +0 -0
  12. demo_data/lamp/bg_points.pt +3 -0
  13. demo_data/lamp/camera.pt +3 -0
  14. demo_data/lamp/config.yaml +52 -0
  15. demo_data/lamp/fg_masks/mask_00.png +3 -0
  16. demo_data/lamp/fg_meshes/mesh_00.obj +3 -0
  17. demo_data/lamp/fg_pcs/pc_00.pt +3 -0
  18. demo_data/lamp/first_frame.png +3 -0
  19. demo_data/lamp/inpainted_bg.png +3 -0
  20. demo_data/lamp/sim_tmp/fg_mesh_00.obj +3 -0
  21. demo_data/lamp/sim_tmp/flow_image.gif +3 -0
  22. demo_data/lamp/sim_tmp/frames/frame_0001.png +3 -0
  23. demo_data/lamp/sim_tmp/frames/frame_0002.png +3 -0
  24. demo_data/lamp/sim_tmp/frames/frame_0003.png +3 -0
  25. demo_data/lamp/sim_tmp/frames/frame_0004.png +3 -0
  26. demo_data/lamp/sim_tmp/frames/frame_0005.png +3 -0
  27. demo_data/lamp/sim_tmp/frames/frame_0006.png +3 -0
  28. demo_data/lamp/sim_tmp/frames/frame_0007.png +3 -0
  29. demo_data/lamp/sim_tmp/frames/frame_0008.png +3 -0
  30. demo_data/lamp/sim_tmp/frames/frame_0009.png +3 -0
  31. demo_data/lamp/sim_tmp/frames/frame_0010.png +3 -0
  32. demo_data/lamp/sim_tmp/frames/frame_0011.png +3 -0
  33. demo_data/lamp/sim_tmp/frames/frame_0012.png +3 -0
  34. demo_data/lamp/sim_tmp/frames/frame_0013.png +3 -0
  35. demo_data/lamp/sim_tmp/frames/frame_0014.png +3 -0
  36. demo_data/lamp/sim_tmp/frames/frame_0015.png +3 -0
  37. demo_data/lamp/sim_tmp/frames/frame_0016.png +3 -0
  38. demo_data/lamp/sim_tmp/frames/frame_0017.png +3 -0
  39. demo_data/lamp/sim_tmp/frames/frame_0018.png +3 -0
  40. demo_data/lamp/sim_tmp/frames/frame_0019.png +3 -0
  41. demo_data/lamp/sim_tmp/frames/frame_0020.png +3 -0
  42. demo_data/lamp/sim_tmp/frames/frame_0021.png +3 -0
  43. demo_data/lamp/sim_tmp/frames/frame_0022.png +3 -0
  44. demo_data/lamp/sim_tmp/frames/frame_0023.png +3 -0
  45. demo_data/lamp/sim_tmp/frames/frame_0024.png +3 -0
  46. demo_data/lamp/sim_tmp/frames/frame_0025.png +3 -0
  47. demo_data/lamp/sim_tmp/frames/frame_0026.png +3 -0
  48. demo_data/lamp/sim_tmp/frames/frame_0027.png +3 -0
  49. demo_data/lamp/sim_tmp/frames/frame_0028.png +3 -0
  50. demo_data/lamp/sim_tmp/frames/frame_0029.png +3 -0
.gitattributes ADDED
@@ -0,0 +1,10 @@
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.obj filter=lfs diff=lfs merge=lfs -text
+ *.gif filter=lfs diff=lfs merge=lfs -text
+ *.mp4 filter=lfs diff=lfs merge=lfs -text
+ *.png filter=lfs diff=lfs merge=lfs -text
+ *.jpg filter=lfs diff=lfs merge=lfs -text
+ *.jpeg filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
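As a quick sanity check on these patterns, the sketch below tests which file names would be routed through Git LFS. It is only an approximation: it applies Python's `fnmatchcase` to the base name, which agrees with gitattributes matching for simple `*.ext` globs like the ones in this hunk, but not for patterns containing slashes or `**`. The helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns copied from the .gitattributes hunk in this commit.
LFS_PATTERNS = [
    "*.npy", "*.obj", "*.gif", "*.mp4", "*.png",
    "*.jpg", "*.jpeg", "*.pt", "*.pth", "*.safetensors",
]


def is_lfs_tracked(path: str) -> bool:
    """Return True if the file's base name matches any LFS pattern.

    Approximation: slash-free gitattributes patterns are compared against
    the base name, which fnmatchcase reproduces for simple globs.
    """
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in LFS_PATTERNS)


print(is_lfs_tracked("demo_data/lamp/camera.pt"))    # prints: True
print(is_lfs_tracked("demo_data/lamp/config.yaml"))  # prints: False
```

This is consistent with the file list: the `.pt`, `.obj`, `.png` and `.gif` entries each show `+3 -0`, the size of a three-line Git LFS pointer file, while `config.yaml` shows its real line count.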
README.md ADDED
@@ -0,0 +1,243 @@
+ # RealWonder Interactive Demo
+
+ Interactive web demo for physics-guided video generation. Given a single image and a user-selected force direction, the system:
+
+ 1. Runs a real-time physics simulation (Genesis)
+ 2. Warps structured noise to follow the simulated motion
+ 3. Generates a realistic video using a causal diffusion model with SDEdit
+
+ ## Prerequisites
+
+ - A GPU with at least 40 GB VRAM (tested on H100 80 GB / 140 GB)
+ - Python 3.10
+ - PyTorch 2.1 + CUDA 12.1 (pre-installed in the environment)
+ - All packages listed in `requirements.txt`
+ - A model checkpoint (see your team's checkpoint storage)
+ - Preprocessed demo data placed in `demo_data/<case_name>/`
+
+ ## Setup
+
+ ### 1. Install dependencies
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### 2. Install pytorch3d
+
+ pytorch3d is not on standard PyPI. Install the wheel that matches your CUDA and PyTorch version:
+
+ ```bash
+ # Option A: Build from source (slow)
+ pip install "git+https://github.com/facebookresearch/pytorch3d.git"
+
+ # Option B: Pre-built wheel (fast, recommended)
+ # Find the matching wheel at https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/
+ # Example for PyTorch 2.1 + CUDA 12.1 + Python 3.10:
+ pip install --no-index --find-links \
+     https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py310_cu121_pyt210/ \
+     pytorch3d
+ ```
+
+ ### 3. Add demo data
+
+ Place preprocessed demo data under `demo_data/`. Each case is a subdirectory:
+
+ ```
+ demo_data/
+   lamp/
+     config.yaml        # case config (num_output_frames, denoising_step_list, ...)
+     first_frame.png    # 480x832 first frame image
+     fg_meshes/
+       mesh_00.obj      # foreground object mesh(es)
+     fg_pcs/
+       pc_00.pt         # foreground point cloud(s) (PyTorch tensors)
+     bg_points.pt       # background point cloud
+     camera.pt          # camera intrinsics K, extrinsics R/T, focal_length
+     fg_masks/
+       mask_00.png      # foreground object mask(s) (optional, for UI)
+ ```
+
+ **Supported cases:** `lamp`, `persimmon`, `santa_cloth`, `tree`
+
+ The `config.yaml` for each case must contain at minimum:
+
+ ```yaml
+ example_name: "lamp"         # must match a registered case name
+ material_type: ["rigid"]     # physics material(s)
+ num_output_frames: 21        # number of latent frames to generate (must be divisible by 3)
+ denoising_step_list: [800, 600, 400, 200]
+ vgen_prompt: "A lamp swinging."
+ dt: 0.02
+ substeps: 10
+ frame_steps: 1
+ alpha_threshold: 0.5
+ ```
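The minimum-key requirement above could be checked before launching the server with a sketch like the following. It works on an already-parsed config (a plain `dict`, as `yaml.safe_load` would return); the function name `validate_case_config` and the problem messages are hypothetical and are not part of this commit.

```python
# Keys and case names copied from the README; everything else is illustrative.
REQUIRED_KEYS = (
    "example_name", "material_type", "num_output_frames",
    "denoising_step_list", "vgen_prompt", "dt", "substeps",
    "frame_steps", "alpha_threshold",
)
SUPPORTED_CASES = ("lamp", "persimmon", "santa_cloth", "tree")
FRAMES_PER_BLOCK = 3  # latent frames per block, per the README


def validate_case_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in cfg]

    name = cfg.get("example_name")
    if name is not None and name not in SUPPORTED_CASES:
        problems.append(f"unregistered case name: {name!r}")

    frames = cfg.get("num_output_frames")
    if isinstance(frames, int) and frames % FRAMES_PER_BLOCK != 0:
        problems.append(
            f"num_output_frames={frames} is not divisible by {FRAMES_PER_BLOCK}"
        )
    return problems


lamp_cfg = {
    "example_name": "lamp",
    "material_type": ["rigid"],
    "num_output_frames": 21,
    "denoising_step_list": [800, 600, 400, 200],
    "vgen_prompt": "A lamp swinging.",
    "dt": 0.02,
    "substeps": 10,
    "frame_steps": 1,
    "alpha_threshold": 0.5,
}
print(validate_case_config(lamp_cfg))  # prints: []
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a config at once.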
+
+ ## Running the Demo
+
+ ```bash
+ cd huggingface/
+ python app.py \
+     --demo_data demo_data/lamp \
+     --checkpoint_path /path/to/checkpoint.pt \
+     --port 5000 \
+     --no_debug \
+     --no_gpu_log
+ ```
+
+ Open `http://localhost:5000` in a browser. Choose a force direction, optionally edit the text prompt, then click **Start**.
+
+ ### CLI Arguments
+
+ | Argument | Default | Description |
+ |---|---|---|
+ | `--demo_data` | *(required)* | Path to demo data directory, e.g. `demo_data/lamp` |
+ | `--checkpoint_path` | *(required)* | Path to model `.pt` checkpoint |
+ | `--host` | `0.0.0.0` | Server bind address |
+ | `--port` | `5000` | Server port |
+ | `--use_ema` | off | Load EMA weights from checkpoint |
+ | `--seed` | `42` | Random seed |
+ | `--no_gpu_log` | off | Disable GPU memory logging |
+ | `--no_debug` | off | Force disable debug output (overrides `config.yaml`) |
+ | `--taehv` | off | Use TAEHV tiny VAE decoder (faster, slightly lower quality) |
+
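For readers who want to see how the table maps onto `argparse`, here is a minimal sketch reconstructed from the table alone. The real parser lives in the part of `app.py` that this view does not show, so the function name `build_parser`, the help strings, and the choice of `store_true` for the "off" flags are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the CLI table; not the actual app.py code."""
    parser = argparse.ArgumentParser(description="RealWonder interactive demo")
    parser.add_argument("--demo_data", required=True,
                        help="Path to demo data directory, e.g. demo_data/lamp")
    parser.add_argument("--checkpoint_path", required=True,
                        help="Path to model .pt checkpoint")
    parser.add_argument("--host", default="0.0.0.0", help="Server bind address")
    parser.add_argument("--port", type=int, default=5000, help="Server port")
    parser.add_argument("--use_ema", action="store_true",
                        help="Load EMA weights from checkpoint")
    parser.add_argument("--seed", type=int, default=42, help="Random seed")
    parser.add_argument("--no_gpu_log", action="store_true",
                        help="Disable GPU memory logging")
    parser.add_argument("--no_debug", action="store_true",
                        help="Force disable debug output")
    parser.add_argument("--taehv", action="store_true",
                        help="Use TAEHV tiny VAE decoder")
    return parser


args = build_parser().parse_args(
    ["--demo_data", "demo_data/lamp", "--checkpoint_path", "/path/to/checkpoint.pt"]
)
print(args.port, args.seed, args.no_debug)  # prints: 5000 42 False
```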
+ ## Architecture
+
+ ### Startup (one-time, before first user request)
+
+ 1. Load and initialize the video generator (model + weights → GPU)
+ 2. Build the Genesis physics scene from the demo data meshes
+ 3. Pre-compute first-frame VAE + CLIP encoding, allocate KV cache, encode default prompt
+ 4. Warm up all CUDA kernels with dummy passes (~30s, eliminates JIT latency)
+
+ After startup, each **Start** click only triggers lightweight per-request preparation (~0.1s text re-encoding if the prompt changed).
+
+ ### 4-Stage Streaming Pipeline
+
+ Each generation runs a concurrent 4-stage pipeline. While the diffusion model denoises block N, noise warping processes block N+1, and simulation produces block N+2:
+
+ ```
+ Stage 1a (thread)    Stage 1b (thread)     Stage 2 (thread)      Stage 3 (main)         Stage 4 (thread)
+ Genesis physics   →  SVR render +       →  Noise warping      →  VAE encode +        →  Frame streaming
+ (per sim step)       optical flow          (structured noise)    diffusion (SDEdit)     JPEG → browser
+                      (per pixel frame)     (per block)           (per block)            FPS-paced
+ ```
+
+ All heavy GPU work (VAE encode + diffusion) runs in Stage 3 (main thread) to avoid GPU contention.
+
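The overlap described above (one stage working on block N while the previous stage already handles block N+1) comes from connecting threads with bounded queues. The sketch below shows that general pattern with toy stages using only the standard library. It is not the code from `app.py`: the names `run_stage` and `run_pipeline` are hypothetical, and it omits the stop flag and timed `put` calls that the real server uses for cancellation.

```python
import threading
from queue import Queue

SENTINEL = None  # end-of-stream marker passed from stage to stage


def run_stage(work, inbox: Queue, outbox: Queue) -> None:
    """Consume items from inbox, apply work, and forward results to outbox."""
    try:
        while True:
            item = inbox.get()
            if item is SENTINEL:
                break
            outbox.put(work(item))  # blocks while the next stage is backed up
    finally:
        outbox.put(SENTINEL)  # always tell the next stage to stop


def run_pipeline(blocks, stages, maxsize: int = 2) -> list:
    """Push blocks through the stages concurrently and collect the results."""
    queues = [Queue(maxsize=maxsize) for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=run_stage,
                         args=(work, queues[i], queues[i + 1]), daemon=True)
        for i, work in enumerate(stages)
    ]

    def feed() -> None:
        for block in blocks:
            queues[0].put(block)
        queues[0].put(SENTINEL)

    threads.append(threading.Thread(target=feed, daemon=True))
    for thread in threads:
        thread.start()

    results = []
    while (item := queues[-1].get()) is not SENTINEL:
        results.append(item)
    for thread in threads:
        thread.join()
    return results


# Toy stand-ins for "simulate", "warp" and "denoise".
print(run_pipeline(range(4), [lambda b: b + 1, lambda b: b * 10, str]))
# prints: ['10', '20', '30', '40']
```

The small `maxsize` is what keeps the early stages from racing far ahead: a producer blocks on `put` once the queue is full, so at most a couple of blocks are buffered between stages.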
+ ### Key Parameters (`config.py`)
+
+ | Parameter | Value | Description |
+ |---|---|---|
+ | Resolution | 480 × 832 | Pixel output size |
+ | Latent size | 60 × 104 × 16 | After VAE encoding |
+ | Frames per block | 3 latent / 12 pixel | Causal generation unit |
+ | Default total | 21 latent / 81 pixel | 7 blocks × 3 frames |
+ | Temporal factor | 4 | VAE temporal downsampling |
+ | Playback FPS | 8 | Browser streaming rate |
+ | Noise channels | 32 | Structured + SDE noise |
+
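The frame counts in the table can be cross-checked with a little arithmetic. Note that 7 blocks × 12 pixel frames would give 84, not 81, so the first block must be shorter. The sketch below assumes the usual causal-VAE mapping in which the first latent frame decodes to a single pixel frame and every later latent frame to `TEMPORAL_FACTOR` pixel frames. That mapping is an assumption: it reproduces the table's 21 → 81, and it implies a 9-frame first block, but the actual value of `FRAMES_FIRST_BLOCK_PIXEL` in `config.py` is not shown in this view.

```python
TEMPORAL_FACTOR = 4          # VAE temporal downsampling (from the table)
FRAMES_PER_BLOCK = 3         # latent frames per block (from the table)
FRAMES_PER_BLOCK_PIXEL = 12  # pixel frames per block (from the table)


def pixel_frames(latent_frames: int) -> int:
    """Pixel frames decoded from latent_frames under the assumed causal mapping."""
    return 1 + TEMPORAL_FACTOR * (latent_frames - 1)


def block_layout(latent_frames: int) -> list[int]:
    """Pixel frames produced by each block; the first block is the short one."""
    if latent_frames <= 0 or latent_frames % FRAMES_PER_BLOCK != 0:
        raise ValueError("latent_frames must be a positive multiple of 3")
    num_blocks = latent_frames // FRAMES_PER_BLOCK
    first_block = pixel_frames(FRAMES_PER_BLOCK)  # 1 + 4 * 2 = 9 (inferred)
    return [first_block] + [FRAMES_PER_BLOCK_PIXEL] * (num_blocks - 1)


layout = block_layout(21)
print(layout)                          # prints: [9, 12, 12, 12, 12, 12, 12]
print(sum(layout), pixel_frames(21))   # prints: 81 81
print(480 // 8, 832 // 8)              # prints: 60 104
```

The last line checks the spatial side: an 8× downsampling of 480 × 832 gives the 60 × 104 latent size listed in the table.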
+ ## File Structure
+
+ ```
+ huggingface/
+ ├── app.py                      # Flask + SocketIO web server (entry point)
+ ├── config.py                   # Pipeline constants
+ ├── simulation_engine.py        # Genesis simulation wrapper (InteractiveSimulator)
+ ├── noise_warper_stream.py      # Incremental noise warping (StreamingNoiseWarper)
+ ├── video_generator.py          # Block-by-block diffusion (StreamingVideoGenerator)
+ ├── gpu_profiler.py             # GPU memory logging utility
+ ├── taehv.py                    # Tiny AutoEncoder for fast VAE decoding (optional)
+ ├── requirements.txt            # pip dependencies
+
+ ├── vidgen/                     # Internal video generation model library (bundled)
+ ├── wan/                        # Internal model modules — WanModel, VAE, tokenizers (bundled)
+
+ ├── case_handlers/              # Per-case UI config and force application (web demo)
+ │   ├── base.py                 # DemoCaseHandler base class + registry
+ │   ├── lamp.py
+ │   ├── persimmon.py
+ │   ├── santa_cloth.py
+ │   └── tree.py
+
+ ├── simulation/
+ │   ├── utils.py                # Coordinate transforms, resize, save utilities
+ │   ├── case_simulation/        # Per-case Genesis physics handlers
+ │   │   ├── case_handler.py     # CaseHandler ABC + registry
+ │   │   ├── lamp.py
+ │   │   ├── persimmon.py
+ │   │   ├── santa_cloth.py
+ │   │   └── tree.py
+ │   └── image23D/
+ │       └── noise_warp/
+ │           └── noise_warp.py   # NoiseWarper (particle-swarm noise warping)
+
+ ├── templates/
+ │   └── index.html              # Web UI
+ ├── static/
+ │   ├── app.js                  # SocketIO client
+ │   └── style.css
+
+ └── demo_data/                  # Preprocessed cases (add your data here)
+     └── <case_name>/
+         ├── config.yaml
+         ├── first_frame.png
+         ├── fg_meshes/
+         ├── fg_pcs/
+         ├── bg_points.pt
+         ├── camera.pt
+         └── fg_masks/
+ ```
+
+ ## Package Dependencies
+
+ ### Standard library
+ `abc`, `argparse`, `base64`, `collections`, `glob`, `io`, `math`, `os`, `pathlib`, `queue`, `sys`, `threading`, `time`, `traceback`, `typing`, `urllib`
+
+ ### PyPI (installed via `requirements.txt`)
+
+ | Package | PyPI name | Purpose |
+ |---|---|---|
+ | PyTorch | `torch` | Core ML framework |
+ | TorchVision | `torchvision` | Video save, image transforms |
+ | NumPy | `numpy` | Array operations |
+ | Pillow | `Pillow` | Image I/O |
+ | Flask | `flask` | Web server |
+ | Flask-SocketIO | `flask-socketio` | Real-time frame streaming |
+ | OpenCV | `opencv-python` | Flow resize, HSV colormap |
+ | Einops | `einops` | Tensor reshaping |
+ | OmegaConf | `omegaconf` | Config loading |
+ | PEFT | `peft` | LoRA / parameter-efficient fine-tuning |
+ | Safetensors | `safetensors` | Checkpoint loading |
+ | Diffusers | `diffusers` | Scheduler utilities |
+ | Transformers | `transformers` | CLIP text encoder, tokenizer |
+ | ftfy | `ftfy` | Text normalization for CLIP |
+ | EasyDict | `easydict` | Attribute-access dicts |
+ | SciPy | `scipy` | Rotation utilities |
+ | ImageIO | `imageio` | GIF saving |
+ | Trimesh | `trimesh` | Mesh loading/export |
+ | Matplotlib | `matplotlib` | Optical flow debug viz |
+ | tqdm | `tqdm` | TAEHV progress bars |
+ | PyYAML | `PyYAML` | Case config parsing |
+ | rp | `rp` | Noise warp image utilities |
+ | Genesis | `genesis-world` | Physics simulation |
+
+ ### Manual installs
+
+ | Package | Notes |
+ |---|---|
+ | `pytorch3d` | Requires wheel matching CUDA/PyTorch version. See [install guide](https://github.com/facebookresearch/pytorch3d/blob/main/INSTALL.md). |
+ | `gstaichi` | Bundled with `genesis-world`. |
+
+ ## Debug Mode
+
+ Set `debug: true` in `demo_data/<case>/config.yaml` to save intermediate outputs to `demo_data/<case>/sim_tmp/`:
+
+ - `gs_frames/` — Genesis camera renders (per sim step)
+ - `frames/` — SVR point-cloud renders (per pixel frame)
+ - `masks/` — Foreground and mesh masks
+ - `optical_flow/` — Optical flow HSV visualizations
+ - `noises.npy` / `noise_video.mp4` — Warped noise (latent resolution)
+
+ Pass `--no_debug` on the command line to force-disable all debug saves regardless of `config.yaml`.
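The precedence rule in this section (the command-line flag wins over `config.yaml`) and the resulting output locations can be summarized in a few lines. This is only a sketch of the behavior the README describes: the helper name `debug_output_paths` is hypothetical, and the directory layout is taken from the list above rather than from the code.

```python
from pathlib import Path

# Output names copied from the Debug Mode list in the README.
DEBUG_OUTPUTS = (
    "gs_frames", "frames", "masks", "optical_flow",
    "noises.npy", "noise_video.mp4",
)


def debug_output_paths(case_dir: str, case_config: dict,
                       no_debug: bool = False) -> list[Path]:
    """Return the debug paths a run would write, or [] when debug is off.

    no_debug mirrors the --no_debug flag and overrides the config value.
    """
    if no_debug or not case_config.get("debug", False):
        return []
    sim_tmp = Path(case_dir) / "sim_tmp"
    return [sim_tmp / name for name in DEBUG_OUTPUTS]


print(len(debug_output_paths("demo_data/lamp", {"debug": True})))          # prints: 6
print(debug_output_paths("demo_data/lamp", {"debug": True}, no_debug=True))  # prints: []
```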
app.py ADDED
@@ -0,0 +1,786 @@
+ """Flask + SocketIO server for the RealWonder interactive demo.
+
+ Usage:
+     python app.py \
+         --demo_data demo_data/lamp \
+         --checkpoint_path /path/to/model.pt \
+         --port 5000
+
+ The specified --demo_data case is fully initialized at startup (Genesis scene,
+ video generator, noise warper). When a client connects, the UI shows the scene
+ preview and lets the user choose force direction, edit prompt, and click Start.
+ """
+ import os
+ os.environ['SETUPTOOLS_USE_DISTUTILS'] = 'stdlib'
+
+ import argparse
+ import base64
+ import io
+ import threading
+ from pathlib import Path
+ from queue import Queue, Full as QueueFull, Empty as QueueEmpty
+
+ import numpy as np
+ import torch
+ import torch.nn.functional as F
+ from PIL import Image
+ from flask import Flask, render_template
+ from flask_socketio import SocketIO, emit
+
+ from config import (
+     FRAMES_PER_BLOCK, FRAMES_PER_BLOCK_PIXEL, FRAMES_FIRST_BLOCK_PIXEL,
+     FPS, LATENT_H, LATENT_W, LATENT_C,
+     DEFAULT_HEIGHT, DEFAULT_WIDTH, TEMPORAL_FACTOR,
+     load_case_sdedit_config,
+ )
+ from simulation_engine import InteractiveSimulator
+ from noise_warper_stream import StreamingNoiseWarper
+ from video_generator import StreamingVideoGenerator
+ from case_handlers.base import get_demo_case_handler
+ import case_handlers  # trigger registration
+ from gpu_profiler import log_gpu, set_gpu_logging
+ from simulation.utils import resize_and_crop_pil
+
+ app = Flask(__name__)
+ app.config["SECRET_KEY"] = "realwonder-demo"
+ socketio = SocketIO(app, cors_allowed_origins="*", async_mode="threading")
+
+ # Global state — all initialized at startup before the server accepts connections
+ simulator = None
+ noise_warper = None
+ generator = None
+ demo_case_handler = None  # Per-case UI/force handler
+ preview_b64 = None  # Base64 scene preview rendered once at startup
+ default_prompt = ""  # Prompt from case config
+ case_name = ""  # Name of the loaded case
+ num_blocks = None  # Computed from case config at startup
+
+ is_generating = False
+ stop_requested = False
+
+
+ @app.route("/")
+ def index():
+     return render_template("index.html")
+
+
+ @socketio.on("connect")
+ def on_connect():
+     """When a client connects, send the pre-rendered scene preview and config."""
+     print("Client connected")
+     if simulator is not None and preview_b64 is not None:
+         ui_config = demo_case_handler.get_ui_config() if demo_case_handler else {}
+         ui_config["allow_change_force"] = simulator.config.get("allow_change_force", False)
+         emit("ready", {
+             "case_name": case_name,
+             "preview": preview_b64,
+             "prompt": default_prompt,
+             "ui_config": ui_config,
+         })
+     else:
+         emit("error", {"message": "Server not fully initialized. Check startup logs."})
+
+
+ @socketio.on("start_generation")
+ def on_start_generation(data):
+     """User chose direction + prompt and clicked Start."""
+     global is_generating, stop_requested
+     if simulator is None:
+         emit("error", {"message": "Simulator not initialized"})
+         return
+     if generator is None or not generator.is_setup:
+         emit("error", {"message": "Video generator not initialized"})
+         return
+     if is_generating:
+         emit("error", {"message": "Generation already in progress"})
+         return
+
+     prompt = data.get("prompt", default_prompt or "A video of physical simulation")
+     ui_forces = data.get("forces", [])
+
+     # Convert UI direction strings to 3D vectors and store on handler
+     force_configs = demo_case_handler.get_force_config_from_ui(ui_forces)
+     demo_case_handler.set_forces(force_configs)
+
+     # Configure simulation state from the main thread (required for cases
+     # like santa_cloth where taichi field writes need the creating thread's
+     # CUDA context).
+     demo_case_handler.configure_simulation(simulator)
+
+     emit("status", {"message": "Forces configured. Starting generation..."})
+     stop_requested = False
+     socketio.start_background_task(generation_loop, prompt)
+
+
+ @socketio.on("stop_generation")
+ def on_stop_generation():
+     global stop_requested
+     stop_requested = True
+
+
+ @socketio.on("update_forces")
+ def on_update_forces(data):
+     """User changed force direction/strength mid-generation.
+
+     Updates the demo handler's wind parameters (plain Python attrs).
+     The simulation thread's apply_forces() reads these every step,
+     so changes take effect immediately — no CUDA or taichi involved.
+     Only works when allow_change_force is enabled in the case config.
+     """
+     if demo_case_handler is None or simulator is None:
+         return
+     if not simulator.config.get("allow_change_force", False):
+         return
+     ui_forces = data.get("forces", [])
+     force_configs = demo_case_handler.get_force_config_from_ui(ui_forces)
+     demo_case_handler.set_forces(force_configs)
+     # Update derived wind params (direction vector, strength scalar)
+     demo_case_handler.configure_simulation(simulator)
+
+
+ @socketio.on("reset")
+ def on_reset():
+     global is_generating, stop_requested
+     stop_requested = True
+     if simulator is not None:
+         simulator.reset()
+     if noise_warper is not None:
+         noise_warper.reset()
+     if generator is not None:
+         generator.reset()
+     is_generating = False
+     socketio.emit("status", {"message": "Reset complete"})
+     # Re-send the preview so user can start again
+     if preview_b64 is not None:
+         ui_config = demo_case_handler.get_ui_config() if demo_case_handler else {}
+         ui_config["allow_change_force"] = simulator.config.get("allow_change_force", False) if simulator else False
+         socketio.emit("ready", {
+             "case_name": case_name,
+             "preview": preview_b64,
+             "prompt": default_prompt,
+             "ui_config": ui_config,
+         })
+
+
+ def generation_loop(prompt):
+     """Main generation loop with 3-stage streaming pipeline.
+
+     Stage 1 (thread): Simulation — produces RGB frames + optical flows per block
+     Stage 2 (thread): Noise warping — warps noise using optical flow (lightweight)
+     Stage 3 (main): VAE encoding + mask building + diffusion denoising + streaming
+
+     Each stage runs concurrently: while VGen denoises block N, noise warping
+     handles block N+1, and simulation produces block N+2. All heavy GPU work
+     (VAE encode + diffusion) is consolidated in Stage 3 to avoid GPU memory
+     contention.
+     """
+     global is_generating, stop_requested
+     is_generating = True
+     torch.set_grad_enabled(False)  # thread-local: must set in this thread too
+
+     try:
+         socketio.emit("status", {"message": "Preparing video generator..."})
+
+         # Reset noise warper before sim threads start.
+         noise_warper.reset()
+
+         frame_steps = simulator.frame_steps
+
+         # --- 4-Stage Pipeline Queues ---
+         physics_queue = Queue(maxsize=2)  # Stage 1a → Stage 1b (per pixel frame)
+         sim_queue = Queue(maxsize=2)  # Stage 1b → Stage 2 (per block)
+         ready_queue = Queue(maxsize=3)  # Stage 2 → Stage 3
+         is_debug = simulator.config.get("debug", False)
+         all_sim_frames = [] if is_debug else None
+
+         # --- Stage 1a: Physics producer ---
+         # Runs Genesis physics steps and puts per-frame point clouds into
+         # physics_queue. Does NOT touch the SVR renderer, so it can run
+         # ahead of Stage 1b by up to physics_queue.maxsize frames.
+         def physics_producer():
+             import time
+             try:
+                 for block_idx in range(num_blocks):
+                     if stop_requested:
+                         break
+                     n_pixel = FRAMES_FIRST_BLOCK_PIXEL if block_idx == 0 else FRAMES_PER_BLOCK_PIXEL
+                     for pf_idx in range(n_pixel):
+                         if stop_requested:
+                             break
+                         t0 = time.perf_counter()
+                         last_i = frame_steps - 1
+                         for i in range(frame_steps):
+                             updated_points = simulator.step(extract_points=(i == last_i))
+                         t_step = time.perf_counter() - t0
+                         # Capture frame_id here: render thread may be behind
+                         frame_id = simulator.step_count
+                         item = (block_idx, n_pixel, pf_idx,
+                                 updated_points, frame_id, t_step)
+                         # Timed put so stop_requested is checked if render stops consuming
+                         while not stop_requested:
+                             try:
+                                 physics_queue.put(item, timeout=0.5)
+                                 break
+                             except QueueFull:
+                                 pass
+             except Exception as e:
+                 import traceback
+                 traceback.print_exc()
+             finally:
+                 # Best-effort sentinel — render exits via stop_requested if queue stays full
+                 for _ in range(20):  # up to 10 s
+                     try:
+                         physics_queue.put(None, timeout=0.5)
+                         break
+                     except QueueFull:
+                         pass
+
+         # --- Stage 1b: Render + flow producer ---
+         # Reads point clouds from physics_queue, runs SVR render + optical
+         # flow + resize, accumulates per-block results, then forwards complete
+         # blocks to sim_queue (same interface as the old simulation_producer).
+         def render_flow_producer():
+             import time
+             try:
+                 current_block = -1
+                 flows, sim_frames, fg_masks, mesh_masks = [], [], [], []
+                 t_block_start = time.perf_counter()
+                 t_step_total = t_render_total = t_resize_total = 0.0
+
+                 while not stop_requested:
+                     try:
+                         item = physics_queue.get(timeout=0.5)
+                     except QueueEmpty:
+                         continue
+                     if item is None:
+                         break
+
+                     block_idx, n_pixel, pf_idx, updated_points, frame_id, t_step = item
+
+                     if block_idx != current_block:
+                         current_block = block_idx
+                         flows, sim_frames, fg_masks, mesh_masks = [], [], [], []
+                         t_block_start = time.perf_counter()
+                         t_step_total = t_render_total = t_resize_total = 0.0
+
+                     t0 = time.perf_counter()
+                     frame_pil, flow_2hw, fg_mask, mesh_mask = (
+                         simulator.render_and_flow(updated_points, frame_id=frame_id)
+                     )
+                     t1 = time.perf_counter()
+                     frame_pil = resize_and_crop_pil(frame_pil, start_y=simulator.crop_start)
+                     t2 = time.perf_counter()
+
+                     sim_frames.append(frame_pil)
+                     flows.append(flow_2hw)
+                     fg_masks.append(fg_mask)
+                     mesh_masks.append(mesh_mask)
+
+                     t_step_total += t_step
+                     t_render_total += t1 - t0
+                     t_resize_total += t2 - t1
+
+                     if len(sim_frames) == n_pixel:
+                         t_queue_start = time.perf_counter()
+                         if all_sim_frames is not None:
+                             all_sim_frames.extend(sim_frames)
+                         sim_queue.put((block_idx, flows, sim_frames, fg_masks, mesh_masks))
+                         t_queue_end = time.perf_counter()
+                         print(f"[TIMING] sim block {block_idx}: "
+                               f"physics step = {t_step_total:.3f}s, "
+                               f"render+flow = {t_render_total:.3f}s, "
+                               f"resize = {t_resize_total:.3f}s, "
+                               f"queue put = {t_queue_end - t_queue_start:.3f}s, "
+                               f"total = {t_queue_end - t_block_start:.3f}s "
+                               f"({n_pixel} frames)")
+             except Exception as e:
+                 import traceback
+                 traceback.print_exc()
+             finally:
+                 sim_queue.put(None)  # Sentinel
+
+         # --- Stage 2: Noise Warping (lightweight, mostly CPU) ---
+         def noise_warp_stage():
+             import time
+             try:
+                 while not stop_requested:
+                     t_wait_start = time.perf_counter()
+                     item = sim_queue.get()
+                     t_wait_end = time.perf_counter()
+                     if item is None:
+                         break
+
+                     block_idx, flows, sim_frames, fg_masks, mesh_masks = item
+
+                     # Warp noise incrementally using optical flow
+                     t0 = time.perf_counter()
+                     for flow in flows:
+                         noise_warper.warp_step(flow)
+                     t1 = time.perf_counter()
+                     structured_noise, sde_noise = noise_warper.get_block_noise(block_idx)
+                     t2 = time.perf_counter()
+
+                     ready_queue.put((
+                         block_idx,
+                         structured_noise,
+                         sde_noise,
+                         sim_frames, fg_masks, mesh_masks,
+                     ))
+                     t3 = time.perf_counter()
+
+                     print(f"[TIMING] warp block {block_idx}: "
+                           f"queue wait = {t_wait_end - t_wait_start:.3f}s, "
+                           f"warp steps = {t1 - t0:.3f}s, "
+                           f"get_block_noise = {t2 - t1:.3f}s, "
+                           f"queue put = {t3 - t2:.3f}s, "
+                           f"total = {t3 - t_wait_end:.3f}s")
+             except Exception as e:
+                 import traceback
+                 traceback.print_exc()
+             finally:
+                 ready_queue.put(None)  # Sentinel
+
+         # Start stages 1a, 1b, and 2 BEFORE prepare_generation so the
+         # simulation pipeline (physics → render → warp) runs in parallel
+         # with text encoding. By the time prepare_generation() returns,
+         # ready_queue may already contain block 0, eliminating the startup gap.
+         physics_thread = threading.Thread(target=physics_producer, daemon=True)
+         render_thread = threading.Thread(target=render_flow_producer, daemon=True)
+         warp_thread = threading.Thread(target=noise_warp_stage, daemon=True)
+         physics_thread.start()
+         render_thread.start()
+         warp_thread.start()
+
+         # Text encoding (+ conditional dict setup) runs while sim pipeline
+         # is already producing frames.
+         generator.prepare_generation(prompt)
+
+         # --- Stage 3: VAE Encode + Mask Build + Diffusion ---
+         # --- Stage 4: Frame streaming (separate thread, runs concurrently) ---
+         import time
+         stream_queue = Queue(maxsize=2)  # Stage 3 → Stage 4
+
+         def frame_streamer():
+             """Stream frames to browser at FPS rate, decoupled from GPU work."""
+             try:
+                 while not stop_requested:
+                     item = stream_queue.get()
+                     if item is None:
+                         break
+                     pixel_frames, blk_idx = item
+                     for frame in pixel_frames:
+                         if stop_requested:
+                             break
+                         b64 = base64.b64encode(_encode_jpeg(frame)).decode("ascii")
+                         socketio.emit("frame", {"data": b64, "block": blk_idx})
+                         socketio.sleep(1.0 / FPS)
+             except Exception as e:
+                 import traceback
+                 traceback.print_exc()
+
+         stream_thread = threading.Thread(target=frame_streamer, daemon=True)
+         stream_thread.start()
+
+         t_block_end = time.perf_counter()
+
+         while not stop_requested:
+             t_wait_start = time.perf_counter()
+             item = ready_queue.get()
+             t_wait_end = time.perf_counter()
+             if item is None:
+                 break
+
+             (block_idx, structured_noise, sde_noise,
+              sim_frames, fg_masks, mesh_masks) = item
+
+             print(f"[TIMING] block {block_idx}: queue wait = {t_wait_end - t_wait_start:.3f}s, "
+                   f"gap since prev block end = {t_wait_end - t_block_end:.3f}s")
+
+             socketio.emit("status", {
+                 "message": f"Block {block_idx + 1}/{num_blocks} — Generating...",
+                 "block": block_idx,
+                 "total_blocks": num_blocks,
+             })
+
+             # 1. Encode simulation frames to latent (GPU)
+             t0 = time.perf_counter()
+             log_gpu(f"stage3 block {block_idx}: before VAE encode")
+             sim_frames_tensor = _frames_to_tensor(sim_frames)
+             sim_latent = generator.pipeline.encode_vae.cached_encode_to_latent(
+                 sim_frames_tensor.to(device=generator.device, dtype=torch.bfloat16),
+                 is_first=(block_idx == 0),
+             )
+             if sim_latent.shape[1] > FRAMES_PER_BLOCK:
+                 sim_latent = sim_latent[:, :FRAMES_PER_BLOCK]
+             elif sim_latent.shape[1] < FRAMES_PER_BLOCK:
+                 pad = FRAMES_PER_BLOCK - sim_latent.shape[1]
+                 sim_latent = torch.cat(
+                     [sim_latent, sim_latent[:, -1:].repeat(1, pad, 1, 1, 1)], dim=1,
+                 )
+             t1 = time.perf_counter()
+             log_gpu(f"stage3 block {block_idx}: after VAE encode")
+
+             # 2. Build masks
+             sim_mask = _downsample_masks(fg_masks, FRAMES_PER_BLOCK, crop_start=simulator.crop_start, device=generator.device)
+             sim_franka_mask = _downsample_masks(mesh_masks, FRAMES_PER_BLOCK, crop_start=simulator.crop_start, device=generator.device)
+             t2 = time.perf_counter()
+             log_gpu(f"stage3 block {block_idx}: after mask build")
+
+             # 3. Diffusion denoising
+             pixel_frames = generator.generate_block(
+                 block_idx=block_idx,
+                 structured_noise=structured_noise,
+                 sim_latent=sim_latent,
+                 sde_noise=sde_noise,
+                 sim_mask=sim_mask,
+                 sim_franka_mask=sim_franka_mask,
+             )
+             t3 = time.perf_counter()
+
+             # Hand off frames to streaming thread (non-blocking)
+             stream_queue.put((pixel_frames, block_idx))
+
+             print(f"[TIMING] block {block_idx}: VAE encode = {t1 - t0:.3f}s, "
+                   f"mask build = {t2 - t1:.3f}s, diffusion = {t3 - t2:.3f}s, "
+                   f"total = {t3 - t_wait_end:.3f}s")
+             t_block_end = t3
+
+         stream_queue.put(None)  # Sentinel
+         physics_thread.join(timeout=10)
+         render_thread.join(timeout=10)
+         warp_thread.join(timeout=10)
+         stream_thread.join(timeout=30)
+
+         # Save debug outputs only if debug mode is on
+         if simulator.config.get("debug", False):
+             if noise_warper.noise_buffer:
+                 noise_stack = torch.stack(noise_warper.noise_buffer, dim=0)  # (T, C, H, W)
+                 downscale_factor = DEFAULT_HEIGHT // LATENT_H  # 480 // 60 = 8
+                 noise_latent = F.interpolate(
+                     noise_stack, size=(LATENT_H, LATENT_W), mode="area",
+                 ) * downscale_factor  # (T, 32, 60, 104)
+                 numpy_noises = noise_latent.cpu().permute(0, 2, 3, 1).numpy().astype(np.float16)  # (T, H, W, C)
+
+                 debug_dir = Path(simulator.config.get("output_folder", "/tmp/demo_debug"))
+                 debug_dir.mkdir(parents=True, exist_ok=True)
+
+                 noises_path = debug_dir / "noises.npy"
+                 np.save(noises_path, numpy_noises)
+
+                 noise_vis = np.clip(numpy_noises[:, :, :, :3].astype(np.float32) / 4 + 0.5, 0, 1)
+                 noise_vis = (noise_vis * 255).astype(np.uint8)
+                 noise_video_tensor = torch.from_numpy(noise_vis)  # (T, H, W, 3) uint8
+                 from torchvision.io import write_video
+                 noise_mp4_path = str(debug_dir / "noise_video.mp4")
+                 write_video(noise_mp4_path, noise_video_tensor, fps=30, video_codec="libx264")
+                 print(f"Noise saved to: {noises_path} video: {noise_mp4_path}")
+
+             simulator.save_debug_outputs(sim_frames=all_sim_frames)
+
+         socketio.emit("generation_complete", {})
+         socketio.emit("status", {"message": "Generation complete"})
+
+     except Exception as e:
+         socketio.emit("error", {"message": f"Generation error: {str(e)}"})
+         import traceback
+         traceback.print_exc()
+     finally:
+         is_generating = False
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+def _find_first_frame():
+    """Locate the first-frame image for video generation."""
+    case_path = simulator.demo_data_path
+    candidate = case_path / "first_frame.png"
+    if candidate.exists():
+        return str(candidate)
+    input_path = Path(simulator.config.get("data_path", "")) / "input.png"
+    if input_path.exists():
+        return str(input_path)
+    return str(candidate)  # fallback, may error later with clear message
+
+
+def _frames_to_tensor(frames_pil):
+    """Convert list of PIL frames (already 480x832) to tensor [1, C, T, H, W] in [-1, 1]."""
+    arrays = []
+    for f in frames_pil:
+        arr = np.array(f.convert("RGB"))
+        arr = arr.astype(np.float32) / 127.5 - 1.0
+        arrays.append(torch.from_numpy(arr))
+    tensor = torch.stack(arrays, dim=0).permute(3, 0, 1, 2).contiguous()
+    return tensor.unsqueeze(0)
+
+
+def _downsample_masks(masks, target_frames, crop_start=176, device="cuda"):
+    """Downsample list of mask tensors to target_frames latent frames."""
+    if not masks or all(m is None for m in masks):
+        return None
+
+    processed = []
+    for m in masks:
+        if m is None:
+            processed.append(torch.zeros(1, 1, LATENT_H, LATENT_W, device=device))
+            continue
+        if isinstance(m, torch.Tensor):
+            m = m.to(device=device)
+            if m.dim() == 3:
+                m = m.squeeze(-1)
+            m_832 = F.interpolate(
+                m.float().unsqueeze(0).unsqueeze(0),
+                size=(832, 832), mode="bilinear", align_corners=False,
+            )
+            m_cropped = m_832[:, :, crop_start:crop_start + DEFAULT_HEIGHT, :]
+            m_latent = F.interpolate(
+                m_cropped, size=(LATENT_H, LATENT_W),
+                mode="bilinear", align_corners=False,
+            )
+            processed.append(m_latent)
+        else:
+            processed.append(torch.zeros(1, 1, LATENT_H, LATENT_W, device=device))
+
+    stacked = torch.cat(processed, dim=0)
+    T = stacked.shape[0]
+
+    time_averaged = []
+    for i in range(0, T, TEMPORAL_FACTOR):
+        group = stacked[i:i + TEMPORAL_FACTOR]
+        time_averaged.append(group.mean(dim=0, keepdim=True))
+    stacked = torch.cat(time_averaged, dim=0)
+
+    if stacked.shape[0] > target_frames:
+        stacked = stacked[:target_frames]
+    elif stacked.shape[0] < target_frames:
+        pad = target_frames - stacked.shape[0]
+        stacked = torch.cat(
+            [stacked, stacked[-1:].repeat(pad, 1, 1, 1)], dim=0,
+        )
+
+    result = stacked.squeeze(1).unsqueeze(0)
+    return (result > 0.5).bool()
+
+
+def _encode_jpeg(frame_np, quality=85):
+    img = Image.fromarray(frame_np)
+    buf = io.BytesIO()
+    img.save(buf, format="JPEG", quality=quality)
+    return buf.getvalue()
+
+
+def _encode_pil_b64(pil_img, fmt="JPEG", quality=85):
+    buf = io.BytesIO()
+    pil_img.save(buf, format=fmt, quality=quality)
+    return base64.b64encode(buf.getvalue()).decode("ascii")
+
+
+# ---------------------------------------------------------------------------
+# Pipeline warmup: compile CUDA kernels before first user request
+# ---------------------------------------------------------------------------
+
+def _warmup_pipeline():
+    """Run dummy passes through each pipeline stage to trigger CUDA JIT.
+
+    Without this, the first user-facing generation pays ~24s of kernel
+    compilation across simulation render, noise warping, and diffusion.
+    """
+    import time
+    print("[4/5] Warming up CUDA kernels (one-time cost)...")
+    torch.set_grad_enabled(False)
+
+    # 1. Warm up simulation render + optical flow
+    t0 = time.perf_counter()
+    for _pass in range(2):
+        for _ in range(simulator.frame_steps):
+            updated_points = simulator.step()
+        simulator.render_and_flow(updated_points)
+
+    # Reset simulation state (scene.reset restores to built state)
+    simulator.scene.reset()
+    simulator.case_handler.fix_particles()  # re-pin after reset
+    simulator.step_count = 0
+    simulator.svr.previous_frame_data = None
+    simulator.svr.optical_flow = np.array([])
+    simulator.svr._last_optical_flow = None
+    simulator.svr._prev_fg_frags_idx = None
+    simulator.svr._prev_fg_frags_dists = None
+    # Keep cache_bg: background render is reusable
+    t1 = time.perf_counter()
+    print(f" Sim + render warmup: {t1 - t0:.1f}s")
+
+    # 2. Warm up noise warper (grid_sample, meshgrid, interpolate kernels)
+    dummy_flow = np.zeros((2, 512, 512), dtype=np.float32)
+    noise_warper.warp_step(dummy_flow)
+    noise_warper.reset()
+    t2 = time.perf_counter()
+    print(f" Noise warp warmup: {t2 - t1:.1f}s")
+
+    # 3. Warm up VAE encode + diffusion (transformer attention kernels)
+    generator.prepare_generation(default_prompt)
+
+    # Dummy VAE encode
+    dummy_pixel = torch.zeros(
+        1, 3, FRAMES_FIRST_BLOCK_PIXEL, DEFAULT_HEIGHT, DEFAULT_WIDTH,
+        device=generator.device, dtype=torch.bfloat16,
+    )
+    sim_latent = generator.pipeline.encode_vae.cached_encode_to_latent(
+        dummy_pixel, is_first=True,
+    )
+    if sim_latent.shape[1] > FRAMES_PER_BLOCK:
+        sim_latent = sim_latent[:, :FRAMES_PER_BLOCK]
+    elif sim_latent.shape[1] < FRAMES_PER_BLOCK:
+        pad = FRAMES_PER_BLOCK - sim_latent.shape[1]
+        sim_latent = torch.cat(
+            [sim_latent, sim_latent[:, -1:].repeat(1, pad, 1, 1, 1)], dim=1,
+        )
+
+    # Dummy diffusion block
+    dummy_noise = torch.randn(
+        1, FRAMES_PER_BLOCK, LATENT_C, LATENT_H, LATENT_W,
+        device=generator.device, dtype=torch.bfloat16,
+    )
+    generator.generate_block(
+        block_idx=0,
+        structured_noise=dummy_noise,
+        sim_latent=sim_latent,
+    )
+
+    # Run two more dummy blocks to warm up the KV-cache-populated code
+    # paths (blocks 1+ are structurally different from block 0 because the
+    # self-attention KV cache is non-empty). Without this, real generation
+    # blocks 0 and 1 hit slow cuDNN algorithm selection on first use, taking
+    # ~4s each instead of ~1s. The crossattn_cache stays valid across these
+    # extra blocks (same prompt), so they run fast (~1s each).
+    for _blk in range(1, 3):
+        _dummy_latent = torch.zeros(
+            1, FRAMES_PER_BLOCK, LATENT_C, LATENT_H, LATENT_W,
+            device=generator.device, dtype=torch.bfloat16,
+        )
+        _dummy_noise = torch.randn_like(_dummy_latent)
+        generator.generate_block(
+            block_idx=_blk,
+            structured_noise=_dummy_noise,
+            sim_latent=_dummy_latent,
+        )
+
+    # Reset generator state (KV self-attention cache + VAE caches).
+    # crossattn_cache is intentionally preserved: it is text-conditioned
+    # and stays valid for the default prompt, so real generation blocks 0
+    # and 1 skip the expensive cold re-initialization.
+    generator.reset()
+    generator.pipeline.encode_vae.model.clear_cache()
+    t3 = time.perf_counter()
+    print(f" VAE + diffusion warmup: {t3 - t2:.1f}s")
+    print(f" Total warmup: {t3 - t0:.1f}s; first generation will be fast.")
+    log_gpu("after pipeline warmup")
+
+
+# ---------------------------------------------------------------------------
+# Startup
+# ---------------------------------------------------------------------------
+
+def main():
+    global simulator, noise_warper, generator, demo_case_handler
+    global preview_b64, default_prompt, case_name, num_blocks
+
+    parser = argparse.ArgumentParser(description="RealWonder Interactive Demo")
+    parser.add_argument("--demo_data", type=str, required=True,
+                        help="Path to demo data directory (e.g. demo_data/lamp)")
+    parser.add_argument("--checkpoint_path", type=str, required=True,
+                        help="Path to video generation model checkpoint")
+    parser.add_argument("--host", type=str, default="0.0.0.0")
+    parser.add_argument("--port", type=int, default=5000)
+    parser.add_argument("--use_ema", action="store_true")
+    parser.add_argument("--seed", type=int, default=42)
+    parser.add_argument("--no_gpu_log", action="store_true",
+                        help="Disable GPU memory logging")
+    parser.add_argument("--no_debug", action="store_true",
+                        help="Force disable debug outputs (overrides config.yaml)")
+    parser.add_argument("--taehv", action="store_true",
+                        help="Use TAEHV tiny VAE decoder (faster but slightly lower quality)")
+    args = parser.parse_args()
+
+    if args.no_gpu_log:
+        set_gpu_logging(False)
+
+    demo_data_path = Path(args.demo_data)
+    case_name = demo_data_path.name
+
+    if not demo_data_path.exists() or not (demo_data_path / "config.yaml").exists():
+        print(f"ERROR: {demo_data_path} does not exist or has no config.yaml")
+        return
+
+    # ---- Load case config and derive SDEdit parameters ----
+    import yaml
+    with open(demo_data_path / "config.yaml") as f:
+        case_config = yaml.safe_load(f)
+    sdedit_cfg = load_case_sdedit_config(case_config)
+    num_blocks = sdedit_cfg["num_blocks"]
+    print(f"Case SDEdit config: {sdedit_cfg}")
+
+    # ---- Step 1: Initialize video generator ----
+    print(f"[1/5] Initializing video generator from {args.checkpoint_path} ...")
+    log_gpu("before video generator init")
+    generator = StreamingVideoGenerator(
+        checkpoint_path=args.checkpoint_path,
+        num_pixel_frames=sdedit_cfg["num_pixel_frames"],
+        denoising_steps=sdedit_cfg["denoising_step_list"],
+        mask_dropin_step=sdedit_cfg["mask_dropin_step"],
+        franka_step=sdedit_cfg["franka_step"],
+        use_ema=args.use_ema,
+        seed=args.seed,
+        enable_taehv=args.taehv,
+    )
+    generator.setup()
+    log_gpu("after video generator setup")
+    print(" Video generator ready.")
+
+    # ---- Step 2: Initialize simulator (Genesis scene) ----
+    print(f"[2/5] Loading case '{case_name}' and building Genesis scene ...")
+    log_gpu("before simulator init")
+    # Per-case config overrides (e.g. disable built-in force fields for
+    # cases where the demo handler applies forces interactively).
+    config_overrides = {}
+    if case_name == "santa_cloth":
+        config_overrides["skip_force_fields"] = True
+    simulator = InteractiveSimulator(
+        str(demo_data_path), config_overrides=config_overrides,
+    )
+    if args.no_debug:
+        simulator.config["debug"] = False
+    log_gpu("after simulator init")
+
+    # Create per-case demo handler and attach to simulator
+    demo_case_handler = get_demo_case_handler(case_name, simulator.config)
+    demo_case_handler.set_object_masks(simulator.object_masks_b64)
+    simulator.set_demo_case_handler(demo_case_handler)
+    print(f" Demo case handler: {type(demo_case_handler).__name__}")
+
+    noise_warper = StreamingNoiseWarper(crop_start=simulator.crop_start)
+    log_gpu("after noise warper init")
+    print(" Simulator and noise warper ready.")
+
+    # ---- Step 3: Pre-compute first frame encoding + KV cache + default prompt ----
+    print("[3/5] Pre-computing first frame encoding + KV cache + default prompt ...")
+    first_frame_path = _find_first_frame()
+    preview_pil = Image.open(first_frame_path).convert("RGB")
+    preview_b64 = _encode_pil_b64(preview_pil)
+    default_prompt = simulator.config.get("vgen_prompt", "A video of physical simulation")
+    generator.precompute_first_frame(first_frame_path, default_prompt=default_prompt)
+    log_gpu("after first frame pre-computation")
+    print(f" First frame pre-computed from {first_frame_path}. All components initialized.")
+
+    # ---- Step 4: Warm up CUDA kernels ----
+    _warmup_pipeline()
+
+    # ---- Step 5: Start server ----
+    print(f"\nStarting server on {args.host}:{args.port}")
+    print(f"Open http://localhost:{args.port} in your browser.\n")
+    socketio.run(app, host=args.host, port=args.port, debug=False,
+                 allow_unsafe_werkzeug=True)
+
+
+if __name__ == "__main__":
+    main()
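The temporal part of `_downsample_masks` above can be hard to picture from the tensor code alone. Below is a minimal pure-Python sketch of the same idea for a single pixel, assuming its mask value is tracked as one number per pixel-space frame. The function name `downsample_coverage` is hypothetical and not part of the commit; the real helper also resizes and crops spatially, which is omitted here.

```python
TEMPORAL_FACTOR = 4  # same value as in config.py


def downsample_coverage(per_frame, target_frames, factor=TEMPORAL_FACTOR):
    """Average mask values in groups of `factor`, pad or truncate, then threshold.

    Hypothetical single-pixel analogue of the temporal steps in _downsample_masks.
    """
    if not per_frame:
        return None
    # Group consecutive pixel frames; the last group may be shorter.
    groups = [per_frame[i:i + factor] for i in range(0, len(per_frame), factor)]
    averaged = [sum(g) / len(g) for g in groups]
    # Match the requested number of latent frames.
    if len(averaged) > target_frames:
        averaged = averaged[:target_frames]
    elif len(averaged) < target_frames:
        averaged = averaged + [averaged[-1]] * (target_frames - len(averaged))
    # Strict threshold: a value of exactly 0.5 counts as background.
    return [value > 0.5 for value in averaged]


# Nine pixel frames (the first-block length) reduced to three latent frames.
coverage = [1, 1, 1, 0, 0, 0, 0, 1, 1]
latent_mask = downsample_coverage(coverage, target_frames=3)
```

The group means here are 0.75, 0.25 and 1.0, so only the first and last latent frames stay masked.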
case_handlers/__init__.py ADDED
@@ -0,0 +1,6 @@
+"""Import case handlers to trigger registration."""
+
+from case_handlers.lamp import LampDemoHandler
+from case_handlers.persimmon import PersimmonDemoHandler
+from case_handlers.santa_cloth import SantaClothDemoHandler
+from case_handlers.tree import TreeDemoHandler
case_handlers/base.py ADDED
@@ -0,0 +1,149 @@
+"""Base demo case handler with registry pattern.
+
+Provides a registry + decorator for per-case UI configuration and
+force application logic in the demo_web frontend.
+"""
+
+import numpy as np
+
+DEMO_CASE_REGISTRY = {}
+
+
+def register_demo_case(case_name: str):
+    """Decorator to register a DemoCaseHandler subclass."""
+    def decorator(cls):
+        if case_name in DEMO_CASE_REGISTRY:
+            raise ValueError(f"Demo case '{case_name}' already registered!")
+        DEMO_CASE_REGISTRY[case_name] = cls
+        return cls
+    return decorator
+
+
+class DemoCaseHandler:
+    """Base class for per-case UI config and force application in demo_web.
+
+    Subclasses override ``get_ui_config`` and optionally ``apply_forces``
+    to customise behaviour for specific cases.
+    """
+
+    # Per-object physics force multiplier applied on top of the UI strength
+    # slider. Subclasses override this so the UI always shows a normalised
+    # strength range (0 to ``max_strength``) while the actual force
+    # magnitude is case-appropriate.
+    # Either a single float (applied to all objects) or a list of floats
+    # (one per object).
+    force_scale = 1.0
+
+    def __init__(self, config):
+        self.config = config
+        self._forces = []  # list of {"obj_idx", "direction", "strength"}
+        self._object_masks_b64 = []  # per-object mask images as base64 PNGs
+
+    @property
+    def num_objects(self):
+        return len(self.config.get("material_type", []))
+
+    def set_object_masks(self, masks_b64_list):
+        """Store base64-encoded mask PNGs for each object."""
+        self._object_masks_b64 = list(masks_b64_list) if masks_b64_list else []
+
+    # -- UI configuration --------------------------------------------------
+
+    def get_ui_config(self):
+        """Return JSON-serialisable dict describing per-object controls.
+
+        Default: one control per object with left/right/none, strength 1.0.
+        Includes mask_b64 for each object if masks were set.
+        """
+        objects = []
+        for idx in range(self.num_objects):
+            obj = {
+                "idx": idx,
+                "label": f"Object {idx}",
+                "directions": ["left", "none", "right"],
+                "default_direction": "none",
+                "default_strength": 1.0,
+                "max_strength": 2.0,
+            }
+            if idx < len(self._object_masks_b64):
+                obj["mask_b64"] = self._object_masks_b64[idx]
+            objects.append(obj)
+        return {"num_objects": self.num_objects, "objects": objects}
+
+    # -- Force management --------------------------------------------------
+
+    def get_force_config_from_ui(self, ui_forces):
+        """Map UI force dicts to 3D vectors.
+
+        Args:
+            ui_forces: list of ``{"obj_idx", "direction", "strength"}``
+                where direction is either a legacy string
+                ("left"/"right"/"none") or a 3-element list [dx, dy, dz].
+
+        Returns:
+            list of ``{"obj_idx", "direction": [dx,dy,dz], "strength"}``.
+        """
+        legacy_map = {
+            "left": [-1.0, 0.0, 0.0],
+            "right": [1.0, 0.0, 0.0],
+            "none": [0.0, 0.0, 0.0],
+        }
+        result = []
+        for f in ui_forces:
+            d = f.get("direction", [0.0, 0.0, 0.0])
+            if isinstance(d, str):
+                vec = legacy_map.get(d, [0.0, 0.0, 0.0])
+            else:
+                vec = [float(v) for v in d]
+            result.append({
+                "obj_idx": int(f.get("obj_idx", 0)),
+                "direction": vec,
+                "strength": float(f.get("strength", 0.0)),
+            })
+        return result
+
+    def set_forces(self, forces):
+        """Store resolved force configs (output of ``get_force_config_from_ui``)."""
+        self._forces = list(forces)
+
+    def configure_simulation(self, simulator):
+        """Called from the main thread before the generation loop starts.
+
+        Override in subclasses that need to set simulation state requiring
+        the main thread's CUDA context (e.g. taichi field writes).
+        """
+        pass
+
+    def reset_forces(self):
+        self._forces = []
+
+    def apply_forces(self, simulator, step_count):
+        """Apply stored forces to the simulator's objects.
+
+        Default behaviour: apply a constant force every step to each rigid
+        object that has a non-zero direction.
+        """
+        for f in self._forces:
+            obj_idx = f["obj_idx"]
+            direction = np.array(f["direction"], dtype=np.float32)
+            strength = f["strength"]
+            norm = np.linalg.norm(direction)
+            if norm < 1e-6:
+                continue
+            direction = direction / norm
+            if isinstance(self.force_scale, (list, tuple)):
+                scale = self.force_scale[obj_idx] if obj_idx < len(self.force_scale) else 1.0
+            else:
+                scale = self.force_scale
+            force_magnitude = strength * scale
+            mt = simulator.material_type[obj_idx] if obj_idx < len(simulator.material_type) else "rigid"
+            if mt == "rigid":
+                simulator.objs[obj_idx].solver.apply_links_external_force(
+                    force=(direction * force_magnitude).reshape(1, 3),
+                    links_idx=[simulator.objs[obj_idx].idx],
+                )
+
+
+def get_demo_case_handler(case_name, config):
+    """Factory: return a handler for *case_name*, falling back to default."""
+    cls = DEMO_CASE_REGISTRY.get(case_name, DemoCaseHandler)
+    return cls(config)
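The registry-plus-factory pattern in `case_handlers/base.py` can be tried in isolation. The following is a minimal sketch with stand-in names (`REGISTRY`, `register`, `BaseHandler`, `LampHandler` and `get_handler` are illustrative, not the commit's identifiers). It shows the two behaviours that matter: a registered case gets its own subclass, and an unknown case falls back to the base class.

```python
REGISTRY = {}


def register(name):
    """Class decorator that records a handler class under `name`."""
    def decorator(cls):
        if name in REGISTRY:
            raise ValueError(f"'{name}' already registered")
        REGISTRY[name] = cls
        return cls
    return decorator


class BaseHandler:
    force_scale = 1.0

    def __init__(self, config):
        self.config = config


@register("lamp")
class LampHandler(BaseHandler):
    force_scale = 2.5


def get_handler(name, config):
    """Look up the handler class, defaulting to BaseHandler."""
    return REGISTRY.get(name, BaseHandler)(config)


known = get_handler("lamp", {})
unknown = get_handler("unlisted_case", {})
print(type(known).__name__, known.force_scale)
print(type(unknown).__name__, unknown.force_scale)
```

Registration happens as a side effect of importing the module that defines the subclass, which is presumably why `case_handlers/__init__.py` imports every handler even though it never uses the names.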
case_handlers/lamp.py ADDED
@@ -0,0 +1,25 @@
+"""Lamp demo case handler: single rigid object, constant force."""
+
+from case_handlers.base import DemoCaseHandler, register_demo_case
+
+
+@register_demo_case("lamp")
+class LampDemoHandler(DemoCaseHandler):
+
+    force_scale = 2.5
+
+    def get_ui_config(self):
+        objects = [
+            {
+                "idx": 0,
+                "label": "Lamp",
+                "directions": ["left", "none", "right"],
+                "default_direction": "none",
+                "default_strength": 1.0,
+                "max_strength": 2.0,
+            },
+        ]
+        for obj in objects:
+            if obj["idx"] < len(self._object_masks_b64):
+                obj["mask_b64"] = self._object_masks_b64[obj["idx"]]
+        return {"num_objects": len(objects), "objects": objects}
case_handlers/persimmon.py ADDED
@@ -0,0 +1,51 @@
+"""Persimmon demo case handler: 3 rigid objects, force for first 5 steps only."""
+
+import numpy as np
+
+from case_handlers.base import DemoCaseHandler, register_demo_case
+
+
+@register_demo_case("persimmon")
+class PersimmonDemoHandler(DemoCaseHandler):
+
+    # Per-object force multiplier: top persimmon is lighter so needs less
+    # force to move the same distance. [top, middle, bottom]
+    force_scale = [50.0, 200.0, 100.0]
+
+    def get_ui_config(self):
+        objects = [
+            {
+                "idx": 0,
+                "label": "Top Persimmon",
+                "directions": ["left", "none", "right"],
+                "default_direction": "none",
+                "default_strength": 1.0,
+                "max_strength": 2.0,
+            },
+            {
+                "idx": 1,
+                "label": "Middle Persimmon",
+                "directions": ["left", "none", "right"],
+                "default_direction": "none",
+                "default_strength": 1.0,
+                "max_strength": 2.0,
+            },
+            {
+                "idx": 2,
+                "label": "Bottom Persimmon",
+                "directions": ["left", "none", "right"],
+                "default_direction": "none",
+                "default_strength": 1.0,
+                "max_strength": 2.0,
+            },
+        ]
+        for obj in objects:
+            if obj["idx"] < len(self._object_masks_b64):
+                obj["mask_b64"] = self._object_masks_b64[obj["idx"]]
+        return {"num_objects": len(objects), "objects": objects}
+
+    def apply_forces(self, simulator, step_count):
+        """Only apply forces for the first 5 simulation steps (matching offline persimmon.py)."""
+        if step_count > 5:
+            return
+        super().apply_forces(simulator, step_count)
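To make the per-object scaling and the step gate concrete, here is a small sketch of how a UI strength could turn into a force vector for the persimmon case. The helpers `resolve_scale` and `force_vector` are hypothetical names that mirror the logic in `DemoCaseHandler.apply_forces`; they use plain lists instead of NumPy and do not call the simulator.

```python
import math

PERSIMMON_SCALE = [50.0, 200.0, 100.0]  # [top, middle, bottom], as in the handler


def resolve_scale(force_scale, obj_idx):
    """Return the multiplier for one object (scalar, or per-object list)."""
    if isinstance(force_scale, (list, tuple)):
        return force_scale[obj_idx] if obj_idx < len(force_scale) else 1.0
    return force_scale


def force_vector(direction, strength, force_scale, obj_idx):
    """Normalise `direction` and scale it; None means no force is applied."""
    norm = math.sqrt(sum(c * c for c in direction))
    if norm < 1e-6:
        return None
    magnitude = strength * resolve_scale(force_scale, obj_idx)
    return [c / norm * magnitude for c in direction]


# "left" at UI strength 1.0 on the middle persimmon.
middle_push = force_vector([-1.0, 0.0, 0.0], 1.0, PERSIMMON_SCALE, obj_idx=1)

# Steps that pass the `step_count > 5` gate if the counter starts at 0.
active_steps = [step for step in range(10) if not step > 5]
```

With a zero-based counter the gate lets steps 0 through 5 through, which is six steps rather than five. Whether the simulator's `step_count` starts at 0 or 1 is not visible in this diff, so treat that as something to verify rather than a confirmed off-by-one.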
case_handlers/santa_cloth.py ADDED
@@ -0,0 +1,101 @@
+"""Santa cloth demo case handler: PBD cloth with controllable wind."""
+
+import numpy as np
+import torch
+
+from case_handlers.base import DemoCaseHandler, register_demo_case
+
+
+@register_demo_case("santa_cloth")
+class SantaClothDemoHandler(DemoCaseHandler):
+
+    force_scale = 1.0
+
+    def __init__(self, config):
+        super().__init__(config)
+        self._wind_direction = np.zeros(3, dtype=np.float32)
+        self._wind_strength = 0.0
+        self._wind_bounds = None  # (z_low, z_high)
+
+    def get_ui_config(self):
+        objects = [
+            {
+                "idx": 0,
+                "label": "Santa's Clothes",
+                "directions": ["left", "none", "right"],
+                "default_direction": "none",
+                "default_strength": 1.0,
+                "max_strength": 2.0,
+            },
+        ]
+        for obj in objects:
+            if obj["idx"] < len(self._object_masks_b64):
+                obj["mask_b64"] = self._object_masks_b64[obj["idx"]]
+        return {"num_objects": len(objects), "objects": objects}
+
+    def configure_simulation(self, simulator):
+        """Pre-compute wind parameters from stored forces (any thread)."""
+        for f in self._forces:
+            direction = np.array(f["direction"], dtype=np.float32)
+            strength = f["strength"]
+            norm = np.linalg.norm(direction)
+            if norm < 1e-6:
+                self._wind_direction = np.zeros(3, dtype=np.float32)
+                self._wind_strength = 0.0
+                continue
+            self._wind_direction = direction / norm
+            self._wind_strength = strength * self.force_scale
+
+        if self._wind_bounds is None and len(simulator.all_obj_info) > 0:
+            info = simulator.all_obj_info[0]
+            z_min = float(info["min"][2])
+            z_max = float(info["max"][2])
+            z_range = z_max - z_min
+            self._wind_bounds = (
+                z_min + z_range * 0.05,
+                z_min + z_range * 0.8,
+            )
+
+    def apply_forces(self, simulator, step_count):
+        """Apply wind to PBD cloth by modifying particle velocities."""
+        if self._wind_strength < 1e-6:
+            return
+        if self._wind_bounds is None:
+            return
+
+        wind_lowest, wind_highest = self._wind_bounds
+        dt = simulator.dt
+
+        for obj_idx, obj in enumerate(simulator.objs):
+            mt = simulator.material_type[obj_idx] if obj_idx < len(simulator.material_type) else "rigid"
+            if mt not in ("pbd_cloth", "pbd_elastic", "pbd_particle"):
+                continue
+
+            solver = obj.solver
+            state = solver.get_state(0)
+            if state is None:
+                continue
+
+            p_start = obj.particle_start
+            n_p = obj.n_particles
+
+            z = state.pos[0, p_start:p_start + n_p, 2]
+            is_free = state.free[0, p_start:p_start + n_p].bool()
+            in_zone = (z > wind_lowest) & (z < wind_highest)
+            mask = is_free & in_zone
+            if not mask.any():
+                continue
+
+            t = torch.zeros_like(z)
+            t[mask] = (z[mask] - wind_lowest) / (wind_highest - wind_lowest)
+            scaler = torch.zeros_like(z)
+            scaler[mask] = torch.exp(t[mask] ** 2)
+
+            wind_dir = torch.tensor(
+                self._wind_direction, dtype=z.dtype, device=z.device,
+            )
+
+            wind_delta = wind_dir.unsqueeze(0) * (self._wind_strength * scaler.unsqueeze(1) * dt)
+            state.vel[0, p_start:p_start + n_p, :] += wind_delta
+
+            solver.set_state(0, state)
case_handlers/tree.py ADDED
@@ -0,0 +1,104 @@
+"""Tree demo case handler: MPM elastic with controllable wind."""
+
+import numpy as np
+import torch
+
+from case_handlers.base import DemoCaseHandler, register_demo_case
+
+
+@register_demo_case("tree")
+class TreeDemoHandler(DemoCaseHandler):
+
+    force_scale = 1.0
+
+    def __init__(self, config):
+        super().__init__(config)
+        self._wind_direction = np.zeros(3, dtype=np.float32)
+        self._wind_strength = 0.0
+        self._wind_bounds = None  # (z_low, z_high)
+
+    def get_ui_config(self):
+        objects = [
+            {
+                "idx": 0,
+                "label": "Tree",
+                "directions": ["left", "none", "right"],
+                "default_direction": "none",
+                "default_strength": 1.0,
+                "max_strength": 2.0,
+            },
+        ]
+        for obj in objects:
+            if obj["idx"] < len(self._object_masks_b64):
+                obj["mask_b64"] = self._object_masks_b64[obj["idx"]]
+        return {"num_objects": len(objects), "objects": objects}
+
+    def configure_simulation(self, simulator):
+        """Pre-compute wind parameters from stored forces (any thread)."""
+        for f in self._forces:
+            direction = np.array(f["direction"], dtype=np.float32)
+            strength = f["strength"]
+            norm = np.linalg.norm(direction)
+            if norm < 1e-6:
+                self._wind_direction = np.zeros(3, dtype=np.float32)
+                self._wind_strength = 0.0
+                continue
+            self._wind_direction = direction / norm
+            self._wind_strength = strength * self.force_scale
+
+        if self._wind_bounds is None and len(simulator.all_obj_info) > 0:
+            info = simulator.all_obj_info[0]
+            z_min = float(info["min"][2])
+            z_max = float(info["max"][2])
+            z_range = z_max - z_min
+            self._wind_bounds = (
+                z_min + z_range * 0.05,
+                z_min + z_range * 0.8,
+            )
+
+    def apply_forces(self, simulator, step_count):
+        """Apply wind to MPM particles by modifying particle velocities."""
+        if self._wind_strength < 1e-6:
+            return
+        if self._wind_bounds is None:
+            return
+
+        wind_lowest, wind_highest = self._wind_bounds
+        dt = simulator.dt
+
+        for obj_idx, obj in enumerate(simulator.objs):
+            mt = simulator.material_type[obj_idx] if obj_idx < len(simulator.material_type) else "rigid"
+            if not mt.startswith("mpm_"):
+                continue
+
+            solver = obj.solver
+            state = solver.get_state(0)
+            if state is None:
+                continue
+
+            p_start = obj.particle_start
+            n_p = obj.n_particles
+
+            z = state.pos[0, p_start:p_start + n_p, 2]
+
+            in_zone = (z > wind_lowest) & (z < wind_highest)
+            if hasattr(state, 'free'):
+                mask = state.free[0, p_start:p_start + n_p].bool() & in_zone
+            else:
+                mask = in_zone
+            if not mask.any():
+                continue
+
+            t = torch.zeros_like(z)
+            t[mask] = (z[mask] - wind_lowest) / (wind_highest - wind_lowest)
+            scaler = torch.zeros_like(z)
+            scaler[mask] = torch.exp(t[mask] ** 2)
+
+            wind_dir = torch.tensor(
+                self._wind_direction, dtype=z.dtype, device=z.device,
+            )
+
+            wind_delta = wind_dir.unsqueeze(0) * (self._wind_strength * scaler.unsqueeze(1) * dt)
+            state.vel[0, p_start:p_start + n_p, :] += wind_delta
+
+            solver.set_state(0, state)
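Both wind handlers (`santa_cloth.py` and `tree.py`) use the same height-dependent falloff. A scalar sketch of that rule for a single particle is shown below; `wind_bounds` and `wind_velocity_delta` are hypothetical helper names, the direction is assumed to be already normalised, and the `free` particle flag is left out.

```python
import math


def wind_bounds(z_min, z_max):
    """Wind acts between 5% and 80% of the object's height range."""
    z_range = z_max - z_min
    return z_min + 0.05 * z_range, z_min + 0.8 * z_range


def wind_velocity_delta(z, direction, strength, dt, bounds):
    """Velocity change for one particle at height z during one step."""
    low, high = bounds
    if not (low < z < high):
        return (0.0, 0.0, 0.0)  # outside the wind zone
    t = (z - low) / (high - low)  # normalised height, strictly between 0 and 1
    scaler = math.exp(t ** 2)     # grows from 1 at the bottom towards e at the top
    return tuple(d * strength * scaler * dt for d in direction)


bounds = wind_bounds(0.0, 1.0)
mid_zone = wind_velocity_delta(0.425, (1.0, 0.0, 0.0), strength=1.0, dt=0.02, bounds=bounds)
above_zone = wind_velocity_delta(0.9, (1.0, 0.0, 0.0), strength=1.0, dt=0.02, bounds=bounds)
```

Because the scaler is `exp(t ** 2)`, particles near the top of the zone receive up to roughly 2.7 times the push of particles at the bottom, and particles outside the zone receive none.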
config.py ADDED
@@ -0,0 +1,60 @@
+"""Default configuration constants for the RealWonder interactive demo."""
+
+# Video dimensions
+DEFAULT_HEIGHT = 480
+DEFAULT_WIDTH = 832
+
+# Latent dimensions (after VAE encoding)
+LATENT_H = 60
+LATENT_W = 104
+LATENT_C = 16
+
+# VAE temporal downsampling factor
+TEMPORAL_FACTOR = 4
+
+# Causal generation blocks (model architecture constants)
+FRAMES_PER_BLOCK = 3  # latent frames per block
+FRAMES_PER_BLOCK_PIXEL = FRAMES_PER_BLOCK * TEMPORAL_FACTOR  # pixel frames per block
+FRAMES_FIRST_BLOCK_PIXEL = (FRAMES_PER_BLOCK - 1) * TEMPORAL_FACTOR + 1  # pixel frames for first block
+
+# Playback
+FPS = 8
+
+# Simulation parameters are read from each case's config.yaml at runtime
+# (dt, substeps, frame_steps); see InteractiveSimulator.__init__
+
+# Noise warping
+NOISE_CHANNELS = 32
+
+# SDEdit
+EVAL_DEGRADATION = 0.5
+
+# Model defaults
+DEFAULT_LOCAL_ATTN_SIZE = 21
+DEFAULT_TIMESTEP_SHIFT = 5.0
+CONTEXT_NOISE = 0
+
+
+def load_case_sdedit_config(case_config: dict) -> dict:
+    """Extract SDEdit parameters from a case config.yaml dict.
+
+    Reads num_output_frames, denoising_step_list, and the optional
+    mask_dropin_step and franka_step (both default to -1) from the case
+    config and computes all derived frame/block counts.
+
+    Returns a dict with keys:
+        num_latent_frames, num_pixel_frames, num_blocks,
+        denoising_step_list, mask_dropin_step, franka_step
+    """
+    num_latent_frames = case_config["num_output_frames"]
+    assert num_latent_frames % FRAMES_PER_BLOCK == 0, (
+        f"num_output_frames ({num_latent_frames}) must be divisible by "
+        f"FRAMES_PER_BLOCK ({FRAMES_PER_BLOCK})"
+    )
+    return {
+        "num_latent_frames": num_latent_frames,
+        "num_pixel_frames": (num_latent_frames - 1) * TEMPORAL_FACTOR + 1,
+        "num_blocks": num_latent_frames // FRAMES_PER_BLOCK,
+        "denoising_step_list": case_config["denoising_step_list"],
+        "mask_dropin_step": case_config.get("mask_dropin_step", -1),
+        "franka_step": case_config.get("franka_step", -1),
+    }
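The frame arithmetic in `config.py` is easy to check by hand. The sketch below repeats the constants and works through the lamp case, whose `config.yaml` sets `num_output_frames: 21`. The function `derive_counts` is a hypothetical reduction of `load_case_sdedit_config` to the two derived counts.

```python
TEMPORAL_FACTOR = 4
FRAMES_PER_BLOCK = 3
FRAMES_PER_BLOCK_PIXEL = FRAMES_PER_BLOCK * TEMPORAL_FACTOR              # 12
FRAMES_FIRST_BLOCK_PIXEL = (FRAMES_PER_BLOCK - 1) * TEMPORAL_FACTOR + 1  # 9


def derive_counts(num_latent_frames):
    """Return (num_blocks, num_pixel_frames) for a given latent frame count."""
    if num_latent_frames % FRAMES_PER_BLOCK != 0:
        raise ValueError("num_latent_frames must be divisible by FRAMES_PER_BLOCK")
    num_blocks = num_latent_frames // FRAMES_PER_BLOCK
    num_pixel_frames = (num_latent_frames - 1) * TEMPORAL_FACTOR + 1
    return num_blocks, num_pixel_frames


num_blocks, num_pixel_frames = derive_counts(21)

# The first block is shorter because the first latent frame maps to one pixel frame.
pixels_per_block = [FRAMES_FIRST_BLOCK_PIXEL] + [FRAMES_PER_BLOCK_PIXEL] * (num_blocks - 1)
```

For 21 latent frames this gives 7 blocks and 81 pixel frames (9 for the first block plus 6 × 12), which agrees with `simulated_frames_num: 81` in the lamp config.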
demo_data/.gitkeep ADDED
File without changes
demo_data/lamp/bg_points.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1c1b8fd606ca468ed6f9f0a8eebc871949df4f50355cb198242d1548a5c0b245
+size 6292900
demo_data/lamp/camera.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:09017373389bc34d069d66ee6388670e04cd8f7e9c30b8a43e5adff02d062654
+size 1928
demo_data/lamp/config.yaml ADDED
@@ -0,0 +1,52 @@
1
+ device: cuda
2
+ seed: 0
3
+ example_name: lamp
4
+ output_folder: demo_web/demo_data/lamp/recon_tmp
5
+ data_path: cases/lamp
6
+ segmenter: sam2
7
+ all_object_points:
8
+ - - - 250
9
+ - 207
10
+ - 1
11
+ - - 273
12
+ - 287
13
+ - 1
14
+ all_object_masks_idx:
15
+ - 1
16
+ obj_kp_matching: true
17
+ obj_kp:
18
+ - - - 0.2
19
+ - 0.8
20
+ - - 0.1
21
+ - 0.9
22
+ logging_level: details
23
+ debug: true
24
+ stitched_inpainting: false
25
+ mesh_resize_factor: 1.0
26
+ target_faces: 10000
27
+ dt: 0.02
28
+ substeps: 10
29
+ simulated_frames_num: 81
30
+ frame_steps: 1
31
+ material_type:
32
+ - rigid
33
+ use_primitive: true
34
+ remap_depth:
35
+ - 1.0
36
+ - 2.0
37
+ rigid_rho: 1000
38
+ rigid_friction: 0.01
39
+ plane_friction: 0.01
40
+ gravity: -1
41
+ alpha_threshold: 0.8
42
+ crop_start: 200
43
+ fg_points_render_radius: 0.01
44
+ num_output_frames: 21
45
+ denoising_step_list:
46
+ - 800
47
+ - 500
48
+ - 250
49
+ mask_dropin_step: -1
50
+ vgen_prompt: A square paper lantern is moving on river. Water surface ripples follow
51
+ the motion. Twilight, cinematic realism.
52
+ fov_x_input: 27.449039459228516
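The nested `all_object_points` list in this config is easy to misread, so here is a small sketch of one plausible interpretation: one outer entry per object, each holding point prompts of the form `[x, y, label]`. That reading is an assumption based on `segmenter: sam2` and the usual point-prompt convention; the commit does not confirm it, and `split_point_prompts` is a hypothetical helper.

```python
# Values copied from the lamp config.yaml; the [x, y, label] reading is assumed.
all_object_points = [[[250, 207, 1], [273, 287, 1]]]


def split_point_prompts(points):
    """Split [x, y, label] triples into coordinates and labels (hypothetical)."""
    coords = [[x, y] for x, y, _ in points]
    labels = [label for _, _, label in points]
    return coords, labels


for obj_idx, points in enumerate(all_object_points):
    coords, labels = split_point_prompts(points)
    print(obj_idx, coords, labels)
```

If the assumption holds, the lamp case defines a single object prompted by two points, both carrying label 1.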
demo_data/lamp/fg_masks/mask_00.png ADDED

Git LFS Details

  • SHA256: ee6e42eeb59719d0673a7ee7107c311efa8769caf5057ba390c990d06f6d9502
  • Pointer size: 129 Bytes
  • Size of remote file: 3.91 kB
demo_data/lamp/fg_meshes/mesh_00.obj ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d7ad356d8cc30f1c3acd9e0313fb9aa518c96ce7f3c6b1d810a285286d4a4395
+size 408054
demo_data/lamp/fg_pcs/pc_00.pt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a3f2781d5463f9561cbf447f2b85c40294096a6f2e5364481bd7520f851ff136
+size 466828
demo_data/lamp/first_frame.png ADDED

Git LFS Details

  • SHA256: 5dd4cc9874c185797d87d8c6753d4fb10d38ac1083826a8e0ce7a3ecef53b0d6
  • Pointer size: 131 Bytes
  • Size of remote file: 404 kB
demo_data/lamp/inpainted_bg.png ADDED

Git LFS Details

  • SHA256: fd7b3a3b94efcde5663b269ed5b47942d86989302bbfbdba1c98954a23fb6b12
  • Pointer size: 131 Bytes
  • Size of remote file: 283 kB
demo_data/lamp/sim_tmp/fg_mesh_00.obj ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:84383d9af1646c4535ca79b6571b04bdd52d1a097d1c028a431ba2e14e485866
+size 410135
demo_data/lamp/sim_tmp/flow_image.gif ADDED

Git LFS Details

  • SHA256: 01016d0b335c174519a10717c4efc74b4606ff1e60a1f03136b39867349092b0
  • Pointer size: 131 Bytes
  • Size of remote file: 772 kB
demo_data/lamp/sim_tmp/frames/frame_0001.png ADDED

Git LFS Details

  • SHA256: 5f9606e332e45dde609f768c8143b85850f1b407ee346a44fe97f207c41eb8f3
  • Pointer size: 131 Bytes
  • Size of remote file: 300 kB
demo_data/lamp/sim_tmp/frames/frame_0002.png ADDED

Git LFS Details

  • SHA256: dc10ddb43301ce45ea8be89ae25803f623e5638a5d77e429c38d557a63e6d147
  • Pointer size: 131 Bytes
  • Size of remote file: 300 kB
demo_data/lamp/sim_tmp/frames/frame_0003.png ADDED

Git LFS Details

  • SHA256: 1c0db314612d1e28d4288fd3bdbca7203f6c7a9ca9d9c54906b9b8dfe08a38cf
  • Pointer size: 131 Bytes
  • Size of remote file: 300 kB
demo_data/lamp/sim_tmp/frames/frame_0004.png ADDED

Git LFS Details

  • SHA256: 26e6eed29d46e667a98fdc58b6ba7cb63a61461ddc677896868c9c1128762f07
  • Pointer size: 131 Bytes
  • Size of remote file: 300 kB
demo_data/lamp/sim_tmp/frames/frame_0005.png ADDED

Git LFS Details

  • SHA256: 205e1e023248d59d6fc55d203afe4cb9e6940e4f5d0dbe92be3d54e843a2189b
  • Pointer size: 131 Bytes
  • Size of remote file: 300 kB
demo_data/lamp/sim_tmp/frames/frame_0006.png ADDED

Git LFS Details

  • SHA256: fa047c75a4fe316d4b458ca724883f1eb444e4b329126898482b3e7eb06f069d
  • Pointer size: 131 Bytes
  • Size of remote file: 300 kB
demo_data/lamp/sim_tmp/frames/frame_0007.png ADDED

Git LFS Details

  • SHA256: ddb677f773c5f7d34034e81c0dae224a4bbb77f2b4fc59e2ab02da9949abb09f
  • Pointer size: 131 Bytes
  • Size of remote file: 300 kB
demo_data/lamp/sim_tmp/frames/frame_0008.png ADDED

Git LFS Details

  • SHA256: 5fc51558f7065e6a5d207ca23b75ce5727ffbf198fec03192ebca099308a5e7c
  • Pointer size: 131 Bytes
  • Size of remote file: 300 kB
demo_data/lamp/sim_tmp/frames/frame_0009.png ADDED

Git LFS Details

  • SHA256: 136385fdd3f643026ece63d04810c641bc78457158d10363df20a8c681a898fc
  • Pointer size: 131 Bytes
  • Size of remote file: 300 kB
demo_data/lamp/sim_tmp/frames/frame_0010.png ADDED

Git LFS Details

  • SHA256: e67effacbf4287f54ca538739a38d8a19a648137f6f6fd0c5a5555c65b0860aa
  • Pointer size: 131 Bytes
  • Size of remote file: 300 kB
demo_data/lamp/sim_tmp/frames/frame_0011.png ADDED

Git LFS Details

  • SHA256: 6793f1e83869b9245642c006a2fdf20126347f98ee438ec7f2881c59125af119
  • Pointer size: 131 Bytes
  • Size of remote file: 300 kB
demo_data/lamp/sim_tmp/frames/frame_0012.png ADDED

Git LFS Details

  • SHA256: ff54dcce198150363a44352dbe06ec19147607a34c9eaeaf59e83efc74857105
  • Pointer size: 131 Bytes
  • Size of remote file: 300 kB
demo_data/lamp/sim_tmp/frames/frame_0013.png ADDED

Git LFS Details

  • SHA256: e6001ea3af4b606ed4f3d559c6e4be7cf6227e46bf048fd328364bd7f2b078fb
  • Pointer size: 131 Bytes
  • Size of remote file: 300 kB
demo_data/lamp/sim_tmp/frames/frame_0014.png ADDED

Git LFS Details

  • SHA256: d9e3399516a64626dec0034f65f101225d2eda21e48da9fc25d62f4923f2422b
  • Pointer size: 131 Bytes
  • Size of remote file: 300 kB
demo_data/lamp/sim_tmp/frames/frame_0015.png ADDED

Git LFS Details

  • SHA256: c13d8a7e7d738e8b42df75a8c8ed1ae51365990d4aa83e6f941c39cf680a896b
  • Pointer size: 131 Bytes
  • Size of remote file: 300 kB
demo_data/lamp/sim_tmp/frames/frame_0016.png ADDED

Git LFS Details

  • SHA256: bed979c89a6c2d1a0bf148e78f661ab91f26166152247ce0834da28863b45435
  • Pointer size: 131 Bytes
  • Size of remote file: 300 kB
demo_data/lamp/sim_tmp/frames/frame_0017.png ADDED

Git LFS Details

  • SHA256: 8da29178cd0792693119ccce21286f6c5dce31db39aaec2149c36273b30fb6ae
  • Pointer size: 131 Bytes
  • Size of remote file: 300 kB
demo_data/lamp/sim_tmp/frames/frame_0018.png ADDED

Git LFS Details

  • SHA256: f89644b7aa2f4ff9ff8f46bef17fabf6b3939dd6ebb5944daf05889ace920a8f
  • Pointer size: 131 Bytes
  • Size of remote file: 300 kB
demo_data/lamp/sim_tmp/frames/frame_0019.png ADDED

Git LFS Details

  • SHA256: 7d348c036c52b67b8c63d3e8bc4a8f80a9b17f0a7b78fede54cd916df81ab28f
  • Pointer size: 131 Bytes
  • Size of remote file: 300 kB
demo_data/lamp/sim_tmp/frames/frame_0020.png ADDED

Git LFS Details

  • SHA256: c1547ec52a94464d5e7adb076bbd49e414a9e1d11dce8675f6ed48815a980afc
  • Pointer size: 131 Bytes
  • Size of remote file: 300 kB
demo_data/lamp/sim_tmp/frames/frame_0021.png ADDED

Git LFS Details

  • SHA256: e942361050f80c63c59dfacb8a8355684d89f4ead1702291a4deb4be26840722
  • Pointer size: 131 Bytes
  • Size of remote file: 300 kB
demo_data/lamp/sim_tmp/frames/frame_0022.png ADDED

Git LFS Details

  • SHA256: 547ff08ae4e533be746f5eae246d430733a3086a7352234f985183c076e6006c
  • Pointer size: 131 Bytes
  • Size of remote file: 300 kB
demo_data/lamp/sim_tmp/frames/frame_0023.png ADDED

Git LFS Details

  • SHA256: 1144d1e6c1fedea41ad78f265eae34854e8fcdc5ee3c5aadb297d37c23a3d6e4
  • Pointer size: 131 Bytes
  • Size of remote file: 300 kB
demo_data/lamp/sim_tmp/frames/frame_0024.png ADDED

Git LFS Details

  • SHA256: 9c3f02f94d6ecf55eb2fb8c1c1400aa29dbdb1aa5674f4882b431f60f816f946
  • Pointer size: 131 Bytes
  • Size of remote file: 300 kB
demo_data/lamp/sim_tmp/frames/frame_0025.png ADDED

Git LFS Details

  • SHA256: 893b159cffbcabc912a6a13d9564223ae54040e7c1df3b0bb23fee6732a1189a
  • Pointer size: 131 Bytes
  • Size of remote file: 300 kB
demo_data/lamp/sim_tmp/frames/frame_0026.png ADDED

Git LFS Details

  • SHA256: 07673a49924079ad00d24652b5bdfcf7e3ba849e54cfc1552dcaf3d9c9b28d92
  • Pointer size: 131 Bytes
  • Size of remote file: 300 kB
demo_data/lamp/sim_tmp/frames/frame_0027.png ADDED

Git LFS Details

  • SHA256: 23095502f1c5ef8c195ce9dc2da853cc4ce973047fcea3ba2cf1700b3a41f308
  • Pointer size: 131 Bytes
  • Size of remote file: 300 kB
demo_data/lamp/sim_tmp/frames/frame_0028.png ADDED

Git LFS Details

  • SHA256: 651d6d61fabd6966c4aacd13ff3b83ba6136cd27d99f2b44c4182ca3b6fc7a3c
  • Pointer size: 131 Bytes
  • Size of remote file: 300 kB
demo_data/lamp/sim_tmp/frames/frame_0029.png ADDED

Git LFS Details

  • SHA256: bd8c9fc351918f2fe8c313bc3fe058a37333817d38280817d73d0be3cdc42cd0
  • Pointer size: 131 Bytes
  • Size of remote file: 300 kB