# HunyuanWorld 2.0 — Documentation
This document provides detailed usage guides, parameter references, and output format specifications for each component of HunyuanWorld 2.0.

## Table of Contents
- [WorldMirror 2.0 (World Reconstruction)](#worldmirror-20-world-reconstruction)
  - [Overview](#overview)
  - [Python API](#python-api)
    - [`WorldMirrorPipeline.from_pretrained`](#worldmirrorpipelinefrom_pretrained)
    - [`WorldMirrorPipeline.__call__`](#worldmirrorpipelinecall)
  - [CLI Reference](#cli-reference)
  - [Output Format](#output-format)
    - [File Structure](#file-structure)
    - [Prediction Dictionary](#prediction-dictionary)
  - [Prior Injection](#prior-injection)
    - [Camera Parameters (JSON)](#camera-parameters-json)
    - [Depth Maps (Folder)](#depth-maps-folder)
    - [Combining Priors](#combining-priors)
  - [Multi-GPU Inference](#multi-gpu-inference)
  - [Advanced Options](#advanced-options)
    - [Disabling Prediction Heads](#disabling-prediction-heads)
    - [Mask Filtering](#mask-filtering)
    - [Point Cloud Compression](#point-cloud-compression)
  - [Gradio App](#gradio-app)
- [Panorama Generation](#panorama-generation)
- [World Generation](#world-generation)

---
## WorldMirror 2.0 (World Reconstruction)
### Overview
WorldMirror 2.0 is a unified feed-forward model for comprehensive 3D geometric prediction from multi-view images or video. It simultaneously generates:
- **3D point clouds** in world coordinates
- **Per-view depth maps** in camera frame
- **Surface normals** in camera coordinates
- **Camera poses** (c2w) and **intrinsics**
- **3D Gaussian Splatting** attributes (means, scales, rotations, opacities, SH coefficients)

Key improvements over WorldMirror 1.0:
- **Normalized RoPE** for flexible resolution inference
- **Depth mask prediction** for robust invalid pixel handling
- **Sequence Parallel + FSDP + BF16** for efficient multi-GPU inference

---
### Python API
#### `WorldMirrorPipeline.from_pretrained`
Factory method to load the model and create a pipeline instance.

```python
from hyworld2.worldrecon.pipeline import WorldMirrorPipeline

pipeline = WorldMirrorPipeline.from_pretrained(
    pretrained_model_name_or_path="tencent/HY-World-2.0",
    subfolder="HY-WorldMirror-2.0",
    config_path=None,
    ckpt_path=None,
    use_fsdp=False,
    enable_bf16=False,
    fsdp_cpu_offload=False,
    disable_heads=None,
)
```

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `pretrained_model_name_or_path` | `str` | `"tencent/HY-World-2.0"` | HuggingFace repo ID or local path |
| `subfolder` | `str` | `"HY-WorldMirror-2.0"` | Subfolder inside the repo containing WorldMirror checkpoint (`model.safetensors` + config) |
| `config_path` | `str` | `None` | Training config YAML (used with `ckpt_path` for custom checkpoints) |
| `ckpt_path` | `str` | `None` | Checkpoint file (`.ckpt` / `.safetensors`). When provided with `config_path`, loads model from local checkpoint instead of HuggingFace |
| `use_fsdp` | `bool` | `False` | Shard parameters across GPUs via Fully Sharded Data Parallel |
| `enable_bf16` | `bool` | `False` | Use bfloat16 precision (except numerically critical layers) |
| `fsdp_cpu_offload` | `bool` | `False` | Offload FSDP parameters to CPU (saves GPU memory at the cost of speed) |
| `disable_heads` | `list[str]` | `None` | Heads to disable and free from memory. Options: `"camera"`, `"depth"`, `"normal"`, `"points"`, `"gs"` |

**Notes:**
- Distributed mode is auto-detected from `WORLD_SIZE` environment variable (set by `torchrun`).
- When using multi-GPU, each rank must call `from_pretrained`; the method handles `dist.init_process_group` internally.

---
#### `WorldMirrorPipeline.__call__`
Run inference on a set of images or a video.

```python
result = pipeline(
    input_path,
    output_path="inference_output",
    **kwargs,
)
```

Returns the output directory path (`str`), or `None` if the input was skipped.

**Inference Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `input_path` | `str` | *(required)* | Directory of images or path to a video file |
| `output_path` | `str` | `"inference_output"` | Root output directory |
| `target_size` | `int` | `952` | Maximum inference resolution (longest edge). Images are resized + center-cropped to the nearest multiple of 14 |
| `fps` | `int` | `1` | FPS for extracting frames from video input |
| `video_strategy` | `str` | `"new"` | Video frame extraction strategy: `"new"` (motion-aware) or `"old"` (uniform FPS) |
| `video_min_frames` | `int` | `1` | Minimum number of frames to extract from video |
| `video_max_frames` | `int` | `32` | Maximum number of frames to extract from video |
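The `target_size` resize rule above (scale the longest edge, then snap to a multiple of 14) can be sketched as follows. This is an assumed reading of the documented behavior for illustration; the pipeline's exact rounding and cropping may differ:

```python
def inference_size(h, w, target_size=952, patch=14):
    """Approximate the documented resize rule: scale so the longest edge
    matches target_size, then crop each side down to a multiple of the
    patch size. (Assumed behavior; not the pipeline's exact code.)
    """
    scale = target_size / max(h, w)
    nh = int(round(h * scale)) // patch * patch
    nw = int(round(w * scale)) // patch * patch
    return nh, nw

# A 1920x1080 input at the default target_size of 952:
print(inference_size(1080, 1920))
```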

**Save Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `save_depth` | `bool` | `True` | Save per-view depth maps (PNG visualization + NPY raw values) |
| `save_normal` | `bool` | `True` | Save per-view surface normal maps (PNG) |
| `save_gs` | `bool` | `True` | Save 3D Gaussian Splatting as `gaussians.ply` |
| `save_camera` | `bool` | `True` | Save camera parameters as `camera_params.json` |
| `save_points` | `bool` | `True` | Save depth-derived point cloud as `points.ply` |
| `save_colmap` | `bool` | `False` | Save COLMAP-format sparse reconstruction (`sparse/0/`) |
| `save_conf` | `bool` | `False` | Save depth confidence maps |
| `save_sky_mask` | `bool` | `False` | Save sky segmentation masks |

**Mask Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `apply_sky_mask` | `bool` | `True` | Filter out sky regions from point clouds and Gaussians |
| `apply_edge_mask` | `bool` | `True` | Filter out edge/discontinuity regions |
| `apply_confidence_mask` | `bool` | `False` | Filter out low-confidence predictions |
| `sky_mask_source` | `str` | `"auto"` | Sky mask method: `"auto"` (ONNX + model fusion), `"model"` (model predictions only), `"onnx"` (external segmentation only) |
| `model_sky_threshold` | `float` | `0.45` | Threshold for model-based sky detection |
| `confidence_percentile` | `float` | `10.0` | Percentile threshold for confidence filtering (bottom N% removed) |
| `edge_normal_threshold` | `float` | `1.0` | Normal edge detection tolerance |
| `edge_depth_threshold` | `float` | `0.03` | Depth edge detection relative tolerance |

**Compression Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `compress_pts` | `bool` | `True` | Compress point clouds via voxel merging + random sampling |
| `compress_pts_max_points` | `int` | `2,000,000` | Maximum number of points after compression |
| `compress_pts_voxel_size` | `float` | `0.002` | Voxel size for point cloud merging |
| `max_resolution` | `int` | `1920` | Maximum resolution for saved output images |
| `compress_gs_max_points` | `int` | `5,000,000` | Maximum number of Gaussians after voxel pruning |

**Prior Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `prior_cam_path` | `str` | `None` | Path to camera parameters JSON file |
| `prior_depth_path` | `str` | `None` | Path to directory containing depth map files |

**Rendered Video Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `save_rendered` | `bool` | `False` | Render interpolated fly-through video from Gaussian splats |
| `render_interp_per_pair` | `int` | `15` | Number of interpolated frames between each camera pair |
| `render_depth` | `bool` | `False` | Also render a depth visualization video |

**Misc Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `log_time` | `bool` | `True` | Print timing report and save `pipeline_timing.json` |
| `strict_output_path` | `str` | `None` | If set, save results directly to this path without `<case_name>/<timestamp>` subdirectories |

---
### CLI Reference
All `__call__` parameters are exposed as CLI arguments:

```bash
python -m hyworld2.worldrecon.pipeline \
    --input_path path/to/images \
    --output_path inference_output \
    --target_size 952 \
    --prior_cam_path path/to/camera_params.json \
    --prior_depth_path path/to/depth_dir/
```

**Boolean flag conventions:**

| Enable | Disable |
|--------|---------|
| `--save_colmap` | *(omit)* |
| `--save_conf` | *(omit)* |
| `--save_sky_mask` | *(omit)* |
| `--apply_sky_mask` (default on) | `--no_sky_mask` |
| `--apply_edge_mask` (default on) | `--no_edge_mask` |
| `--apply_confidence_mask` | *(omit)* |
| `--compress_pts` (default on) | `--no_compress_pts` |
| `--log_time` (default on) | `--no_log_time` |
| *(default on)* `save_depth` | `--no_save_depth` |
| *(default on)* `save_normal` | `--no_save_normal` |
| *(default on)* `save_gs` | `--no_save_gs` |
| *(default on)* `save_camera` | `--no_save_camera` |
| *(default on)* `save_points` | `--no_save_points` |
| `--save_rendered` | *(omit)* |
| `--render_depth` | *(omit)* |

**Additional CLI-only arguments:**

| Argument | Description |
|----------|-------------|
| `--config_path` | Training config YAML for custom checkpoint loading |
| `--ckpt_path` | Local checkpoint file path |
| `--use_fsdp` | Enable FSDP multi-GPU sharding |
| `--enable_bf16` | Enable bfloat16 mixed precision |
| `--fsdp_cpu_offload` | Offload FSDP params to CPU |
| `--disable_heads` | Space-separated list of heads to disable (e.g. `--disable_heads camera normal`) |
| `--no_interactive` | Exit after first inference (skip interactive prompt loop) |

---
### Output Format
#### File Structure

```
inference_output/
└── <case_name>/
    └── <timestamp>/
        ├── depth/
        │   ├── depth_0000.png      # Normalized depth visualization
        │   ├── depth_0000.npy      # Raw float32 depth values [H, W]
        │   └── ...
        ├── normal/
        │   ├── normal_0000.png     # Normal map visualization (RGB)
        │   └── ...
        ├── camera_params.json      # Camera extrinsics & intrinsics
        ├── gaussians.ply           # 3D Gaussian Splatting (standard format)
        ├── points.ply              # Colored point cloud
        ├── sparse/                 # COLMAP format (if --save_colmap)
        │   └── 0/
        │       ├── cameras.bin
        │       ├── images.bin
        │       └── points3D.bin
        ├── rendered/               # Rendered video (if --save_rendered)
        │   ├── rendered_rgb.mp4
        │   └── rendered_depth.mp4  # (if --render_depth)
        └── pipeline_timing.json    # Performance timing report
```

#### Prediction Dictionary
When using the Python API, `pipeline(...)` internally produces a `predictions` dictionary with the following keys:

```python
# Geometry
predictions["depth"]        # [B, S, H, W, 1]  — Z-depth in camera frame
predictions["depth_conf"]   # [B, S, H, W]     — Depth confidence
predictions["normals"]      # [B, S, H, W, 3]  — Surface normals in camera coords
predictions["normals_conf"] # [B, S, H, W]     — Normal confidence
predictions["pts3d"]        # [B, S, H, W, 3]  — 3D point maps in world coords
predictions["pts3d_conf"]   # [B, S, H, W]     — Point cloud confidence

# Camera
predictions["camera_poses"]  # [B, S, 4, 4]    — Camera-to-world (c2w), OpenCV convention
predictions["camera_intrs"]  # [B, S, 3, 3]    — Camera intrinsic matrices
predictions["camera_params"] # [B, S, 9]       — Compact camera vector (translation, quaternion, fov_v, fov_u)

# 3D Gaussian Splatting
predictions["splats"]["means"]      # [B, N, 3] — Gaussian centers
predictions["splats"]["scales"]     # [B, N, 3] — Gaussian scales
predictions["splats"]["quats"]      # [B, N, 4] — Gaussian rotations (quaternions)
predictions["splats"]["opacities"]  # [B, N]    — Gaussian opacities
predictions["splats"]["sh"]         # [B, N, 1, 3] — Spherical harmonics (degree 0)
predictions["splats"]["weights"]    # [B, N]    — Per-Gaussian confidence weights
```

Where `B` = batch size (always 1 for inference), `S` = number of input views, `H, W` = image dimensions, `N` = total Gaussians (`S × H × W`).
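As a sketch of how these tensors fit together, the snippet below back-projects one view's depth map into world space using its intrinsics and c2w pose. It is a generic pinhole-camera computation under the OpenCV convention stated above, written in NumPy for illustration, not code from the package:

```python
import numpy as np

def depth_to_world_points(depth, K, c2w):
    """Back-project a [H, W] z-depth map to world-space points [H, W, 3].

    Assumes OpenCV camera convention (x right, y down, z forward),
    matching the shapes and conventions documented above.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))       # pixel coordinates
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    pts_cam = np.stack([x, y, depth], axis=-1)           # [H, W, 3] camera frame
    R, t = c2w[:3, :3], c2w[:3, 3]
    return pts_cam @ R.T + t                             # [H, W, 3] world frame

# Example: identity pose, pinhole intrinsics, constant depth of 2 m.
K = np.array([[525.0, 0.0, 320.0], [0.0, 525.0, 240.0], [0.0, 0.0, 1.0]])
pts = depth_to_world_points(np.full((480, 640), 2.0), K, np.eye(4))
```

The same loop over `S` views (using `predictions["camera_poses"]` and `predictions["camera_intrs"]`) reproduces a fused world-space point cloud comparable to `pts3d`.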

---
### Prior Injection
WorldMirror 2.0 accepts three types of geometric priors as conditioning inputs. Priors are automatically detected from the provided files.

| Prior Type | Condition | Input Format |
|------------|-----------|--------------|
| Camera Pose | `cond_flags[0]` | c2w 4×4 matrix (OpenCV convention) |
| Depth Map | `cond_flags[1]` | Per-view float depth maps |
| Intrinsics | `cond_flags[2]` | 3×3 intrinsic matrix |

#### Camera Parameters (JSON)
The camera parameter file follows the same format as the `camera_params.json` output by the pipeline:

```json
{
  "num_cameras": 2,
  "extrinsics": [
    {
      "camera_id": 0,
      "matrix": [
        [0.98, 0.01, -0.17, 0.52],
        [-0.01, 0.99, 0.01, -0.03],
        [0.17, -0.01, 0.98, 1.20],
        [0.0, 0.0, 0.0, 1.0]
      ]
    }
  ],
  "intrinsics": [
    {
      "camera_id": 0,
      "matrix": [
        [525.0, 0.0, 320.0],
        [0.0, 525.0, 240.0],
        [0.0, 0.0, 1.0]
      ]
    }
  ]
}
```

**Field descriptions:**

| Field | Description |
|-------|-------------|
| `camera_id` | Integer index (`0`, `1`, `2`, ...) or image filename stem without extension (e.g., `"image_0001"`) |
| `extrinsics.matrix` | 4×4 camera-to-world (c2w) transformation matrix, OpenCV coordinate convention |
| `intrinsics.matrix` | 3×3 camera intrinsic matrix in pixels (`fx, fy` = focal lengths; `cx, cy` = principal point) |

**Important notes:**
- `extrinsics` and `intrinsics` lists can be provided independently or together. An empty list `[]` or missing key means that prior is unavailable.
- **Intrinsics resolution:** Values should correspond to the **original image resolution**. The pipeline automatically adjusts for inference-time resize + center-crop.
- **Extrinsics alignment:** The pipeline automatically normalizes all extrinsics relative to the first view, consistent with training behavior.
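A prior file in this layout can be generated programmatically. The sketch below writes a two-view `camera_params.json` with hypothetical poses; only the JSON structure mirrors the specification above:

```python
import json
import numpy as np

def write_camera_prior(poses_c2w, intrinsics, path="camera_params.json"):
    """Serialize c2w poses (4x4) and intrinsics (3x3) in the prior JSON layout."""
    data = {
        "num_cameras": len(poses_c2w),
        "extrinsics": [
            {"camera_id": i, "matrix": np.asarray(m, dtype=float).tolist()}
            for i, m in enumerate(poses_c2w)
        ],
        "intrinsics": [
            {"camera_id": i, "matrix": np.asarray(m, dtype=float).tolist()}
            for i, m in enumerate(intrinsics)
        ],
    }
    with open(path, "w") as f:
        json.dump(data, f, indent=2)
    return data

# Hypothetical two-view rig: identity pose, then 0.5 m forward along +z.
pose1 = np.eye(4)
pose2 = np.eye(4)
pose2[2, 3] = 0.5
K = [[525.0, 0.0, 320.0], [0.0, 525.0, 240.0], [0.0, 0.0, 1.0]]
prior = write_camera_prior([pose1, pose2], [K, K])
```

To provide only intrinsics, pass `extrinsics: []` (or omit the key) and keep the `intrinsics` list.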
#### Depth Maps (Folder)
Depth maps are stored as individual files in a directory. Filenames should match the input image filenames. Supported formats: `.npy`, `.exr`, `.png` (16-bit).

```
prior_depth/
├── image_0001.npy    # float32, shape [H, W]
├── image_0002.npy
└── ...
```
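Such a folder can be produced with plain NumPy. The sketch below writes float32 `.npy` depth priors whose filename stems match hypothetical input image names (replace the names and values with your own data):

```python
import numpy as np
from pathlib import Path

out_dir = Path("prior_depth")
out_dir.mkdir(exist_ok=True)

# Hypothetical input image filenames; the .npy stems must match them.
for name in ["image_0001.jpg", "image_0002.jpg"]:
    depth = np.full((480, 640), 2.0, dtype=np.float32)  # placeholder metric depth
    np.save(out_dir / f"{Path(name).stem}.npy", depth)
```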

#### Combining Priors
Priors can be freely combined. Examples:

```bash
# Only intrinsics
python -m hyworld2.worldrecon.pipeline --input_path images/ \
    --prior_cam_path camera_intrinsics_only.json

# Only depth
python -m hyworld2.worldrecon.pipeline --input_path images/ \
    --prior_depth_path depth_maps/

# Camera poses + intrinsics + depth
python -m hyworld2.worldrecon.pipeline --input_path images/ \
    --prior_cam_path camera_params.json \
    --prior_depth_path depth_maps/
```

---
### Multi-GPU Inference
WorldMirror 2.0 supports **Sequence Parallel (SP)** inference across multiple GPUs, where token sequences are sharded across ranks in the ViT backbone, and DPT heads process frames in parallel.

> **Requirement:** The number of input images must be **>= the number of GPUs** (`nproc_per_node`). For example, with 8 GPUs you need at least 8 input images. The pipeline will raise an error if this condition is not met.

```bash
# 2 GPUs with FSDP + bf16
torchrun --nproc_per_node=2 -m hyworld2.worldrecon.pipeline \
    --input_path path/to/images \
    --use_fsdp --enable_bf16

# 4 GPUs
torchrun --nproc_per_node=4 -m hyworld2.worldrecon.pipeline \
    --input_path path/to/images \
    --use_fsdp --enable_bf16
```

```python
# Python API (inside a torchrun-launched script)
from hyworld2.worldrecon.pipeline import WorldMirrorPipeline

pipeline = WorldMirrorPipeline.from_pretrained(
    'tencent/HY-World-2.0',
    use_fsdp=True,
    enable_bf16=True,
)
pipeline('path/to/images')
```

**What happens under the hood:**
1. `from_pretrained` auto-detects `WORLD_SIZE > 1` and initializes `torch.distributed`.
2. The model is loaded on rank 0 and broadcast via `sync_module_states=True`.
3. FSDP shards parameters across the SP process group.
4. DPT prediction heads split frames across ranks and `AllGather` results.
5. Post-processing (mask computation, saving) runs on rank 0 only.

---
### Advanced Options
#### Disabling Prediction Heads
To save memory when you only need specific outputs:

```python
from hyworld2.worldrecon.pipeline import WorldMirrorPipeline

pipeline = WorldMirrorPipeline.from_pretrained(
    'tencent/HY-World-2.0',
    disable_heads=["normal", "points"],  # free ~200M params
)
```

Available heads: `"camera"`, `"depth"`, `"normal"`, `"points"`, `"gs"`.

#### Mask Filtering
The pipeline supports three types of output filtering to improve point cloud and Gaussian quality:

1. **Sky mask** (`apply_sky_mask=True`): Removes sky regions using an ONNX-based segmentation model, optionally fused with model-predicted depth masks.
2. **Edge mask** (`apply_edge_mask=True`): Removes points at depth/normal discontinuities (object boundaries).
3. **Confidence mask** (`apply_confidence_mask=False`): Removes the bottom N% of points by prediction confidence.

These masks are applied independently to both the `points.ply` (depth-based) and `gaussians.ply` (GS-based) outputs. The GS output uses its own depth predictions for edge detection when available.

#### Point Cloud Compression
When `compress_pts=True` (default), the depth-derived point cloud undergoes two stages:

1. **Voxel merging**: Points within each voxel (size controlled by `compress_pts_voxel_size`) are merged via weighted averaging.
2. **Random subsampling**: If the result exceeds `compress_pts_max_points`, points are uniformly subsampled.

Similarly, Gaussians are voxel-pruned (weighted averaging of means, scales, quaternions, colors, opacities) and optionally subsampled to `compress_gs_max_points`.
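The two-stage compression can be sketched in NumPy. This is an illustrative approximation, not the pipeline's implementation: it uses unweighted averaging within voxels, and the function name `compress_points` is hypothetical:

```python
import numpy as np

def compress_points(pts, colors, voxel_size=0.002, max_points=2_000_000, seed=0):
    """Voxel-merge then randomly subsample an [N, 3] point cloud with colors."""
    # 1) Voxel merging: average all points (and colors) sharing a voxel cell.
    keys = np.floor(pts / voxel_size).astype(np.int64)
    uniq, inv = np.unique(keys, axis=0, return_inverse=True)
    inv = inv.reshape(-1)  # guard against NumPy versions with a shaped inverse
    counts = np.bincount(inv, minlength=len(uniq)).astype(float)
    merged_pts = np.zeros((len(uniq), 3))
    merged_cols = np.zeros((len(uniq), 3))
    np.add.at(merged_pts, inv, pts)
    np.add.at(merged_cols, inv, colors)
    merged_pts /= counts[:, None]
    merged_cols /= counts[:, None]
    # 2) Random subsampling if still above the point budget.
    if len(merged_pts) > max_points:
        idx = np.random.default_rng(seed).choice(len(merged_pts), max_points,
                                                 replace=False)
        merged_pts, merged_cols = merged_pts[idx], merged_cols[idx]
    return merged_pts, merged_cols
```

Two points closer together than `voxel_size` collapse into one averaged point; the budget cap then bounds memory regardless of scene size.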

---
### Gradio App
An interactive web demo for WorldMirror 2.0. Upload images or videos and visualize 3DGS, point clouds, depth maps, normal maps, and camera parameters in your browser.

**Quick start:**

```bash
# Single GPU
python -m hyworld2.worldrecon.gradio_app

# Multi-GPU
torchrun --nproc_per_node=2 -m hyworld2.worldrecon.gradio_app \
    --use_fsdp --enable_bf16
```

**With a local checkpoint:**

```bash
python -m hyworld2.worldrecon.gradio_app \
    --config_path /path/to/config.yaml \
    --ckpt_path /path/to/checkpoint.safetensors
```

**With a public link (e.g., for Colab or remote servers):**

```bash
python -m hyworld2.worldrecon.gradio_app --share
```

**Arguments:**

| Argument | Default | Description |
|----------|---------|-------------|
| `--port` | `8081` | Server port |
| `--host` | `0.0.0.0` | Server host |
| `--share` | `False` | Create a public Gradio link |
| `--examples_dir` | `./examples/worldrecon` | Path to example scenes directory |
| `--config_path` | `None` | Training config YAML (used with `--ckpt_path`) |
| `--ckpt_path` | `None` | Local checkpoint file (`.ckpt` / `.safetensors`) |
| `--use_fsdp` | `False` | Enable FSDP multi-GPU sharding |
| `--enable_bf16` | `False` | Enable bfloat16 mixed precision |
| `--fsdp_cpu_offload` | `False` | Offload FSDP params to CPU (saves GPU memory) |

> **Important:** In multi-GPU mode, the number of input images must be **>= the number of GPUs**.

---
## Panorama Generation
*Coming soon.*

This section will document the panorama generation model, including:
- Text-to-panorama and image-to-panorama APIs
- Model architecture (MMDiT-based implicit perspective-to-ERP mapping)
- Configuration parameters
- Output formats

---
## World Generation
*Coming soon.*

This section will document the world generation pipeline, including:
- Trajectory planning configuration
- World expansion with memory-driven video generation
- World composition (point cloud expansion + 3DGS optimization)
- End-to-end generation from text/image to navigable 3D world