Commit af597e9 (verified) by ZhenweiWang, parent 4e9c82a: Update DOCUMENTATION.md
# HunyuanWorld 2.0 — Documentation
This document provides detailed usage guides, parameter references, and output format specifications for each component of HunyuanWorld 2.0.

## Table of Contents
- [WorldMirror 2.0 (World Reconstruction)](#worldmirror-20-world-reconstruction)
  - [Overview](#overview)
  - [Gradio App](#gradio-app)
- [Panorama Generation](#panorama-generation)
- [World Generation](#world-generation)

---
## WorldMirror 2.0 (World Reconstruction)
### Overview
Key improvements over WorldMirror 1.0:
- **Normalized RoPE** for flexible resolution inference
- **Depth mask prediction** for robust invalid pixel handling
- **Sequence Parallel + FSDP + BF16** for efficient multi-GPU inference

---
### Python API
#### `WorldMirrorPipeline.from_pretrained`
Factory method to load the model and create a pipeline instance.

```python
from hyworld2.worldrecon.pipeline import WorldMirrorPipeline

pipeline = WorldMirrorPipeline.from_pretrained(
    ...,
    disable_heads=None,
)
```
 
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `pretrained_model_name_or_path` | `str` | `"tencent/HY-World-2.0"` | HuggingFace repo ID or local path |
| `enable_bf16` | `bool` | `False` | Use bfloat16 precision (except numerically critical layers) |
| `fsdp_cpu_offload` | `bool` | `False` | Offload FSDP parameters to CPU (saves GPU memory at the cost of speed) |
| `disable_heads` | `list[str]` | `None` | Heads to disable and free from memory. Options: `"camera"`, `"depth"`, `"normal"`, `"points"`, `"gs"` |

**Notes:**
- Distributed mode is auto-detected from the `WORLD_SIZE` environment variable (set by `torchrun`).
- When using multi-GPU, each rank must call `from_pretrained` — the method handles `dist.init_process_group` internally.

---
#### `WorldMirrorPipeline.__call__`
Run inference on a set of images or a video.

```python
result = pipeline(
    input_path,
    ...,
    **kwargs,
)
```
 
Returns the output directory path (`str`), or `None` if the input was skipped.

**Inference Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `input_path` | `str` | *(required)* | Directory of images or path to a video file |
| `video_strategy` | `str` | `"new"` | Video frame extraction strategy: `"new"` (motion-aware) or `"old"` (uniform FPS) |
| `video_min_frames` | `int` | `1` | Minimum number of frames to extract from video |
| `video_max_frames` | `int` | `32` | Maximum number of frames to extract from video |

**Save Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `save_depth` | `bool` | `True` | Save per-view depth maps (PNG visualization + NPY raw values) |
| `save_colmap` | `bool` | `False` | Save COLMAP-format sparse reconstruction (`sparse/0/`) |
| `save_conf` | `bool` | `False` | Save depth confidence maps |
| `save_sky_mask` | `bool` | `False` | Save sky segmentation masks |

**Mask Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `apply_sky_mask` | `bool` | `True` | Filter out sky regions from point clouds and Gaussians |
| `confidence_percentile` | `float` | `10.0` | Percentile threshold for confidence filtering (bottom N% removed) |
| `edge_normal_threshold` | `float` | `1.0` | Normal edge detection tolerance |
| `edge_depth_threshold` | `float` | `0.03` | Depth edge detection relative tolerance |
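
The `confidence_percentile` filter can be pictured as a simple bottom-N% cut on per-point confidence. The sketch below is an illustrative reimplementation of that idea in plain Python, not the pipeline's actual code:

```python
def confidence_mask(conf, percentile=10.0):
    """Return a keep-mask that drops the bottom `percentile` percent
    of confidence values (illustrates `confidence_percentile`)."""
    ranked = sorted(conf)
    k = int(len(ranked) * percentile / 100.0)  # index of the percentile threshold
    threshold = ranked[k]
    return [c >= threshold for c in conf]

conf = [0.9, 0.1, 0.5, 0.8, 0.05, 0.7, 0.6, 0.95, 0.3, 0.2]
mask = confidence_mask(conf, percentile=10.0)
print(sum(mask))  # 9 of 10 points survive; only 0.05 is in the bottom 10%
```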
 
**Compression Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `compress_pts` | `bool` | `True` | Compress point clouds via voxel merging + random sampling |
| `compress_pts_voxel_size` | `float` | `0.002` | Voxel size for point cloud merging |
| `max_resolution` | `int` | `1920` | Maximum resolution for saved output images |
| `compress_gs_max_points` | `int` | `5,000,000` | Maximum number of Gaussians after voxel pruning |

**Prior Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `prior_cam_path` | `str` | `None` | Path to camera parameters JSON file |
| `prior_depth_path` | `str` | `None` | Path to directory containing depth map files |

**Rendered Video Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `save_rendered` | `bool` | `False` | Render interpolated fly-through video from Gaussian splats |
| `render_interp_per_pair` | `int` | `15` | Number of interpolated frames between each camera pair |
| `render_depth` | `bool` | `False` | Also render a depth visualization video |

**Misc Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `log_time` | `bool` | `True` | Print timing report and save `pipeline_timing.json` |
| `strict_output_path` | `str` | `None` | If set, save results directly to this path without `<case_name>/<timestamp>` subdirectories |
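
The `strict_output_path` option contrasts with the default `<case_name>/<timestamp>` nesting. A small sketch of that layout logic (the function name and timestamp format here are illustrative assumptions, not the pipeline's internals):

```python
import os
from datetime import datetime

def resolve_output_dir(output_root, case_name, strict_output_path=None):
    """Illustrative: mirror the documented save layout."""
    if strict_output_path is not None:
        # Save directly to the given path, no <case_name>/<timestamp> nesting.
        return strict_output_path
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")  # format is an assumption
    return os.path.join(output_root, case_name, timestamp)

print(resolve_output_dir("inference_output", "scene1", strict_output_path="out/fixed"))
# out/fixed
```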
 
---
### CLI Reference
All `__call__` parameters are exposed as CLI arguments:

```bash
python -m hyworld2.worldrecon.pipeline \
    --input_path path/to/images \
    --prior_cam_path path/to/camera_params.json \
    --prior_depth_path path/to/depth_dir/
```

**Boolean flag conventions:**

| Enable | Disable |
|--------|---------|
| `--save_colmap` | *(omit)* |
| *(default on)* `save_points` | `--no_save_points` |
| `--save_rendered` | *(omit)* |
| `--render_depth` | *(omit)* |
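
These conventions correspond to the standard `argparse` pattern: default-off flags use `store_true`, while default-on flags get a paired `--no_*` flag with `store_false`. A minimal self-contained sketch (not the pipeline's actual parser):

```python
import argparse

parser = argparse.ArgumentParser()
# Default-off: passing the flag enables it; omitting it keeps False.
parser.add_argument("--save_colmap", action="store_true")
parser.add_argument("--save_rendered", action="store_true")
# Default-on: only an explicit --no_* flag disables it.
parser.add_argument("--no_save_points", dest="save_points", action="store_false")

args = parser.parse_args(["--save_colmap", "--no_save_points"])
print(args.save_colmap, args.save_points, args.save_rendered)
# True False False
```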
 
**Additional CLI-only arguments:**

| Argument | Description |
|----------|-------------|
| `--config_path` | Training config YAML for custom checkpoint loading |
| `--fsdp_cpu_offload` | Offload FSDP params to CPU |
| `--disable_heads` | Space-separated list of heads to disable (e.g. `--disable_heads camera normal`) |
| `--no_interactive` | Exit after first inference (skip interactive prompt loop) |

---
### Output Format
#### File Structure

```
inference_output/
└── <case_name>/
    │   └── rendered_depth.mp4    # (if --render_depth)
    └── pipeline_timing.json      # Performance timing report
```
 
#### Prediction Dictionary
When using the Python API, `pipeline(...)` internally produces a `predictions` dictionary with the following keys:

```python
# Geometry
predictions["depth"]               # [B, S, H, W, 1] — Z-depth in camera frame
predictions["splats"]["opacities"] # [B, N] — Gaussian opacities
predictions["splats"]["sh"]        # [B, N, 1, 3] — Spherical harmonics (degree 0)
predictions["splats"]["weights"]   # [B, N] — Per-Gaussian confidence weights
```

Where `B` = batch size (always 1 for inference), `S` = number of input views, `H, W` = image dimensions, `N` = total Gaussians (`S × H × W`).
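
Since `N = S × H × W`, there is one Gaussian (or point) per pixel per view, each obtained by unprojecting a Z-depth pixel. For reference, the standard pinhole unprojection is shown below (a generic formula, not code from the pipeline):

```python
def unproject(u, v, z, fx, fy, cx, cy):
    """Pinhole unprojection: pixel (u, v) with Z-depth z -> camera-frame XYZ."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)

# A pixel at the principal point stays on the optical axis:
print(unproject(320.0, 240.0, 2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0))
# (0.0, 0.0, 2.0)
```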
 
---
### Prior Injection
WorldMirror 2.0 accepts three types of geometric priors as conditioning inputs. Priors are automatically detected from the provided files.

| Prior Type | Condition | Input Format |
|------------|-----------|--------------|
| Camera Pose | `cond_flags[0]` | c2w 4×4 matrix (OpenCV convention) |
| Depth Map | `cond_flags[1]` | Per-view float depth maps |
| Intrinsics | `cond_flags[2]` | 3×3 intrinsic matrix |

#### Camera Parameters (JSON)
The camera parameter file follows the same format as the `camera_params.json` output by the pipeline:

```json
{
  "num_cameras": 2,
  ...
  ]
}
```
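
A minimal script that writes a file in this schema using only the standard library. The exact nesting of the `extrinsics`/`intrinsics` entries is an assumption based on the documented fields; verify against a `camera_params.json` actually produced by the pipeline:

```python
import json

# One camera: identity c2w pose, simple pinhole intrinsics (fx=fy=500, cx=320, cy=240).
params = {
    "num_cameras": 1,
    "extrinsics": [
        {"camera_id": 0,
         "matrix": [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]},
    ],
    "intrinsics": [
        {"camera_id": 0,
         "matrix": [[500, 0, 320], [0, 500, 240], [0, 0, 1]]},
    ],
}

with open("camera_params.json", "w") as f:
    json.dump(params, f, indent=2)
```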
 
**Field descriptions:**

| Field | Description |
|-------|-------------|
| `camera_id` | Integer index (`0`, `1`, `2`, ...) or image filename stem without extension (e.g., `"image_0001"`) |
| `extrinsics.matrix` | 4×4 camera-to-world (c2w) transformation matrix, OpenCV coordinate convention |
| `intrinsics.matrix` | 3×3 camera intrinsic matrix in pixels (`fx, fy` = focal lengths; `cx, cy` = principal point) |
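
Because `fx, fy, cx, cy` are expressed in pixels, they transform predictably under resize and center-crop. The pipeline handles this adjustment internally; the sketch below only illustrates the arithmetic:

```python
def adjust_intrinsics(fx, fy, cx, cy, orig_wh, resized_wh, crop_wh):
    """Scale intrinsics for a resize, then shift the principal point for a center-crop."""
    ow, oh = orig_wh
    rw, rh = resized_wh
    cw, ch = crop_wh
    sx, sy = rw / ow, rh / oh       # resize scales focal lengths and principal point
    fx, fy = fx * sx, fy * sy
    cx, cy = cx * sx, cy * sy
    cx -= (rw - cw) / 2             # center-crop shifts the principal point
    cy -= (rh - ch) / 2
    return fx, fy, cx, cy

result = adjust_intrinsics(1000.0, 1000.0, 960.0, 540.0,
                           orig_wh=(1920, 1080), resized_wh=(960, 540), crop_wh=(960, 512))
print(result)
# (500.0, 500.0, 480.0, 256.0)
```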
 
**Important notes:**
- `extrinsics` and `intrinsics` lists can be provided independently or together. An empty list `[]` or missing key means that prior is unavailable.
- **Intrinsics resolution:** Values should correspond to the **original image resolution**. The pipeline automatically adjusts for inference-time resize + center-crop.
- **Extrinsics alignment:** The pipeline automatically normalizes all extrinsics relative to the first view, consistent with training behavior.
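
First-view normalization amounts to left-multiplying every c2w pose by the inverse of the first pose, so view 0 becomes the identity. An illustrative sketch with plain 4×4 nested lists (the pipeline's own normalization may differ in detail):

```python
def matmul4(a, b):
    """4x4 matrix product on nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def invert_rigid(m):
    """Invert a rigid c2w transform [R | t]: the inverse is [R^T | -R^T t]."""
    r = [[m[j][i] for j in range(3)] for i in range(3)]                 # R^T
    t = [-sum(r[i][j] * m[j][3] for j in range(3)) for i in range(3)]   # -R^T t
    return [r[0] + [t[0]], r[1] + [t[1]], r[2] + [t[2]], [0, 0, 0, 1]]

def normalize_to_first(poses):
    """Express all c2w poses relative to the first view."""
    inv0 = invert_rigid(poses[0])
    return [matmul4(inv0, p) for p in poses]

# Two translation-only poses; after normalization the first becomes the identity.
p0 = [[1, 0, 0, 1], [0, 1, 0, 2], [0, 0, 1, 3], [0, 0, 0, 1]]
p1 = [[1, 0, 0, 2], [0, 1, 0, 2], [0, 0, 1, 3], [0, 0, 0, 1]]
norm = normalize_to_first([p0, p1])
```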

#### Depth Maps (Folder)
Depth maps are stored as individual files in a directory. Filenames should match the input image filenames. Supported formats: `.npy`, `.exr`, `.png` (16-bit).

```
prior_depth/
├── image_0001.npy   # float32, shape [H, W]
├── image_0002.npy
└── ...
```
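
Because matching is by filename stem, it can help to check that every image has a depth file before launching a run. An illustrative stdlib-only helper (the pipeline performs its own matching):

```python
import tempfile
from pathlib import Path

def match_depth_files(image_dir, depth_dir, exts=(".npy", ".exr", ".png")):
    """Map each image filename stem to its depth file; report unmatched stems."""
    pairs, missing = {}, []
    depth_dir = Path(depth_dir)
    for img in sorted(Path(image_dir).iterdir()):
        candidates = [depth_dir / (img.stem + ext) for ext in exts]
        hit = next((c for c in candidates if c.exists()), None)
        if hit is None:
            missing.append(img.stem)
        else:
            pairs[img.stem] = hit
    return pairs, missing

# Tiny fixture: two images, only one matching depth file.
root = Path(tempfile.mkdtemp())
(root / "images").mkdir()
(root / "depth").mkdir()
for name in ["image_0001.jpg", "image_0002.jpg"]:
    (root / "images" / name).touch()
(root / "depth" / "image_0001.npy").touch()

pairs, missing = match_depth_files(root / "images", root / "depth")
print(missing)  # ['image_0002']
```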
 
#### Combining Priors
Priors can be freely combined. Examples:

```bash
# Only intrinsics
python -m hyworld2.worldrecon.pipeline --input_path images/ \
    ...

python -m hyworld2.worldrecon.pipeline --input_path images/ \
    --prior_cam_path camera_params.json \
    --prior_depth_path depth_maps/
```

---
### Multi-GPU Inference
WorldMirror 2.0 supports **Sequence Parallel (SP)** inference across multiple GPUs, where token sequences are sharded across ranks in the ViT backbone, and DPT heads process frames in parallel.

```python
pipeline = WorldMirrorPipeline.from_pretrained(
    ...
)
pipeline('path/to/images')
```

**What happens under the hood:**
1. `from_pretrained` auto-detects `WORLD_SIZE > 1` and initializes `torch.distributed`.
2. The model is loaded on rank 0 and broadcast via `sync_module_states=True`.
3. FSDP shards parameters across the SP process group.
4. DPT prediction heads split frames across ranks and `AllGather` results.
5. Post-processing (mask computation, saving) runs on rank 0 only.
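
Step 1's auto-detection boils down to reading the environment variables that `torchrun` sets for each process. A minimal sketch of that check (illustrative; `from_pretrained` performs the real `torch.distributed` initialization):

```python
import os

def detect_distributed(env=os.environ):
    """Mirror the documented detection: WORLD_SIZE > 1 means multi-GPU mode."""
    world_size = int(env.get("WORLD_SIZE", "1"))
    rank = int(env.get("RANK", "0"))
    return world_size > 1, world_size, rank

# torchrun --nproc_per_node=2 sets WORLD_SIZE=2 and a per-process RANK.
print(detect_distributed({"WORLD_SIZE": "2", "RANK": "1"}))
# (True, 2, 1)
```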
 
---
### Advanced Options
#### Disabling Prediction Heads
To save memory when you only need specific outputs:

```python
from hyworld2.worldrecon.pipeline import WorldMirrorPipeline

pipeline = WorldMirrorPipeline.from_pretrained(
    ...,
    disable_heads=["normal", "points"],  # free ~200M params
)
```

Available heads: `"camera"`, `"depth"`, `"normal"`, `"points"`, `"gs"`.

#### Mask Filtering
The pipeline supports three types of output filtering to improve point cloud and Gaussian quality.

When `compress_pts=True` (default), the depth-derived point cloud undergoes:
1. **Voxel merging**: Points within each voxel (size controlled by `compress_pts_voxel_size`) are merged via weighted averaging.
2. **Random subsampling**: If the result exceeds `compress_pts_max_points`, points are uniformly subsampled.

Similarly, Gaussians are voxel-pruned (weighted averaging of means, scales, quaternions, colors, opacities) and optionally subsampled to `compress_gs_max_points`.
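
Voxel merging can be sketched as bucketing points by quantized coordinates and weight-averaging each bucket. This is an illustrative reimplementation of the idea, not the pipeline's code:

```python
from collections import defaultdict

def voxel_merge(points, weights, voxel_size=0.002):
    """Merge points that fall in the same voxel via weighted averaging."""
    buckets = defaultdict(list)
    for p, w in zip(points, weights):
        key = tuple(int(c // voxel_size) for c in p)  # integer voxel index per axis
        buckets[key].append((p, w))
    merged = []
    for group in buckets.values():
        total_w = sum(w for _, w in group)
        merged.append(tuple(sum(p[i] * w for p, w in group) / total_w
                            for i in range(3)))
    return merged

pts = [(0.0001, 0.0, 0.0), (0.0003, 0.0, 0.0), (0.011, 0.0, 0.0)]
out = voxel_merge(pts, weights=[1.0, 3.0, 1.0])
print(len(out))  # first two points share a voxel -> 2 merged points
```

The same bucketing applies to Gaussians, except that scales, quaternions, colors, and opacities are averaged alongside the means.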
 
---
### Gradio App
An interactive web demo for WorldMirror 2.0. Upload images or videos and visualize 3DGS, point clouds, depth maps, normal maps, and camera parameters in your browser.

**Quick start:**

```bash
# Single GPU
python -m hyworld2.worldrecon.gradio_app

# Multi-GPU
torchrun --nproc_per_node=2 -m hyworld2.worldrecon.gradio_app \
    --use_fsdp --enable_bf16
```

**With a local checkpoint:**

```bash
python -m hyworld2.worldrecon.gradio_app \
    --config_path /path/to/config.yaml \
    --ckpt_path /path/to/checkpoint.safetensors
```

**With a public link (e.g., for Colab or remote servers):**

```bash
python -m hyworld2.worldrecon.gradio_app --share
```

**Arguments:**

| Argument | Default | Description |
|----------|---------|-------------|
| `--port` | `8081` | Server port |
| `--fsdp_cpu_offload` | `False` | Offload FSDP params to CPU (saves GPU memory) |

> **Important:** In multi-GPU mode, the number of input images must be **>= the number of GPUs**.

---
## Panorama Generation
*Coming soon.*

This section will document the panorama generation model, including:
- Model architecture (MMDiT-based implicit perspective-to-ERP mapping)
- Configuration parameters
- Output formats

---
## World Generation
*Coming soon.*
 