
OmniRoam: World Wandering via Long-Horizon Panoramic Video Generation

Project Page | arXiv | Hugging Face Models | License

Yuheng Liu1*, Xin Lin2, Xinke Li3, Baihan Yang2, Chen Wang4*, Kalyan Sunkavalli5, Yannick Hold-Geoffroy5, Hao Tan5, Kai Zhang5, Xiaohui Xie1, Zifan Shi5, Yiwei Hu5

(*Work done during an internship at Adobe)

1UC Irvine, 2UC San Diego, 3City University of Hong Kong, 4University of Pennsylvania, 5Adobe Research

OmniRoam Teaser

Updates

  • [2026-04] 🎉 Initial release of code, models, and datasets
  • [2026-03] 🎉 Our paper is accepted to SIGGRAPH 2026!

Introduction

Modeling scenes using video generation models has garnered growing research interest in recent years. However, most existing approaches rely on perspective video models that synthesize only limited observations of a scene, leading to issues of completeness and global consistency.

We propose OmniRoam, a controllable panoramic video generation framework that exploits the rich per-frame scene coverage and inherent long-term spatial and temporal consistency of panoramic representation, enabling long-horizon scene wandering. Our framework begins with a preview stage, where a trajectory-controlled video generation model creates a quick overview of the scene from a given input image or video. Then, in the refine stage, this video is temporally extended and spatially upsampled to produce long-range, high-resolution videos, thus enabling high-fidelity world wandering.

To train our model, we introduce two panoramic video datasets that incorporate both synthetic and real-world captured videos. Experiments show that our framework consistently outperforms state-of-the-art methods in terms of visual quality, controllability, and long-term scene consistency, both qualitatively and quantitatively. We further showcase several extensions of this framework, including real-time video generation and 3D reconstruction.

Environment Setup

Prerequisites

  • OS: Linux (tested on Ubuntu 20.04+)
  • GPU: CUDA-compatible GPU with ≥20GB VRAM (≥36GB recommended for refine stage)
  • CUDA: 11.8 or higher
  • Python: 3.9+
  • FFmpeg: For video processing
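
The prerequisites above can be sanity-checked with a short script. This is an illustrative helper, not part of the repo; it only checks that Python, FFmpeg, and an NVIDIA driver are visible:

```python
import shutil
import sys

def check_prerequisites():
    """Report whether the basic OmniRoam prerequisites are available."""
    results = {}
    # Python 3.9+ (see Prerequisites above)
    results["python"] = sys.version_info >= (3, 9)
    # FFmpeg on PATH, used for video processing
    results["ffmpeg"] = shutil.which("ffmpeg") is not None
    # nvidia-smi on PATH as a proxy for a CUDA-capable GPU
    results["gpu"] = shutil.which("nvidia-smi") is not None
    return results

if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

Note this does not verify the CUDA toolkit version or free VRAM; check those with `nvidia-smi` directly.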

Step 1: Create Conda Environment

# Install Rust and Cargo (required to build some dependencies)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
. "$HOME/.cargo/env"

# Clone repository
git clone https://github.com/yuhengliu02/OmniRoam.git
cd OmniRoam

# Create and activate conda environment
conda create -n omniroam python=3.10
conda activate omniroam

# Install OmniRoam in editable mode (built on DiffSynth-Studio)
# DiffSynth-Studio: https://github.com/modelscope/DiffSynth-Studio
pip install -e .

Step 2: Download Base Model (Wan2.1-T2V-1.3B)

OmniRoam is built upon the Wan-AI Wan2.1-T2V-1.3B video diffusion model.

# Download using provided script
python download_wan2.1.py

# Or manually download from Hugging Face
# Visit: https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B
# Download to: models/Wan-AI/Wan2.1-T2V-1.3B/

Step 3: Download OmniRoam Models

Download the Preview, Self-forcing, and Refine stage checkpoints:

# Option 1: Using our download script
python download_omniroam_models.py

# Option 2: Manual download from Hugging Face
# Visit: https://huggingface.co/Yuheng02/OmniRoam
# Download the following files:
# - preview.ckpt    → models/OmniRoam/Preview/
# - self-forcing.pt → models/OmniRoam/Self-forcing/
# - refine.ckpt     → models/OmniRoam/Refine/
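
After downloading, the expected layout can be verified with a small helper. This is a sketch, not a repo script; the paths follow the listing above:

```python
from pathlib import Path

# Model paths expected under models/ (taken from the listing above).
EXPECTED = [
    "Wan-AI/Wan2.1-T2V-1.3B",
    "OmniRoam/Preview/preview.ckpt",
    "OmniRoam/Self-forcing/self-forcing.pt",
    "OmniRoam/Refine/refine.ckpt",
]

def missing_models(root="models"):
    """Return the expected model paths that are not present under `root`."""
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).exists()]

if __name__ == "__main__":
    missing = missing_models()
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("All model files found.")
```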

Final model directory structure:

models/
├── Wan-AI/
│   └── Wan2.1-T2V-1.3B/
└── OmniRoam/
    ├── Preview/
    │   └── preview.ckpt
    ├── Self-forcing/
    │   └── self-forcing.pt
    └── Refine/
        └── refine.ckpt

Step 4: Install Self-Forcing Package (Optional)

The Self-forcing stage requires additional dependencies:

Please refer to Self-Forcing/README.md for installation instructions.

OmniRoam Studio

We provide OmniRoam Studio, an interactive web-based interface for easy video generation with real-time preview and 360° panoramic viewing.

Features

  • 🎬 Multi-stage generation pipeline (Preview, Self-forcing, Refine)
  • 🖼️ Interactive 360° panoramic video player
  • 🎯 Visual trajectory presets with real-time preview
  • 📊 Built-in gallery with download support
  • ⚙️ Customizable generation parameters

Studio Interface

Quick Start

cd Studio

# Install all dependencies first (see Studio/README.md)

# Terminal 1: Start backend
conda activate omniroam
python main.py

# Terminal 2: Start frontend
cd frontend
npm install  # First time only
npm run dev

For detailed Studio documentation, see Studio/README.md

Dataset

InteriorGS-360video Dataset

We render panoramic videos from the InteriorGS 3D Gaussian Splatting dataset using our custom rendering pipeline.

Note: Due to current policy restrictions, you need to process the InteriorGS dataset using our open-source rendering tools.

Rendering Pipeline

We provide a complete Blender-based rendering pipeline to generate panoramic videos from InteriorGS 3DGS models:

cd InteriorGS-Render

# Download 3DGS models and camera trajectories (see InteriorGS-Render/README.md)
# Then run rendering
./run_simple.sh 1 2 3 4 5 200  # Process splits 1-5 out of 200

Output: each rendered scene produces:

  • 800-frame panoramic video (1920x960, 30fps)
  • Camera trajectory JSON file
  • PNG image sequence

For detailed rendering instructions, see InteriorGS-Render/README.md

Dataset Structure

After rendering, organize your dataset as follows:

data/InteriorGS-360video/
├── 0001_839920/
│   ├── pano_camera0/
│   │   ├── frame_0001.png
│   │   ├── frame_0002.png
│   │   └── ...
│   ├── video.mp4
│   └── transforms.json
├── 0002_123456/
└── ...

Inference

Results

https://github.com/user-attachments/assets/6323d12f-4df1-4924-8b46-6e78ab1c64ee

Preview Stage

Generate quick preview videos from panoramic images:

# Basic usage
python infer_omniroam.py \
  --local_images_dir vis_images \
  --height 480 \
  --width 960 \
  --num_frames 81 \
  --ckpt_path models/OmniRoam/Preview/preview.ckpt \
  --enable_speed_control \
  --speed_fixed 1.0 \
  --use_cam_traj \
  --traj_mode fixed \
  --traj_preset forward \
  --re_scale_pose fixed:1.0 \
  --traj_s_curve_amp_m 1.4 \
  --traj_loop_radius_m 1.5 \
  --cfg_scale 5.0 \
  --num_inference_steps 50 \
  --output_dir ./vis_ours_480p_speed_1_forward \
  --devices cuda:0,cuda:1,cuda:2,cuda:3,cuda:4,cuda:5,cuda:6,cuda:7

# Or use the provided script
./infer_preview.sh

Trajectory Presets: forward, backward, left, right, s_curve, loop

Speed Control: set the movement speed scale with --speed_fixed (valid range: 1.0 to 8.0).
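
To sweep all trajectory presets, the command above can be templated. The snippet below (illustrative, not a repo script) only builds the argument lists from the flags shown above, leaving execution to the reader:

```python
# Trajectory presets supported by the preview stage (from the list above).
PRESETS = ["forward", "backward", "left", "right", "s_curve", "loop"]

def build_command(preset, speed=1.0):
    """Assemble an infer_omniroam.py command for one preset (flags from the example above)."""
    return [
        "python", "infer_omniroam.py",
        "--local_images_dir", "vis_images",
        "--height", "480", "--width", "960", "--num_frames", "81",
        "--ckpt_path", "models/OmniRoam/Preview/preview.ckpt",
        "--enable_speed_control", "--speed_fixed", str(speed),
        "--use_cam_traj", "--traj_mode", "fixed",
        "--traj_preset", preset,
        "--output_dir", f"./vis_ours_480p_speed_{int(speed)}_{preset}",
    ]

if __name__ == "__main__":
    for preset in PRESETS:
        print(" ".join(build_command(preset)))
```

Pass each list to `subprocess.run` to actually launch the sweep.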

Self-Forcing Stage

For fast preview generation with Self-forcing distillation:

cd Self-Forcing

# Run inference on local panoramas
./inference_local_panoramas.sh

# Or use custom inference script
python custom_inference.py \
  --config_path configs/self_forcing_dmd_omniroam.yaml \
  --checkpoint_path models/OmniRoam/Self-forcing/self-forcing.pt \
  --local_folder /path/to/panoramas \
  --traj_preset forward \
  --traj_step_m 1.0 \
  --output_folder ./self_forcing_output \
  --num_samples 5

Parameters:

  • --traj_preset: Camera trajectory (forward, backward, left, right)
  • --traj_step_m: Step size in meters per latent timestep
  • --speed_scalar: Speed multiplier (default: 1.0)
  • --height: Output height (default: 480)
  • --width: Output width (default: 960)

Refine Stage

Upscale and temporally extend preview videos into long-horizon, high-resolution results:

# Refine preview videos
python infer_omniroam.py \
  --enable_refine \
  --refine_local_dir path/to/generated/preview/videos \
  --refine_num_segments 8 \
  --refine_degrade_down_h 480 \
  --refine_degrade_down_w 960 \
  --refine_use_crossfade \
  --refine_crossfade_alpha 0.5 \
  --height 720 \
  --width 1440 \
  --num_frames 81 \
  --ckpt_path models/OmniRoam/Refine/refine.ckpt \
  --output_dir ./refined \
  --devices cuda:0,cuda:1,cuda:2,cuda:3,cuda:4,cuda:5,cuda:6,cuda:7

# Or use the provided script
./infer_refine.sh

Parameters:

  • --refine_num_segments: Number of temporal segments for long video generation
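
As a rough sketch of how the segment count relates to output length, assume each segment contributes num_frames frames and consecutive segments blend over a fixed overlap (the actual overlap and crossfade handling are defined by the refine code):

```python
def refined_frame_count(num_segments=8, frames_per_segment=81, overlap=0):
    """Rough total frame count when segments are chained with a fixed overlap.

    Illustrative arithmetic only; the refine stage's real stitching may differ.
    """
    if num_segments < 1:
        return 0
    # First segment contributes all its frames; each later one adds
    # (frames_per_segment - overlap) new frames.
    return frames_per_segment + (num_segments - 1) * (frames_per_segment - overlap)
```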

Training

Data Preparation

Render InteriorGS Dataset (see Dataset section above)

Training Preview Stage

./train_preview.sh

Configuration:

  • Edit train_preview.sh to customize:
    • Data paths (DATA_ROOT, SPLIT_JSON)
    • Model paths (PRETRAIN_MODEL_PATH)
    • Training hyperparameters (batch size, learning rate, etc.)
    • Output directory (OUTPUT_DIR)

Training Refine Stage

# Train refine model for upsampling
./train_refine.sh

Configuration:

  • Similar to preview training, edit train_refine.sh for custom settings
  • Requires preview-stage generated videos as input

Monitoring Training

Training logs and checkpoints are saved to OUTPUT_DIR:

OUTPUT_DIR/
├── checkpoints/
│   ├── checkpoint_epoch_001.ckpt
│   ├── checkpoint_epoch_002.ckpt
│   └── ...
├── logs/
│   └── training.log
└── samples/
    └── epoch_001/

Project Structure

OmniRoam/
├── configs/                    # Configuration files
├── data/                       # Dataset directory
├── diffsynth/                  # Core diffusion synthesis modules
├── models/                     # Model checkpoints
├── output/                     # Training outputs
├── Self-Forcing/               # Self-forcing stage code
├── Studio/                     # Web interface
├── InteriorGS-Render/          # Dataset rendering pipeline
├── Tools/                      # Utility tools
├── infer_omniroam.py           # Main inference script
├── train_omniroam.py           # Main training script
├── download_wan2.1.py          # Download base model
└── download_omniroam_models.py # Download OmniRoam models

Tools

Perspective Conversion

Convert equirectangular panoramas to perspective view:

cd Tools

# Single direction
python erp_to_perspective.py -i input.mp4 -o output.mp4 --direction forward

# Batch processing
python erp_to_perspective.py --batch "vis_ours_480p_speed_1_{dir}/in_01/generated.mp4"
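
For reference, the core equirectangular-to-perspective mapping casts a ray per output pixel and converts it to longitude/latitude on the sphere. Below is a minimal NumPy sketch of that sampling map; it is not the repo's implementation, and the camera conventions (y-down, z-forward, yaw about the vertical axis) are assumptions:

```python
import numpy as np

def erp_sample_coords(h, w, fov_deg=90.0, yaw_deg=0.0, erp_h=960, erp_w=1920):
    """Per-pixel (u, v) coordinates into an equirectangular image
    for a perspective view of size (h, w). Conventions assumed."""
    # Focal length from horizontal field of view
    f = 0.5 * w / np.tan(np.radians(fov_deg) / 2)
    xs = np.arange(w) - (w - 1) / 2
    ys = np.arange(h) - (h - 1) / 2
    x, y = np.meshgrid(xs, ys)          # pixel offsets from image center
    z = np.full_like(x, f)              # rays point down +z before rotation
    # Rotate rays by yaw about the vertical axis
    yaw = np.radians(yaw_deg)
    xr = x * np.cos(yaw) + z * np.sin(yaw)
    zr = -x * np.sin(yaw) + z * np.cos(yaw)
    # Ray direction -> spherical longitude/latitude
    lon = np.arctan2(xr, zr)                 # [-pi, pi]
    lat = np.arctan2(y, np.hypot(xr, zr))    # [-pi/2, pi/2]
    # Spherical -> equirectangular pixel coordinates
    u = (lon / np.pi + 1) / 2 * (erp_w - 1)
    v = (lat / (np.pi / 2) + 1) / 2 * (erp_h - 1)
    return u, v
```

Feeding `u, v` to a bilinear remap (e.g. `cv2.remap`) then produces the perspective crop.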

Camera Trajectory Visualization

Visualize camera trajectories:

cd Tools
python panoramic_cam.py --traj_type forward --num_cameras 40 --step 0.1
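
The straight-line presets are essentially evenly spaced camera positions along one axis; the command above, for example, corresponds to something like the following sketch (illustrative only; the s_curve and loop presets additionally use their amplitude/radius flags):

```python
import numpy as np

def make_trajectory(traj_type="forward", num_cameras=40, step=0.1):
    """Camera positions for a straight-line trajectory preset (illustrative)."""
    directions = {
        "forward":  np.array([0.0, 0.0, 1.0]),
        "backward": np.array([0.0, 0.0, -1.0]),
        "left":     np.array([-1.0, 0.0, 0.0]),
        "right":    np.array([1.0, 0.0, 0.0]),
    }
    d = directions[traj_type]
    # One position per camera, advancing `step` meters along the chosen axis
    return np.array([i * step * d for i in range(num_cameras)])
```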

Acknowledgments

We thank the following projects for their inspiring work; our code is partially built upon them:

  • ReCamMaster: Camera-controlled generative rendering from a single video
  • Self-Forcing: Self-forcing distillation for fast diffusion models
  • Wan-AI: Base video diffusion model
  • InteriorGS: 3D Gaussian Splatting dataset

Citation

If you find OmniRoam useful for your research, please cite:

@article{omniroam2026,
  title={OmniRoam: World Wandering via Long-Horizon Panoramic Video Generation},
  author={Yuheng Liu and Xin Lin and Xinke Li and Baihan Yang and Chen Wang and Kalyan Sunkavalli and Yannick Hold-Geoffroy and Hao Tan and Kai Zhang and Xiaohui Xie and Zifan Shi and Yiwei Hu},
  journal={SIGGRAPH},
  year={2026}
}

License

This project is released under the Adobe Research License for noncommercial research purposes only.

Contact

For questions or issues, please open an issue on the GitHub repository.
