World Inverse Renderer

A video inverse rendering model based on the NVIDIA Cosmos 7B video diffusion transformer, fine-tuned on a custom dataset.

Model Description

This model performs inverse rendering on images and videos: given an input RGB frame, it estimates the following physically based G-buffer maps:

  • Basecolor (albedo)
  • Normal (surface normals)
  • Depth
  • Roughness
  • Metallic

These G-buffers can then be used with a forward renderer to relight the scene under arbitrary environment lighting (HDRI maps).
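To make the relighting idea concrete, here is a minimal sketch of how basecolor and normal G-buffers feed a diffuse shading step. This is an illustrative Lambertian example with a single directional light, not the forward renderer used by this project; the function name and array conventions are assumptions.

```python
import numpy as np

def lambertian_relight(basecolor, normal, light_dir, light_color=(1.0, 1.0, 1.0)):
    """Shade each pixel as basecolor * max(0, n . l) * light_color.

    basecolor: (H, W, 3) float array in [0, 1] (albedo G-buffer)
    normal:    (H, W, 3) float array of unit surface normals
    light_dir: (3,) direction pointing toward the light
    """
    l = np.asarray(light_dir, dtype=np.float32)
    l /= np.linalg.norm(l)
    # Per-pixel cosine term, clamped so back-facing pixels receive no light.
    n_dot_l = np.clip((normal * l).sum(axis=-1, keepdims=True), 0.0, None)
    return basecolor * n_dot_l * np.asarray(light_color, dtype=np.float32)

# Toy example: a flat 4x4 gray patch facing +z, lit from directly above.
h, w = 4, 4
basecolor = np.full((h, w, 3), 0.5, dtype=np.float32)
normal = np.zeros((h, w, 3), dtype=np.float32)
normal[..., 2] = 1.0
shaded = lambertian_relight(basecolor, normal, light_dir=(0.0, 0.0, 1.0))
```

A real relighting pass would additionally use the roughness and metallic maps in a microfacet BRDF and integrate over an HDRI environment map rather than a single light.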

Architecture

  • Based on the NVIDIA Cosmos 7B video diffusion transformer
  • Fine-tuned on a custom dataset
  • Supports both single-image and multi-frame video inverse rendering

Usage

# Inverse rendering on images
CUDA_HOME=$CONDA_PREFIX PYTHONPATH=$(pwd) python cosmos_predict1/diffusion/inference/inference_inverse_renderer.py \
    --checkpoint_dir checkpoints --diffusion_transformer_dir Diffusion_Renderer_Inverse_Cosmos_7B \
    --dataset_path=your_input_images/ --num_video_frames 1 --group_mode webdataset \
    --video_save_folder=output/ --save_video=False

# Inverse rendering on video frames
CUDA_HOME=$CONDA_PREFIX PYTHONPATH=$(pwd) python cosmos_predict1/diffusion/inference/inference_inverse_renderer.py \
    --checkpoint_dir checkpoints --diffusion_transformer_dir Diffusion_Renderer_Inverse_Cosmos_7B \
    --dataset_path=your_video_frames/ --num_video_frames 57 \
    --video_save_folder=output/
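When post-processing the saved G-buffer maps, note that normal maps are typically stored as images in [0, 1] and must be decoded back to unit vectors. The sketch below assumes the common n = 2 * pixel - 1 encoding; verify this against the actual output format of the inference script before relying on it.

```python
import numpy as np

def decode_normal_map(img):
    """Decode a normal image with values in [0, 1] to unit vectors in [-1, 1].

    Assumes the common convention n = 2 * pixel - 1 (an assumption here;
    check the renderer's actual encoding). Re-normalizes to guard against
    quantization error from 8-bit storage.
    """
    n = img.astype(np.float32) * 2.0 - 1.0
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.clip(norm, 1e-8, None)

# Toy example: pixels encoding a normal pointing straight along +z,
# i.e. (0.5, 0.5, 1.0) in encoded form.
encoded = np.full((2, 2, 3), 0.5, dtype=np.float32)
encoded[..., 2] = 1.0
decoded = decode_normal_map(encoded)
```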

Requirements

  • Python 3.10
  • NVIDIA GPU with >= 16GB VRAM (48GB+ recommended)
  • CUDA 12.0+