# World Inverse Renderer
A video inverse rendering model based on the NVIDIA Cosmos 7B video diffusion transformer, fine-tuned on a custom dataset.
## Model Description
This model performs inverse rendering on images and videos: given an input RGB frame, it estimates the following physically based G-buffer maps:
- Basecolor (albedo)
- Normal (surface normals)
- Depth
- Roughness
- Metallic
These G-buffers can then be used with a forward renderer to relight the scene under arbitrary environment lighting (HDRI maps).
## Architecture
- Based on NVIDIA Cosmos 7B video diffusion transformer
- Fine-tuned on custom dataset
- Supports both single-image and multi-frame video inverse rendering
## Usage
```shell
# Inverse rendering on images
CUDA_HOME=$CONDA_PREFIX PYTHONPATH=$(pwd) python cosmos_predict1/diffusion/inference/inference_inverse_renderer.py \
    --checkpoint_dir checkpoints --diffusion_transformer_dir Diffusion_Renderer_Inverse_Cosmos_7B \
    --dataset_path=your_input_images/ --num_video_frames 1 --group_mode webdataset \
    --video_save_folder=output/ --save_video=False
```
```shell
# Inverse rendering on video frames
CUDA_HOME=$CONDA_PREFIX PYTHONPATH=$(pwd) python cosmos_predict1/diffusion/inference/inference_inverse_renderer.py \
    --checkpoint_dir checkpoints --diffusion_transformer_dir Diffusion_Renderer_Inverse_Cosmos_7B \
    --dataset_path=your_video_frames/ --num_video_frames 57 \
    --video_save_folder=output/
```
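Since the video command above consumes a fixed-length window of frames (`--num_video_frames 57`), longer sequences need to be processed in chunks. A small helper like the following can split a frame directory into model-sized windows; this is a sketch under assumptions (PNG frames, lexicographically sortable names), and how the actual inference script handles windows may differ.

```python
from pathlib import Path

def chunk_frames(frame_dir, window=57):
    """Split a sorted directory of PNG frames into fixed-length windows.

    Returns a list of lists of Paths. A short final window is kept so
    no frames are dropped (the inference script may pad or trim it).
    """
    frames = sorted(Path(frame_dir).glob("*.png"))
    return [frames[i:i + window] for i in range(0, len(frames), window)]
```

Each window can then be copied into its own input directory and passed to the command above via `--dataset_path`.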
## Requirements
- Python 3.10
- NVIDIA GPU with >= 16GB VRAM (48GB+ recommended)
- CUDA 12.0+