DIRECT: Direct 3D-Aware Object Insertion via Decomposed Visual Proxies

This repository contains the model weights for DIRECT, presented in the paper Direct 3D-Aware Object Insertion via Decomposed Visual Proxies.

Authors: Jingbo Gong, Yikai Wang, Yushi Lan, Yuhao Wan, Ziheng Ouyang, Rui Zhao, Ming-Ming Cheng, Qibin Hou, and Chen Change Loy.

Project Page | Paper (arXiv) | Code

Overview

DIRECT (Decomposed Injection for Reference Composition and Target-integration) is a framework that enables pose-controllable object insertion. It integrates interactive pose manipulation with high-fidelity 2D image synthesis by decomposing insertion conditions into three visual proxies:

  • Appearance guidance: Captures visual details from the reference object image.
  • Geometry guidance: Derived from a user-adjusted 3D proxy rendered from a reconstructed 3D object.
  • Context guidance: Drawn from the target background scene.

By injecting these through separate pathways, DIRECT preserves reference appearance, follows user-specified poses, and adapts the object naturally to the target scene.

Usage

Please refer to the official GitHub repository for installation instructions. You can run the interactive demo with the following command:

python demo/demo.py --gradio_port 7860 --viser_port 8081

The demo allows you to segment a reference object, reconstruct it in 3D, and interactively manipulate its pose within the background image.
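
The demo command above binds two local ports, so it can help to confirm that both are available before launching. The following is a minimal sketch using only the Python standard library. The helper names `port_is_free` and `build_demo_command` are hypothetical and are not part of the DIRECT repository; the script path and the `--gradio_port` and `--viser_port` flags are taken from the command shown above.

```python
import socket
import sys
from typing import List


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use".
            return False
    return True


def build_demo_command(gradio_port: int = 7860, viser_port: int = 8081) -> List[str]:
    """Assemble the documented demo command as an argument list (hypothetical helper)."""
    return [
        sys.executable,
        "demo/demo.py",
        "--gradio_port", str(gradio_port),
        "--viser_port", str(viser_port),
    ]


if __name__ == "__main__":
    for name, port in (("gradio", 7860), ("viser", 8081)):
        status = "free" if port_is_free(port) else "in use"
        print(f"{name} port {port}: {status}")
    print("Command:", " ".join(build_demo_command()))
```

If a port is reported as in use, passing a different value to the corresponding flag should avoid the conflict, assuming the demo accepts arbitrary port numbers.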

Model Details

This repository contains DIRECT-specific weights only:

  • lora.safetensors
  • condition_embedder.safetensors
  • x_embedder.safetensors
  • time_text_embed.safetensors
  • pooled_image_projector.safetensors
  • image_projector.safetensors
  • config.json
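
After downloading, a quick check that every listed file is present can catch an incomplete download early. This is a minimal sketch under stated assumptions: the file names come from the list above, while the helper `missing_weight_files` and the local directory name `./DIRECT` are hypothetical. It only checks for file presence and does not load or validate the weights.

```python
from pathlib import Path
from typing import List, Union

# File names as listed in this model card.
EXPECTED_FILES = [
    "lora.safetensors",
    "condition_embedder.safetensors",
    "x_embedder.safetensors",
    "time_text_embed.safetensors",
    "pooled_image_projector.safetensors",
    "image_projector.safetensors",
    "config.json",
]


def missing_weight_files(weights_dir: Union[str, Path]) -> List[str]:
    """Return the expected file names that are absent from weights_dir."""
    root = Path(weights_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # "./DIRECT" is an assumed download location; adjust to your setup.
    missing = missing_weight_files("./DIRECT")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All DIRECT-specific files are present.")
```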

The framework also requires external foundation models, which are not included in this repository.

Citation

@inproceedings{gong2026direct,
  title     = {Direct 3D-Aware Object Insertion via Decomposed Visual Proxies},
  author    = {Jingbo Gong and Yikai Wang and Yushi Lan and Yuhao Wan and Ziheng Ouyang and Rui Zhao and Ming-Ming Cheng and Qibin Hou and Chen Change Loy},
  booktitle = {ICML},
  year      = {2026}
}