DIRECT: Direct 3D-Aware Object Insertion via Decomposed Visual Proxies
This repository contains the model weights for DIRECT, presented in the paper Direct 3D-Aware Object Insertion via Decomposed Visual Proxies.
Authors: Jingbo Gong, Yikai Wang, Yushi Lan, Yuhao Wan, Ziheng Ouyang, Rui Zhao, Ming-Ming Cheng, Qibin Hou, and Chen Change Loy.
Project Page | Paper (ArXiv) | Code
Overview
DIRECT (Decomposed Injection for Reference Composition and Target-integration) is a framework that enables pose-controllable object insertion. It integrates interactive pose manipulation with high-fidelity 2D image synthesis by decomposing insertion conditions into three visual proxies:
- Appearance guidance: Captures visual details from the reference object image.
- Geometry guidance: Derived from a user-adjusted 3D proxy rendered from a reconstructed 3D object.
- Context guidance: Drawn from the target background scene.
By injecting these through separate pathways, DIRECT preserves reference appearance, follows user-specified poses, and adapts the object naturally to the target scene.
Usage
See the official GitHub repository for installation instructions. Once installed, launch the interactive demo with the following command:
python demo/demo.py --gradio_port 7860 --viser_port 8081
The demo allows you to segment a reference object, reconstruct it in 3D, and interactively manipulate its pose within the background image.
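The demo command above takes two ports, so a port that is already in use is a likely cause of a failed launch. The following is a minimal sketch, not part of the DIRECT codebase, that checks whether both ports are free and builds the same command line. The helper names `port_is_free` and `build_demo_command` are hypothetical, and checking on `127.0.0.1` is an assumption about the interface the demo binds to.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently bound to (host, port).

    Assumption: the demo serves on the local interface; adjust `host` if not.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def build_demo_command(gradio_port=7860, viser_port=8081):
    """Build the demo command shown above as an argument list."""
    return [
        "python", "demo/demo.py",
        "--gradio_port", str(gradio_port),
        "--viser_port", str(viser_port),
    ]


if __name__ == "__main__":
    for port in (7860, 8081):
        state = "free" if port_is_free(port) else "in use"
        print(f"port {port}: {state}")
    print(" ".join(build_demo_command()))
```

If a port is reported as in use, passing a different value to `--gradio_port` or `--viser_port` should avoid the clash.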
Model Details
This repository contains DIRECT-specific weights only:
- lora.safetensors
- condition_embedder.safetensors
- x_embedder.safetensors
- time_text_embed.safetensors
- pooled_image_projector.safetensors
- image_projector.safetensors
- config.json
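After downloading, a quick check that every file listed above is present can catch an incomplete download early. This is a minimal sketch using only the standard library. The helper name `missing_weight_files` and the assumption that all files sit directly in one flat directory are illustrative and not taken from the DIRECT code.

```python
from pathlib import Path

# File names as listed in this model card.
EXPECTED_FILES = [
    "lora.safetensors",
    "condition_embedder.safetensors",
    "x_embedder.safetensors",
    "time_text_embed.safetensors",
    "pooled_image_projector.safetensors",
    "image_projector.safetensors",
    "config.json",
]


def missing_weight_files(weights_dir):
    """Return the expected file names that are absent from weights_dir.

    Assumption: the files live directly in weights_dir, not in subfolders.
    """
    root = Path(weights_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_weight_files(target)
    print("all files present" if not missing else f"missing: {missing}")
```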
The framework requires the following external foundation models:
- black-forest-labs/FLUX.1-Fill-dev
- google/siglip2-so400m-patch14-384
- microsoft/TRELLIS-image-large
- briaai/RMBG-2.0 (for background removal in the demo)
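The external models above can be fetched ahead of time so the first demo run does not stall on downloads. The sketch below uses `snapshot_download` from the `huggingface_hub` package to populate the local Hugging Face cache. It assumes `huggingface_hub` is installed, and that the DIRECT code resolves these models through that cache, which this card does not confirm. Some of these repositories may require accepting a license and logging in before download. The helper name `prefetch_external_models` is hypothetical.

```python
# Repository IDs as listed in this model card.
EXTERNAL_MODELS = [
    "black-forest-labs/FLUX.1-Fill-dev",
    "google/siglip2-so400m-patch14-384",
    "microsoft/TRELLIS-image-large",
    "briaai/RMBG-2.0",  # only needed for background removal in the demo
]


def prefetch_external_models(repo_ids=EXTERNAL_MODELS):
    """Download each repository into the local Hugging Face cache.

    Returns a dict mapping repo ID to the local snapshot path.
    """
    # Imported lazily so the list above is usable without the package.
    from huggingface_hub import snapshot_download

    return {repo_id: snapshot_download(repo_id=repo_id) for repo_id in repo_ids}


if __name__ == "__main__":
    for repo_id, path in prefetch_external_models().items():
        print(f"{repo_id} -> {path}")
```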
Citation
@inproceedings{gong2026direct,
  title     = {Direct 3D-Aware Object Insertion via Decomposed Visual Proxies},
  author    = {Jingbo Gong and Yikai Wang and Yushi Lan and Yuhao Wan and Ziheng Ouyang and Rui Zhao and Ming-Ming Cheng and Qibin Hou and Chen Change Loy},
  booktitle = {ICML},
  year      = {2026}
}