---
base_model:
- black-forest-labs/FLUX.1-Fill-dev
- microsoft/TRELLIS-image-large
pipeline_tag: image-to-image
tags:
- object-insertion
- 3d-aware
- pose-controllable-generation
- image-to-image
---

# DIRECT: Direct 3D-Aware Object Insertion via Decomposed Visual Proxies

This repository contains the model weights for **DIRECT**, presented in the paper [Direct 3D-Aware Object Insertion via Decomposed Visual Proxies](https://huggingface.co/papers/2606.06601).

**Authors**: Jingbo Gong, Yikai Wang, Yushi Lan, Yuhao Wan, Ziheng Ouyang, Rui Zhao, Ming-Ming Cheng, Qibin Hou, and Chen Change Loy.

[**Project Page**](https://gong1130.github.io/DIRECT/) | [**Paper (arXiv)**](https://arxiv.org/abs/2606.06601) | [**Code**](https://github.com/Gong1130/DIRECT)

## Overview

DIRECT (Decomposed Injection for Reference Composition and Target-integration) is a framework for pose-controllable object insertion. It combines interactive pose manipulation with high-fidelity 2D image synthesis by decomposing the insertion conditions into three visual proxies:
- **Appearance guidance**: Captures visual details from the reference object image.
- **Geometry guidance**: Derived from a user-adjusted 3D proxy rendered from a reconstructed 3D object.
- **Context guidance**: Taken from the target background scene.

By injecting these proxies through separate pathways, DIRECT preserves the reference appearance, follows the user-specified pose, and adapts the object naturally to the target scene.

## Usage

Please refer to the [official GitHub repository](https://github.com/Gong1130/DIRECT) for installation instructions. Once installed, you can run the interactive demo with the following command:

```bash
python demo/demo.py --gradio_port 7860 --viser_port 8081
```

The demo lets you segment a reference object, reconstruct it in 3D, and interactively adjust its pose within the background image.
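
If the demo does not start, one possible cause is that a requested port is already taken. The snippet below is a minimal sketch, not part of the DIRECT codebase, that checks the two ports from the command above before launching. The helpers `port_is_free` and `demo_command` are hypothetical names introduced for illustration, and binding to `127.0.0.1` is an assumption about how the demo listens.

```python
import socket

# Flags and ports copied from the demo command above.
DEMO_PORTS = {"--gradio_port": 7860, "--viser_port": 8081}


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def demo_command(ports: dict) -> list:
    """Build the demo command as an argument list (hypothetical helper)."""
    cmd = ["python", "demo/demo.py"]
    for flag, port in ports.items():
        cmd += [flag, str(port)]
    return cmd


if __name__ == "__main__":
    for flag, port in DEMO_PORTS.items():
        status = "free" if port_is_free(port) else "in use"
        print(f"{flag} {port}: {status}")
    print(" ".join(demo_command(DEMO_PORTS)))
```

If a port is reported as in use, passing a different value to the corresponding flag is the obvious thing to try.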

## Model Details

This repository contains only the **DIRECT-specific** weights and their configuration:
- `lora.safetensors`
- `condition_embedder.safetensors`
- `x_embedder.safetensors`
- `time_text_embed.safetensors`
- `pooled_image_projector.safetensors`
- `image_projector.safetensors`
- `config.json`
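
After downloading, it can be useful to confirm that every file listed above is present. The following is a minimal sketch using only the Python standard library. It assumes the files sit at the top level of a local directory, which is an assumption about the layout rather than something this card states, and `missing_files` is a hypothetical helper name.

```python
from pathlib import Path

# File names copied from the list above.
EXPECTED_FILES = (
    "lora.safetensors",
    "condition_embedder.safetensors",
    "x_embedder.safetensors",
    "time_text_embed.safetensors",
    "pooled_image_projector.safetensors",
    "image_projector.safetensors",
    "config.json",
)


def missing_files(weights_dir):
    """Return the expected file names that are absent from weights_dir."""
    root = Path(weights_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    import sys

    # Directory to check; defaults to the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_files(target)
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All DIRECT-specific files are present.")
```

The check only looks at file names; it does not validate the contents of the weight files.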

The framework also requires the following **external** foundation models, which are not included in this repository:
- [black-forest-labs/FLUX.1-Fill-dev](https://huggingface.co/black-forest-labs/FLUX.1-Fill-dev)
- [google/siglip2-so400m-patch14-384](https://huggingface.co/google/siglip2-so400m-patch14-384)
- [microsoft/TRELLIS-image-large](https://huggingface.co/microsoft/TRELLIS-image-large)
- [briaai/RMBG-2.0](https://huggingface.co/briaai/RMBG-2.0) (for background removal in the demo)
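
One way to fetch these external models ahead of time is `snapshot_download` from the `huggingface_hub` package. The sketch below is not taken from the DIRECT codebase. It assumes `huggingface_hub` is installed, and `download_external_models` is a hypothetical helper name. Some of these repositories may require accepting a license on the Hub and authenticating with a token. Whether the DIRECT code reads the models from the Hugging Face cache or from specific local paths is not stated here, so check the GitHub repository before relying on it.

```python
import os

# Repository IDs copied from the list above.
EXTERNAL_MODELS = (
    "black-forest-labs/FLUX.1-Fill-dev",
    "google/siglip2-so400m-patch14-384",
    "microsoft/TRELLIS-image-large",
    "briaai/RMBG-2.0",  # used for background removal in the demo
)


def download_external_models(repo_ids=EXTERNAL_MODELS, token=None):
    """Download each repository into the local Hugging Face cache.

    Returns a dict mapping each repo ID to its local snapshot path.
    """
    # Imported lazily so this module can be loaded without the dependency.
    from huggingface_hub import snapshot_download

    paths = {}
    for repo_id in repo_ids:
        paths[repo_id] = snapshot_download(repo_id=repo_id, token=token)
    return paths


if __name__ == "__main__":
    # HF_TOKEN is the environment variable huggingface_hub reads by default;
    # it is passed explicitly here only to make the dependency visible.
    results = download_external_models(token=os.environ.get("HF_TOKEN"))
    for repo_id, path in results.items():
        print(f"{repo_id} -> {path}")
```

These downloads are large, so make sure enough disk space is available before running the script.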

## Citation

```bibtex
@inproceedings{gong2026direct,
  title = {Direct 3D-Aware Object Insertion via Decomposed Visual Proxies},
  author = {Jingbo Gong and Yikai Wang and Yushi Lan and Yuhao Wan and Ziheng Ouyang and Rui Zhao and Ming-Ming Cheng and Qibin Hou and Chen Change Loy},
  booktitle = {ICML},
  year = {2026}
}
```