Improve model card: add authors, paper link, and usage instructions
#1
by nielsr HF Staff - opened
README.md
CHANGED
@@ -2,32 +2,44 @@
 base_model:
 - black-forest-labs/FLUX.1-Fill-dev
 - microsoft/TRELLIS-image-large
+pipeline_tag: image-to-image
 tags:
 - object-insertion
-- image-to-image
 - 3d-aware
 - pose-controllable-generation
-
+- image-to-image
 ---
 
-# DIRECT
+# DIRECT: Direct 3D-Aware Object Insertion via Decomposed Visual Proxies
+
+This repository contains the model weights for **DIRECT**, presented in the paper [Direct 3D-Aware Object Insertion via Decomposed Visual Proxies](https://huggingface.co/papers/2606.06601).
 
-
+**Authors**: Jingbo Gong, Yikai Wang, Yushi Lan, Yuhao Wan, Ziheng Ouyang, Rui Zhao, Ming-Ming Cheng, Qibin Hou, and Chen Change Loy.
 
-DIRECT
+[**Project Page**](https://gong1130.github.io/DIRECT/) | [**Paper (ArXiv)**](https://arxiv.org/abs/2606.06601) | [**Code**](https://github.com/Gong1130/DIRECT)
 
-
+## Overview
 
-
+DIRECT (Decomposed Injection for Reference Composition and Target-integration) is a framework that enables pose-controllable object insertion. It integrates interactive pose manipulation with high-fidelity 2D image synthesis by decomposing insertion conditions into three visual proxies:
+- **Appearance guidance**: Captures visual details from the reference object image.
+- **Geometry guidance**: Derived from a user-adjusted 3D proxy rendered from a reconstructed 3D object.
+- **Context guidance**: From the target background scene.
+
+By injecting these through separate pathways, DIRECT preserves reference appearance, follows user-specified poses, and adapts the object naturally to the target scene.
 
 ## Usage
 
-Please refer to the official
+Please refer to the [official GitHub repository](https://github.com/Gong1130/DIRECT) for installation instructions. You can run the interactive demo with the following command:
 
-
+```bash
+python demo/demo.py --gradio_port 7860 --viser_port 8081
+```
 
-
+The demo allows you to segment a reference object, reconstruct it in 3D, and interactively manipulate its pose within the background image.
 
+## Model Details
+
+This repository contains **DIRECT-specific** weights only:
 - `lora.safetensors`
 - `condition_embedder.safetensors`
 - `x_embedder.safetensors`
@@ -36,8 +48,19 @@ This repository contains **DIRECT-specific** weights **only**:
 - `image_projector.safetensors`
 - `config.json`
 
-The
+The framework requires the following **external** foundation models:
+- [black-forest-labs/FLUX.1-Fill-dev](https://huggingface.co/black-forest-labs/FLUX.1-Fill-dev)
+- [google/siglip2-so400m-patch14-384](https://huggingface.co/google/siglip2-so400m-patch14-384)
+- [microsoft/TRELLIS-image-large](https://huggingface.co/microsoft/TRELLIS-image-large)
+- [briaai/RMBG-2.0](https://huggingface.co/briaai/RMBG-2.0) (for background removal in the demo)
+
+## Citation
 
-
-
-
+```bibtex
+@inproceedings{gong2026direct,
+title = {Direct 3D-Aware Object Insertion via Decomposed Visual Proxies},
+author = {Jingbo Gong and Yikai Wang and Yushi Lan and Yuhao Wan and Ziheng Ouyang and Rui Zhao and Ming-Ming Cheng and Qibin Hou and Chen Change Loy},
+booktitle = {ICML},
+year = {2026}
+}
+```
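The Model Details section in this diff lists the DIRECT-specific weight files and the external foundation models the framework depends on. As a rough illustration only, the sketch below checks a local folder for the weight files that are visible in the diff. It is not part of the DIRECT codebase: the function name `missing_weight_files` and the `weights_dir` argument are hypothetical, and README lines 46-47 fall between the two hunks, so the file list may be incomplete.

```python
from pathlib import Path

# Weight files named in the diff. README lines 46-47 are not shown in
# either hunk, so this list may be incomplete (assumption).
VISIBLE_WEIGHT_FILES = [
    "lora.safetensors",
    "condition_embedder.safetensors",
    "x_embedder.safetensors",
    "image_projector.safetensors",
    "config.json",
]

# External foundation models listed under "Model Details" in the diff.
EXTERNAL_MODELS = [
    "black-forest-labs/FLUX.1-Fill-dev",
    "google/siglip2-so400m-patch14-384",
    "microsoft/TRELLIS-image-large",
    "briaai/RMBG-2.0",  # used for background removal in the demo
]


def missing_weight_files(weights_dir):
    """Return the visible weight file names that are absent from weights_dir.

    Hypothetical helper for illustration; it only checks that each file
    exists and does not validate its contents.
    """
    root = Path(weights_dir)
    return [name for name in VISIBLE_WEIGHT_FILES if not (root / name).is_file()]
```

Calling `missing_weight_files("path/to/weights")` returns an empty list when every visible file is present. The external models are kept in a separate list because, per the diff, they are not shipped in this repository and must be obtained from their own model pages.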