Improve model card: add authors, paper link, and usage instructions

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +37 -14
README.md CHANGED
@@ -2,32 +2,44 @@
  base_model:
  - black-forest-labs/FLUX.1-Fill-dev
  - microsoft/TRELLIS-image-large
+ pipeline_tag: image-to-image
  tags:
  - object-insertion
- - image-to-image
  - 3d-aware
  - pose-controllable-generation
- pipeline_tag: image-to-image
+ - image-to-image
  ---

- # DIRECT
+ # DIRECT: Direct 3D-Aware Object Insertion via Decomposed Visual Proxies
+
+ This repository contains the model weights for **DIRECT**, presented in the paper [Direct 3D-Aware Object Insertion via Decomposed Visual Proxies](https://huggingface.co/papers/2606.06601).

- This repository contains the model weights for **Direct 3D-Aware Object Insertion via Decomposed Visual Proxies**.
+ **Authors**: Jingbo Gong, Yikai Wang, Yushi Lan, Yuhao Wan, Ziheng Ouyang, Rui Zhao, Ming-Ming Cheng, Qibin Hou, and Chen Change Loy.

- DIRECT performs pose-controllable object insertion by decomposing the insertion condition into visual proxies, including a reference object image, a geometry proxy rendered from a reconstructed 3D object, and a scene context image.
+ [**Project Page**](https://gong1130.github.io/DIRECT/) | [**Paper (ArXiv)**](https://arxiv.org/abs/2606.06601) | [**Code**](https://github.com/Gong1130/DIRECT)

- Project page: https://gong1130.github.io/DIRECT/
+ ## Overview

- Code: https://github.com/Gong1130/DIRECT
+ DIRECT (Decomposed Injection for Reference Composition and Target-integration) is a framework that enables pose-controllable object insertion. It integrates interactive pose manipulation with high-fidelity 2D image synthesis by decomposing insertion conditions into three visual proxies:
+ - **Appearance guidance**: Captures visual details from the reference object image.
+ - **Geometry guidance**: Derived from a user-adjusted 3D proxy rendered from a reconstructed 3D object.
+ - **Context guidance**: From the target background scene.
+
+ By injecting these through separate pathways, DIRECT preserves reference appearance, follows user-specified poses, and adapts the object naturally to the target scene.

  ## Usage

- Please refer to the official code repository for installation instructions and **interactive demo** usage.
+ Please refer to the [official GitHub repository](https://github.com/Gong1130/DIRECT) for installation instructions. You can run the interactive demo with the following command:

- ## Model Details
+ ```bash
+ python demo/demo.py --gradio_port 7860 --viser_port 8081
+ ```

- This repository contains **DIRECT-specific** weights **only**:
+ The demo allows you to segment a reference object, reconstruct it in 3D, and interactively manipulate its pose within the background image.

+ ## Model Details
+
+ This repository contains **DIRECT-specific** weights only:
  - `lora.safetensors`
  - `condition_embedder.safetensors`
  - `x_embedder.safetensors`
@@ -36,8 +48,19 @@ This repository contains **DIRECT-specific** weights **only**:
  - `image_projector.safetensors`
  - `config.json`

- The model requires the following **external** models:
+ The framework requires the following **external** foundation models:
+ - [black-forest-labs/FLUX.1-Fill-dev](https://huggingface.co/black-forest-labs/FLUX.1-Fill-dev)
+ - [google/siglip2-so400m-patch14-384](https://huggingface.co/google/siglip2-so400m-patch14-384)
+ - [microsoft/TRELLIS-image-large](https://huggingface.co/microsoft/TRELLIS-image-large)
+ - [briaai/RMBG-2.0](https://huggingface.co/briaai/RMBG-2.0) (for background removal in the demo)
+
+ ## Citation

- - `black-forest-labs/FLUX.1-Fill-dev`
- - `google/siglip2-so400m-patch14-384`
- - `microsoft/TRELLIS-image-large`
+ ```bibtex
+ @inproceedings{gong2026direct,
+ title = {Direct 3D-Aware Object Insertion via Decomposed Visual Proxies},
+ author = {Jingbo Gong and Yikai Wang and Yushi Lan and Yuhao Wan and Ziheng Ouyang and Rui Zhao and Ming-Ming Cheng and Qibin Hou and Chen Change Loy},
+ booktitle = {ICML},
+ year = {2026}
+ }
+ ```
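
Since the Model Details section lists the DIRECT-specific weight files by name, a small pre-flight check on a local download folder can catch an incomplete copy before the demo is launched. The sketch below is only an illustration and is not part of the DIRECT repository: `find_missing_files` and `VISIBLE_WEIGHT_FILES` are hypothetical names, and the list covers only the entries visible in this diff. The second hunk skips two lines of the file list, and those entries are deliberately not guessed.

```python
from pathlib import Path

# File names taken from the visible part of the "Model Details" list.
# Two list entries fall between the diff hunks and are not shown,
# so they are intentionally left out rather than guessed.
VISIBLE_WEIGHT_FILES = (
    "lora.safetensors",
    "condition_embedder.safetensors",
    "x_embedder.safetensors",
    "image_projector.safetensors",
    "config.json",
)


def find_missing_files(weights_dir, expected=VISIBLE_WEIGHT_FILES):
    """Return the expected file names that are absent from weights_dir.

    The order of the result follows the order of `expected`.
    """
    root = Path(weights_dir)
    return [name for name in expected if not (root / name).is_file()]
```

Calling `find_missing_files("path/to/weights")` returns an empty list when every listed file is present. The check looks at file names only; it does not validate the contents of the weights or the presence of the external foundation models.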