<table align="center" cellspacing="0" cellpadding="0" style="margin:0 auto;">
  <tr>
    <td valign="middle" style="padding-right:10px;">
      <img src="./assets/image.png" alt="Logo" width="44">
    </td>
    <td valign="middle" align="center" style="font-size:36px; line-height:1.05; font-weight:900;">
      OcclusionFormer: Arranging Z-Order<br>
      for Layout-Grounded Image Generation
    </td>
  </tr>
</table>

<div align="center" style="margin-top:14px;">
<a href='https://henghuiding.com/OcclusionFormer/'><img src='https://img.shields.io/badge/Project-Page-Green'></a> &nbsp;&nbsp;&nbsp;&nbsp;
<a href='https://icml.cc/Downloads/2026'><img src='https://img.shields.io/badge/ICML-2026-blue'></a> &nbsp;&nbsp;&nbsp;&nbsp;
<a href='https://arxiv.org/'><img src='https://img.shields.io/badge/arXiv-Coming%20Soon-b31b1b'></a> &nbsp;&nbsp;&nbsp;&nbsp;
<a href='https://huggingface.co/'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Weights-Hugging Face%20-orange'></a> &nbsp;&nbsp;&nbsp;&nbsp;
<a href='https://huggingface.co/datasets/FudanCVL/SA-Z'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20SA--Z-Hugging Face%20-orange'></a>
</div>
<p align="center" style="margin:4px 0 0 0;">
  <a href="https://github.com/ZyLieee/" target="_blank" style="font-size:1.28em; font-weight:700;">Ziye Li</a>,
  <a href="https://henghuiding.com/" target="_blank" style="font-size:1.28em; font-weight:700;">Henghui Ding<sup>✉</sup></a>
</p>
<p align="center" style="margin:2px 0 0 0; font-size:1.35em; font-weight:600;">Fudan University</p>
<p align="center" style="margin:1px 0 0 0; font-size:1.48em; font-weight:900; color:#ff6a00;">ICML 2026</p>
<p align="center" style="margin:1px 0 0 0; font-size:1.08em; color:#6b7280;"><em>✉ Corresponding Author</em></p>

## 🔥 News
- [2026/05/18] Release **inference code**, **model weights** and **SA-Z dataset**.
- [2026/05/18] Release **OcclusionFormer open-source package** in this repository.
- [2026/04/30] OcclusionFormer is accepted to **ICML 2026**.

---


## 😊 Introduction
![teaser](assets/teaser.png)
**OcclusionFormer** addresses a core challenge in layout-to-image generation: when multiple bounding boxes overlap, standard methods often produce entangled textures and incorrect front/back ordering.

OcclusionFormer introduces explicit **Z-order modeling** for layout-grounded generation by:
- decoupling instance generation,
- arranging occlusion order with a volume-rendering-inspired transmittance mechanism,
- and enforcing spatial precision with a queried alignment objective.
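
To make the transmittance idea concrete, here is a minimal sketch of the standard front-to-back compositing rule from volume rendering, where each instance's weight is its opacity multiplied by the transmittance left over from the instances in front of it. This is only an illustration of the general technique: the function names, the scalar per-pixel features, and the per-instance opacities are assumptions, and the paper's actual formulation may differ.

```python
from typing import List, Sequence


def transmittance_weights(alphas: Sequence[float]) -> List[float]:
    """Compositing weights for instances sorted front (index 0) to back.

    Standard volume-rendering rule: w_i = alpha_i * prod_{j < i} (1 - alpha_j).
    The alphas are hypothetical per-instance opacities at a single pixel.
    """
    weights = []
    transmittance = 1.0  # share of the pixel not yet claimed by closer instances
    for alpha in alphas:
        if not 0.0 <= alpha <= 1.0:
            raise ValueError("alpha must lie in [0, 1]")
        weights.append(alpha * transmittance)
        transmittance *= 1.0 - alpha
    return weights


def composite(features: Sequence[float], alphas: Sequence[float]) -> float:
    """Blend one scalar feature per instance using the weights above."""
    if len(features) != len(alphas):
        raise ValueError("features and alphas must have the same length")
    return sum(w * f for w, f in zip(transmittance_weights(alphas), features))


if __name__ == "__main__":
    # Two fully opaque instances overlap: only the front one contributes.
    print(transmittance_weights([1.0, 1.0]))       # [1.0, 0.0]
    # Semi-transparent front instances let part of the back one through.
    print(transmittance_weights([0.5, 0.5, 1.0]))  # [0.5, 0.25, 0.25]
```

Because the weights depend on the order of the list, swapping two overlapping instances changes which one dominates the shared pixels, which is the behavior an explicit Z-order is meant to control.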

The paper also introduces **SA-Z**, a large-scale dataset with explicit occlusion order and amodal supervision for occlusion-aware layout generation.

---

## 🔧 Key Features

- **SA-Z Dataset Curation:** Enriches layout annotations with instance captions, explicit occlusion order, and amodal signals.
![dataset](assets/dataset.png)
- **Occlusion-Aware DiT Framework:** Models Z-order dependencies explicitly rather than mixing overlapping instances implicitly.
- **Instance Decoupling + Volumetric Composition:** Improves robustness on dense overlap scenes by composing instances with transmittance-based ordering.
- **Queried Alignment Mechanism:** Improves spatial faithfulness and local semantic consistency.
![pipeline](assets/pipeline.png)


## 💻 Quick Start

1. Environment setup

```bash
cd OcclusionFormer
conda create -n OcclusionFormer python=3.11 -y
conda activate OcclusionFormer
```

2. Install requirements

```bash
pip install --upgrade -r requirements.txt
```

3. Download checkpoint

```bash
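# Checkpoint page: https://huggingface.co/FudanCVL/OcclusionFormer
# The local directory below is only an example; any path works.
huggingface-cli download FudanCVL/OcclusionFormer --local-dir ./checkpoints/OcclusionFormer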
```

4. Run Streamlit demo

```bash
streamlit run demo_occlusionformer.py
```

5. Run CLI inference

```bash
python inference_occlusionformer.py \
  --model_path /path/to/FLUX.1-dev \
  --ckpt_path /path/to/occlusionformer_checkpoint_dir \
  --layout_json ./examples/livingroom.json \
  --output_dir ./outputs_occlusionformer \
  --enable_layout \
  --overwrite
```

Batch inference with a directory of JSON layouts:

```bash
python inference_occlusionformer.py \
  --model_path /path/to/FLUX.1-dev \
  --ckpt_path /path/to/occlusionformer_checkpoint_dir \
  --layout_dir ./examples \
  --output_dir ./outputs_occlusionformer \
  --enable_layout \
  --overwrite
```
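
If you prefer to drive the CLI from Python, for example to run a hand-picked subset of layouts, the commands above could be wrapped roughly as follows. This is a sketch that only uses the flags shown in this README; the helper names `build_command` and `run_layouts`, and the one-sub-directory-per-layout output convention, are assumptions for illustration and are not part of the repository.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Sequence


def build_command(model_path: str, ckpt_path: str, layout_json: str,
                  output_dir: str, enable_layout: bool = True) -> List[str]:
    """Assemble the argv list for one inference_occlusionformer.py run."""
    layout_flag = "--enable_layout" if enable_layout else "--disable_layout"
    return [
        sys.executable, "inference_occlusionformer.py",
        "--model_path", model_path,
        "--ckpt_path", ckpt_path,
        "--layout_json", layout_json,
        "--output_dir", output_dir,
        layout_flag,
        "--overwrite",
    ]


def run_layouts(layouts: Sequence[str], model_path: str, ckpt_path: str,
                output_root: str) -> None:
    """Run the CLI once per layout file, stopping at the first failure."""
    for layout in layouts:
        # Hypothetical convention: one output sub-directory per layout file.
        out_dir = str(Path(output_root) / Path(layout).stem)
        subprocess.run(build_command(model_path, ckpt_path, layout, out_dir),
                       check=True)


if __name__ == "__main__":
    run_layouts(
        ["./examples/livingroom.json"],
        model_path="/path/to/FLUX.1-dev",
        ckpt_path="/path/to/occlusionformer_checkpoint_dir",
        output_root="./outputs_occlusionformer",
    )
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces, and `check=True` raises an error as soon as one run fails.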

---
## ✅ TODO

- [ ] Organize and update the **amodal annotations** on Hugging Face.

---
## 📁 Repository Scope

This folder provides a standalone inference/demo package:

- `demo_occlusionformer.py`: Streamlit demo UI
- `inference_occlusionformer.py`: CLI inference
- `src/occlusionformer/`: OcclusionFormer core modules
- `src/utils.py`, `src/transformer_utils.py`: required utility modules
- `examples/`: example layout JSON files
- `requirements.txt`: runtime dependencies

---

## ⚙️ Inference Notes

- The demo and CLI follow the project's current preprocessing logic and compose each prompt from the global prompt plus the instance captions.
- Layout control is enabled via `--enable_layout` (or disabled with `--disable_layout`).
- Outputs include generated images and layout overlays for visualization.
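
As a rough illustration of the prompt composition mentioned above, the sketch below joins a global prompt with per-instance captions into one string. The joining format and the `compose_prompt` helper are purely hypothetical; the real logic lives in the project's preprocessing code and may differ.

```python
from typing import List


def compose_prompt(global_prompt: str, instance_captions: List[str]) -> str:
    """Join a global prompt with non-empty instance captions.

    Hypothetical format: sentences separated by ". " with one final period.
    """
    parts = [global_prompt] + list(instance_captions)
    # Trim whitespace and trailing periods, then drop empty entries.
    cleaned = [p.strip().rstrip(".") for p in parts]
    cleaned = [p for p in cleaned if p]
    if not cleaned:
        return ""
    return ". ".join(cleaned) + "."


if __name__ == "__main__":
    print(compose_prompt(
        "A cozy living room",
        ["a gray sofa", "a wooden coffee table in front of the sofa"],
    ))
    # A cozy living room. a gray sofa. a wooden coffee table in front of the sofa.
```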

---

## 👏 Acknowledgement

This work builds on many excellent research works and open-source projects. We thank the authors for sharing them!

- [GLIGEN](https://github.com/gligen/GLIGEN)
- [InstanceAssemble](https://github.com/FireRedTeam/InstanceAssemble)
- [CreatiLayout](https://github.com/HuiZhang0812/CreatiLayout)

---

## 💗 Citation

```bibtex
@inproceedings{li2026occlusionformer,
  title={OcclusionFormer: Arranging Z-Order for Layout-Grounded Image Generation},
  author={Li, Ziye and Ding, Henghui},
  booktitle={ICML},
  year={2026}
}
```