<table align="center" cellspacing="0" cellpadding="0" style="margin:0 auto;">
<tr>
<td valign="middle" style="padding-right:10px;">
<img src="./assets/image.png" alt="Logo" width="44">
</td>
<td valign="middle" align="center" style="font-size:36px; line-height:1.05; font-weight:900;">
OcclusionFormer: Arranging Z-Order<br>
for Layout-Grounded Image Generation
</td>
</tr>
</table>
<div align="center" style="margin-top:14px;">
<a href='https://henghuiding.com/OcclusionFormer/'><img src='https://img.shields.io/badge/Project-Page-Green'></a>
<a href='https://icml.cc/Downloads/2026'><img src='https://img.shields.io/badge/ICML-2026-blue'></a>
<a href='https://arxiv.org/'><img src='https://img.shields.io/badge/arXiv-Coming%20Soon-b31b1b'></a>
<a href='https://huggingface.co/'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Weights-Hugging Face%20-orange'></a>
<a href='https://huggingface.co/datasets/FudanCVL/SA-Z'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20SA--Z-Hugging Face%20-orange'></a>
</div>
<p align="center" style="margin:4px 0 0 0;">
<a href="https://github.com/ZyLieee/" target="_blank" style="font-size:1.28em; font-weight:700;">Ziye Li</a>,
<a href="https://henghuiding.com/" target="_blank" style="font-size:1.28em; font-weight:700;">Henghui Ding<sup>✉</sup></a>
</p>
<p align="center" style="margin:2px 0 0 0; font-size:1.35em; font-weight:600;">Fudan University</p>
<p align="center" style="margin:1px 0 0 0; font-size:1.48em; font-weight:900; color:#ff6a00;">ICML 2026</p>
<p align="center" style="margin:1px 0 0 0; font-size:1.08em; color:#6b7280;"><em>✉ Corresponding Author</em></p>
## 🔥 News
- [2026/05/18] Released the **inference code**, **model weights**, and **SA-Z dataset**.
- [2026/05/18] Released the **OcclusionFormer open-source package** in this repository.
- [2026/04/30] OcclusionFormer was accepted to **ICML 2026**.
---
## Introduction

**OcclusionFormer** addresses a core challenge in layout-to-image generation: when multiple bounding boxes overlap, standard methods often produce entangled textures and incorrect front/back ordering.

OcclusionFormer introduces explicit **Z-order modeling** for layout-grounded generation by:
- decoupling instance generation,
- arranging occlusion order with a volume-rendering-inspired transmittance mechanism, and
- enforcing spatial precision with a queried alignment objective.

The paper also introduces **SA-Z**, a large-scale dataset with explicit occlusion order and amodal supervision for occlusion-aware layout generation.

---
## Key Features
- **SA-Z Dataset Curation:** Enriches layout annotations with instance captions, explicit occlusion order, and amodal signals.

- **Occlusion-Aware DiT Framework:** Models Z-order dependencies explicitly rather than mixing overlapping instances implicitly.
- **Instance Decoupling + Volumetric Composition:** Improves robustness on dense overlap scenes by composing instances with transmittance-based ordering.
- **Queried Alignment Mechanism:** Improves spatial faithfulness and local semantic consistency.
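
To make the transmittance-based ordering above concrete, here is a minimal sketch of front-to-back compositing as used in volume rendering. It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, and each instance is reduced to a scalar opacity and a small feature vector at a single spatial location.

```python
from typing import List, Sequence


def transmittance_weights(alphas_front_to_back: Sequence[float]) -> List[float]:
    """Return blend weights w_i = alpha_i * prod_{j<i} (1 - alpha_j).

    Hypothetical helper: instances are assumed to be sorted by Z-order,
    nearest first, with an opacity in [0, 1] at the current location.
    """
    weights = []
    transmittance = 1.0  # fraction not yet blocked by nearer instances
    for alpha in alphas_front_to_back:
        if not 0.0 <= alpha <= 1.0:
            raise ValueError("alpha must lie in [0, 1]")
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights


def composite(features_front_to_back: Sequence[Sequence[float]],
              alphas_front_to_back: Sequence[float]) -> List[float]:
    """Blend per-instance feature vectors at one location using the weights."""
    if not features_front_to_back:
        return []
    weights = transmittance_weights(alphas_front_to_back)
    out = [0.0] * len(features_front_to_back[0])
    for weight, feature in zip(weights, features_front_to_back):
        for k, value in enumerate(feature):
            out[k] += weight * value
    return out


if __name__ == "__main__":
    # An opaque front instance hides everything behind it.
    print(transmittance_weights([1.0, 1.0]))  # [1.0, 0.0]
    # A half-transparent front instance lets half of the back instance through.
    print(transmittance_weights([0.5, 1.0]))  # [0.5, 0.5]
```

The point of the sketch is that the contribution of a farther instance is scaled by how much the nearer ones let through, so swapping the Z-order of two overlapping instances changes which one dominates the overlap region.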

## 💻 Quick Start
1. Environment setup
```bash
cd OcclusionFormer
conda create -n OcclusionFormer python=3.11 -y
conda activate OcclusionFormer
```
2. Install requirements
```bash
pip install --upgrade -r requirements.txt
```
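3. Download the checkpoint from https://huggingface.co/FudanCVL/OcclusionFormer
```bash
# Requires git-lfs to fetch the weight files.
git clone https://huggingface.co/FudanCVL/OcclusionFormer
```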
4. Run Streamlit demo
```bash
streamlit run demo_occlusionformer.py
```
5. Run CLI inference
```bash
python inference_occlusionformer.py \
  --model_path /path/to/FLUX.1-dev \
  --ckpt_path /path/to/occlusionformer_checkpoint_dir \
  --layout_json ./examples/livingroom.json \
  --output_dir ./outputs_occlusionformer \
  --enable_layout \
  --overwrite
```
Batch inference with a directory of JSON layouts:
```bash
python inference_occlusionformer.py \
  --model_path /path/to/FLUX.1-dev \
  --ckpt_path /path/to/occlusionformer_checkpoint_dir \
  --layout_dir ./examples \
  --output_dir ./outputs_occlusionformer \
  --enable_layout \
  --overwrite
```
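If you prefer to drive the single-layout mode yourself (for example, to log or retry each layout separately), the sketch below builds one command per JSON file. It only uses the flags shown above; the helper name `build_commands` is hypothetical, and how the script's own `--layout_dir` mode iterates files is not specified here.

```python
import shlex
from pathlib import Path
from typing import List


def build_commands(layout_dir: str, model_path: str,
                   ckpt_path: str, output_dir: str) -> List[List[str]]:
    """Build one single-layout CLI invocation per *.json file, in sorted order.

    Hypothetical helper: it mirrors the flags documented in this README and
    does not inspect or validate the layout files themselves.
    """
    commands = []
    for layout in sorted(Path(layout_dir).glob("*.json")):
        commands.append([
            "python", "inference_occlusionformer.py",
            "--model_path", model_path,
            "--ckpt_path", ckpt_path,
            "--layout_json", str(layout),
            "--output_dir", output_dir,
            "--enable_layout",
            "--overwrite",
        ])
    return commands


if __name__ == "__main__":
    for cmd in build_commands("./examples", "/path/to/FLUX.1-dev",
                              "/path/to/occlusionformer_checkpoint_dir",
                              "./outputs_occlusionformer"):
        # Printed only; pass cmd to subprocess.run(cmd, check=True) to execute.
        print(shlex.join(cmd))
```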
---
## ✅ TODO
- [ ] Organize and update the **amodal annotations** on Hugging Face.
---
## Repository Scope
This folder provides a standalone inference/demo package:
- `demo_occlusionformer.py`: Streamlit demo UI
- `inference_occlusionformer.py`: CLI inference
- `src/occlusionformer/`: OcclusionFormer core modules
- `src/utils.py`, `src/transformer_utils.py`: required utility modules
- `examples/`: example layout JSON files
- `requirements.txt`: runtime dependencies
---
## ⚙️ Inference Notes
- The demo and CLI share the project's current preprocessing logic and compose each prompt from the global prompt plus the instance captions.
- Layout control is enabled via `--enable_layout` (or disabled with `--disable_layout`).
- Outputs include generated images and layout overlays for visualization.
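
As a rough illustration of the prompt composition mentioned above, the sketch below joins a global prompt with per-instance captions. The JSON keys (`global_prompt`, `instances`, `caption`) and the space-joined format are assumptions made for this example; check the files in `examples/` and the preprocessing code for the actual schema and joining rule.

```python
from typing import Dict, List


def compose_prompt(layout: Dict) -> str:
    """Join the global prompt and the non-empty instance captions.

    The keys "global_prompt", "instances" and "caption" are hypothetical;
    they are not taken from the repository's layout format.
    """
    parts: List[str] = [layout["global_prompt"].strip()]
    for instance in layout.get("instances", []):
        caption = instance.get("caption", "").strip()
        if caption:  # skip instances without a caption
            parts.append(caption)
    return " ".join(parts)


if __name__ == "__main__":
    example = {
        "global_prompt": "A cozy living room.",
        "instances": [
            {"caption": "A grey sofa."},
            {"caption": "A wooden table."},
        ],
    }
    print(compose_prompt(example))
```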
---
## Acknowledgement
This work builds on many excellent research works and open-source projects. We thank the authors for sharing them!
- [GLIGEN](https://github.com/gligen/GLIGEN)
- [InstanceAssemble](https://github.com/FireRedTeam/InstanceAssemble)
- [CreatiLayout](https://github.com/HuiZhang0812/CreatiLayout)
---
## Citation
```bibtex
@inproceedings{li2026occlusionformer,
title={OcclusionFormer: Arranging Z-Order for Layout-Grounded Image Generation},
author={Li, Ziye and Ding, Henghui},
booktitle={ICML},
year={2026}
}
```