---
license: apache-2.0
tags:
- Diffusion
- Augmentation
- PromptControlledDiffusion
- semanticsegmentation
- synthetic-data
---
<div align="center">

# 🎨 SyntheticGen

### Mitigating Long-Tail Bias via Prompt-Controlled Diffusion Augmentation (not just more images, but the **right** images)

*Addressing class imbalance in remote sensing datasets through controlled synthetic generation*

[![Accepted at IEEE IGARSS 2026](https://img.shields.io/badge/Accepted-IEEE%20IGARSS%202026-1f77b4)](#)
[![arXiv Paper](https://img.shields.io/badge/Paper-arXiv-b31b1b)](https://arxiv.org/abs/2602.04749)
[![Live Demo](https://img.shields.io/badge/Live%20Demo-Colab-orange?logo=googlecolab&logoColor=white)](https://colab.research.google.com/drive/11KqBQogdIjwC6UXAGVeD4cfq_VclUC_I?usp=sharing)
[![Hugging Face Weights](https://img.shields.io/badge/Weights-Hugging%20Face-yellow?logo=huggingface&logoColor=black)](https://huggingface.co/buddhi19/SyntheticGen/tree/main)
[![Dataset](https://img.shields.io/badge/Dataset-Google%20Drive-blue)](https://drive.google.com/drive/folders/14cMpLTgvcLdXhRY0kGhFKpDRMvpok90h?usp=sharing)
[![License](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)

</div>

---

## 🌟 Overview

**SyntheticGen**, the official implementation of [*Mitigating Long-Tail Bias via Prompt-Controlled Diffusion Augmentation*](https://huggingface.co/papers/2602.04749), tackles the long-tail distribution problem in LoveDA by generating synthetic imagery with *explicit control* over class ratios: you can specify exactly what proportion of each land-cover class should appear in the output.

## 🔥🔥 Updates
- 🚀 **Try SyntheticGen in 2 minutes, no setup required**, at the [**✨ Live Demo ✨**](https://colab.research.google.com/drive/11KqBQogdIjwC6UXAGVeD4cfq_VclUC_I?usp=sharing)
- 🤗 Weights released on the [**Hugging Face Hub**](https://huggingface.co/buddhi19/SyntheticGen/tree/main)
- Our paper was accepted to the IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 2026.

### ✨ Highlights
- Two-stage pipeline: ratio-conditioned layout D3PM + ControlNet image synthesis.
- Full or sparse ratio control (e.g., `building:0.4`).
- Config-first workflow for reproducible experiments.

<div align="center">
  <img src="image.png" alt="SyntheticGen Results" width="100%">
</div>


---

## ❓ What we try to answer

πŸ›°οΈ **Why is remote-sensing segmentation still difficult, even with strong modern models?**  


Because the problem is not only in the model β€” it is also in the data. Some land-cover classes appear again and again, while others are so rare that the model barely gets a chance to learn them. In LoveDA, this becomes even more challenging because the dataset is split into **Urban** and **Rural** domains, each with different scene characteristics and different class distributions.

βš–οΈ **So what if we could control the data instead of just accepting it as it is?**  


That is exactly the idea behind **SyntheticGen**. Instead of using augmentation as a random process, SyntheticGen makes it **controllable**. Users can explicitly specify target class ratios and domain conditions during generation, making it possible to create synthetic samples that are not just more numerous, but more *useful*. This means rare classes can be strengthened deliberately, while still preserving realistic layouts and domain-consistent appearance.
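
To make that interface concrete, here is a minimal Python sketch of how a sparse ratio string such as `building:0.4,forest:0.3` could be expanded into a full target distribution over the seven LoveDA land-cover classes. `parse_ratios` is a hypothetical helper for illustration, not the repo's actual parser.

```python
# Hypothetical helper for illustration only -- not the repo's actual parser.
LOVEDA_CLASSES = ["background", "building", "road", "water",
                  "barren", "forest", "agriculture"]

def parse_ratios(spec: str) -> dict[str, float]:
    """Expand e.g. 'building:0.4,forest:0.3' into a full distribution,
    spreading the leftover mass uniformly over unspecified classes."""
    fixed = {}
    for item in spec.split(","):
        name, value = item.split(":")
        fixed[name.strip()] = float(value)
    leftover = 1.0 - sum(fixed.values())
    free = [c for c in LOVEDA_CLASSES if c not in fixed]
    return {c: fixed.get(c, leftover / len(free)) for c in LOVEDA_CLASSES}

print(parse_ratios("building:0.4,forest:0.3"))
# building and forest keep their requested shares; the remaining 0.3
# is split evenly (0.06 each) across the other five classes.
```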

🧠 **What makes SyntheticGen stand out?**  


Its strength lies in a carefully designed **two-stage pipeline**. First, a **ratio-conditioned discrete diffusion model** generates semantically meaningful layouts. Then, a **ControlNet-guided image synthesis stage** converts those layouts into realistic remote-sensing imagery. By separating **semantic control** from **visual rendering**, the framework achieves something highly valuable: it is both **principled** and **practical**.
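
One practical consequence of this separation: a Stage A layout is just an integer label map, so its class composition can be checked cheaply before any ControlNet rendering. A minimal sketch, assuming layouts are H×W integer arrays (the random array below merely stands in for a real D3PM sample):

```python
import numpy as np

def layout_ratios(layout: np.ndarray, num_classes: int) -> np.ndarray:
    """Empirical class frequencies of an integer label map."""
    counts = np.bincount(layout.ravel(), minlength=num_classes)
    return counts / counts.sum()

# Toy stand-in for a sampled layout over 7 classes.
rng = np.random.default_rng(0)
fake_layout = rng.integers(0, 7, size=(256, 256))
print(layout_ratios(fake_layout, num_classes=7))
```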

🚀 **Why does that matter beyond this single benchmark?**  


Because this is not just another generative model for remote sensing. SyntheticGen introduces a targeted augmentation strategy for improving segmentation under **class imbalance** and **domain shift**, and shows that synthetic data can be used not just to add more images, but to add the **right images**.

🌍 **The bigger message**  


SyntheticGen is a step toward **data-centric remote-sensing segmentation**: a setting where the training distribution is no longer passively accepted, but actively designed. Our paper shows that better segmentation is not only about building better models, but also about building better data.

## 🚀 Quick Start

### Installation
```bash
git clone https://github.com/Buddhi19/SyntheticGen.git
cd SyntheticGen
```

#### Install Dependencies
```bash
conda create -n diffusors python=3.10.19 -y
conda activate diffusors

# official PyTorch Linux + CUDA 12.8 install for v2.10.0
python -m pip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 --index-url https://download.pytorch.org/whl/cu128

python -m pip install -r requirements.txt
```
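
A quick post-install sanity check (standard PyTorch calls, nothing repo-specific) confirms the environment sees your GPU:

```python
# Post-install sanity check: verify PyTorch and CUDA are usable.
import torch

print(torch.__version__)          # should match the version you installed
print(torch.cuda.is_available())  # True on a working CUDA setup
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```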

### Generate Your First Synthetic Image
```bash
python src/scripts/sample_pair.py \
  --config configs/sample_pair_ckpt40000_building0.4.yaml
```

---

## 📚 Usage

### Training Pipeline (Configs)

**Stage A: Train Layout Generator (D3PM)**
```bash
python src/scripts/train_layout_d3pm.py \
  --config configs/train_layout_d3pm_masked_sparse_80k.yaml
```

**(Optional) Ratio Prior for Sparse Conditioning**
```bash
python src/scripts/compute_ratio_prior.py \
  --config configs/compute_ratio_prior_loveda_train.yaml
```
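
For intuition, a ratio prior amounts to the average per-class pixel frequency over the training masks. The sketch below shows that computation with NumPy and Pillow; it is not the repo's `compute_ratio_prior.py`, and the commented path and `num_classes` are placeholders to adjust to your label convention.

```python
# Sketch of the idea behind a ratio prior: average per-class pixel
# frequency over training masks. NOT the repo's script; the path and
# num_classes are placeholders -- adjust to your label convention.
from pathlib import Path
import numpy as np
from PIL import Image

def ratio_prior(mask_dir: str, num_classes: int = 7) -> np.ndarray:
    totals = np.zeros(num_classes, dtype=np.float64)
    n = 0
    for path in sorted(Path(mask_dir).glob("*.png")):
        mask = np.asarray(Image.open(path))
        counts = np.bincount(mask.ravel(), minlength=num_classes)[:num_classes]
        totals += counts / max(counts.sum(), 1)
        n += 1
    return totals / max(n, 1)

# prior = ratio_prior("LoveDA/Train/Urban/masks_png")
```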

**Stage B: Train Image Generator (ControlNet)**
```bash
python src/scripts/train_controlnet_ratio.py \
  --config configs/train_controlnet_ratio_loveda_1024.yaml
```

### Inference / Sampling (Configs)

**End-to-end sampling (layout -> image):**
```bash
python src/scripts/sample_pair.py \
  --config configs/sample_pair_ckpt40000_building0.4.yaml
```

**Override config parameters via CLI if needed:**
```bash
python src/scripts/sample_pair.py \
  --config configs/sample_pair_ckpt40000_building0.4.yaml \
  --ratios "building:0.4,forest:0.3" \
  --save_dir outputs/custom_generation
```

---

## ⚙️ Configuration

All experiments are driven by YAML/JSON config files in `configs/`.

| Task | Script | Example Config |
|------|--------|----------------|
| Layout Training | `src/scripts/train_layout_d3pm.py` | `configs/train_layout_d3pm_masked_sparse_80k.yaml` |
| Ratio Prior | `src/scripts/compute_ratio_prior.py` | `configs/compute_ratio_prior_loveda_train.yaml` |
| ControlNet Training | `src/scripts/train_controlnet_ratio.py` | `configs/train_controlnet_ratio_loveda_1024.yaml` |
| Sampling / Inference | `src/scripts/sample_pair.py` | `configs/sample_pair_ckpt40000_building0.4.yaml` |

**Config tips**
- Examples live in `configs/`.
- To resume training, set `resume_from_checkpoint: "checkpoint-XXXXX"` in your config.
- Dataset roots and domains are centralized in configs; edit once, reuse everywhere.
- CLI flags override config values for quick experiments (see the sketch below).
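
The override mechanism follows the common load-then-patch pattern: read the YAML config, then let any explicitly passed flags win. A minimal sketch of that pattern, assuming PyYAML; the flags shown are illustrative, not the scripts' actual argument lists.

```python
# Minimal sketch of the config-first, CLI-overrides pattern.
# Flags are illustrative, not the scripts' actual argument lists.
import argparse
import yaml  # assumes PyYAML is installed

parser = argparse.ArgumentParser()
parser.add_argument("--config", required=True)
parser.add_argument("--ratios", default=None)
parser.add_argument("--save_dir", default=None)
args = parser.parse_args()

with open(args.config) as f:
    cfg = yaml.safe_load(f)

# Only flags the user actually passed override the config file.
for key in ("ratios", "save_dir"):
    value = getattr(args, key)
    if value is not None:
        cfg[key] = value
```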

---

## πŸ“ Data Format

### LoveDA Dataset Structure
```
LoveDA/
  Train/
    Train/            # some releases include this extra nesting
      Urban/
        images_png/
        masks_png/
      Rural/
        images_png/
        masks_png/
    Urban/
      images_png/
      masks_png/
    Rural/
      images_png/
      masks_png/
  Val/
    ...
```

### Generic Dataset Structure
```
your_dataset/
  images/
    image_001.png
  masks/
    image_001.png   # label map with matching stem
```
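
Since images and masks share a filename stem, pairing them is a simple name match. A minimal loading sketch under that assumption (illustrative, not the repo's dataset class):

```python
# Minimal sketch of pairing images with same-named masks.
# Illustrative only; the repo's dataset class may differ.
from pathlib import Path
from PIL import Image

def load_pairs(root: str):
    base = Path(root)
    for img_path in sorted((base / "images").glob("*.png")):
        mask_path = base / "masks" / img_path.name  # matching stem
        if mask_path.exists():
            yield Image.open(img_path).convert("RGB"), Image.open(mask_path)

# for image, mask in load_pairs("your_dataset"):
#     assert image.size == mask.size
```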

---

## 📦 Pre-Generated Datasets

We provide the synthetic datasets used in the paper on [Google Drive](https://drive.google.com/drive/folders/14cMpLTgvcLdXhRY0kGhFKpDRMvpok90h?usp=sharing).

---

## 🧾 Outputs
- Checkpoints include `training_config.json` and `class_names.json`.
- Sampling writes `image.png`, `layout.png`, and `metadata.json` (see the inspection sketch below).
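
Assuming `metadata.json` is plain JSON (its exact schema is not documented here), an output folder can be inspected like so; the path reuses the `--save_dir` from the sampling example above:

```python
# Inspect a sampling output folder. Assumes metadata.json is plain JSON;
# its schema is not documented here, so treat the keys as unknown.
import json
from pathlib import Path

out = Path("outputs/custom_generation")  # --save_dir from the example above
with open(out / "metadata.json") as f:
    metadata = json.load(f)
print(json.dumps(metadata, indent=2))
```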

---

## 📄 Citation
```bibtex
@misc{wijenayake2026mitigating,
      title={Mitigating Long-Tail Bias via Prompt-Controlled Diffusion Augmentation}, 
      author={Buddhi Wijenayake and Nichula Wasalathilake and Roshan Godaliyadda and Vijitha Herath and Parakrama Ekanayake and Vishal M. Patel},
      year={2026},
      eprint={2602.04749},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.04749}, 
}
```

---

## πŸ“ License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

## 🙏 Acknowledgments
- LoveDA dataset creators for high-quality annotated remote sensing data
- Hugging Face Diffusers for diffusion model infrastructure
- ControlNet authors for controllable generation

---