---
license: apache-2.0
pipeline_tag: image-to-image
library_name: diffusers
---
# RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers

Yan Gong, Yiren Song, Yicheng Li, Chenglin Li, and Yin Zhang

Zhejiang University · National University of Singapore
## Quick Start

### 1. Configuration
#### 1.1 Environment setup

```shell
git clone git@github.com:gy8888/RelationAdapter.git
cd RelationAdapter
conda create -n RelationAdapter python=3.11.10
conda activate RelationAdapter
```
#### 1.2 Requirements installation

```shell
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
pip install --upgrade -r requirements.txt
```
### 2. Inference

We provide an integration of our model with the `FluxPipeline` pipeline and have uploaded the model weights to Hugging Face, so the model is easy to use. To run inference, simply execute the inference script:

```shell
python infer_single.py
```
### 3. Weights

You can download the trained RelationAdapter and LoRA checkpoints for inference; the available models are listed below. Note that the RelationAdapter checkpoint must be loaded before the LoRA checkpoint can be fused.
| Model | Description |
|---|---|
| RelationAdapter | Additional parameters of the RelationAdapter module, trained on the Relation252K dataset |
| LoRA | LoRA parameters, trained on the Relation252K dataset |
### 4. Dataset

#### 4.1 Paired Dataset Format
The paired dataset is stored in a `.jsonl` file, where each entry contains image file paths and the corresponding text descriptions: a caption for the source (left) image, a caption for the target (right) image, and an edit instruction describing the transformation from the source image to the target image.

Example format:
```json
{
  "left_image_description": "Description of the left image",
  "right_image_description": "Description of the right image",
  "edit_instruction": "Instructions for the desired modifications",
  "img_name": "path/to/image_pair.jpg"
},
{
  "left_image_description": "Description of the left image2",
  "right_image_description": "Description of the right image2",
  "edit_instruction": "Another instruction",
  "img_name": "path/to/image_pair2.jpg"
}
```
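As a minimal sketch, the paired entries can be loaded and validated like this (the field names come from the example above; the loader itself is illustrative and not part of the repository):

```python
import json

# Required fields of a paired entry, as shown in the example format above.
REQUIRED_KEYS = {
    "left_image_description",
    "right_image_description",
    "edit_instruction",
    "img_name",
}


def load_paired_entries(path):
    """Parse a .jsonl file of paired entries, checking required fields."""
    entries = []
    with open(path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            entry = json.loads(line)
            missing = REQUIRED_KEYS - entry.keys()
            if missing:
                raise ValueError(f"line {line_no}: missing fields {sorted(missing)}")
            entries.append(entry)
    return entries
```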
We have uploaded our datasets to Hugging Face.
#### 4.2 Run-Ready Dataset Generation

To prepare the dataset for relational learning tasks, such as analogy-based instruction scenarios, use the provided script:

```shell
python dataset-All-2000-turn-5test.py
```
This script takes the original paired-image dataset and converts it into a structured format. Example entry:
```json
{
  "cond1": "path/to/prompt_image.jpg",
  "cond2": "path/to/reference_image.jpg",
  "source": "path/to/source_image.jpg",
  "target": "path/to/target_image.jpg",
  "text": "Instruction for the intended modifications"
},
{
  "cond1": "path/to/prompt_image2.jpg",
  "cond2": "path/to/reference_image2.jpg",
  "source": "path/to/source_image2.jpg",
  "target": "path/to/target_image2.jpg",
  "text": "Instruction for the second modification"
}
```
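The conversion can be sketched as combining two paired entries into one run-ready example: one pair demonstrates the edit (the prompt/reference images), the other supplies the image to edit and its expected result. This is a hedged sketch of the mapping only; the `source_path`/`target_path` keys are hypothetical and not necessarily the exact logic of `dataset-All-2000-turn-5test.py`:

```python
def to_run_ready(prompt_pair, query_pair):
    """Combine two paired entries into one run-ready training example.

    `prompt_pair` demonstrates the edit: its before/after image paths become
    cond1/cond2. `query_pair` supplies the image to edit (source) and the
    expected edited result (target). The `source_path` and `target_path` keys
    are hypothetical; the actual script may instead split each `img_name`
    pair image into its left and right halves.
    """
    return {
        "cond1": prompt_pair["source_path"],    # prompt (before) image
        "cond2": prompt_pair["target_path"],    # reference (after) image
        "source": query_pair["source_path"],    # query image to edit
        "target": query_pair["target_path"],    # expected edited result
        "text": query_pair["edit_instruction"],  # shared edit instruction
    }
```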
### 5. Results

## Citation
```bibtex
@misc{gong2025relationadapterlearningtransferringvisual,
  title={RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers},
  author={Yan Gong and Yiren Song and Yicheng Li and Chenglin Li and Yin Zhang},
  year={2025},
  eprint={2506.02528},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2506.02528},
}
```
