---
license: apache-2.0
pipeline_tag: image-to-image
library_name: diffusers
---

# RelationAdapter

**RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers**

Yan Gong, Yiren Song, Yicheng Li, Chenglin Li, and Yin Zhang

Zhejiang University; National University of Singapore



## Quick Start

### 1. Configuration

#### 1.1 Environment setup

```shell
git clone git@github.com:gy8888/RelationAdapter.git
cd RelationAdapter

conda create -n RelationAdapter python=3.11.10
conda activate RelationAdapter
```

#### 1.2 Requirements installation

```shell
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
pip install --upgrade -r requirements.txt
```

### 2. Inference

We provide an integration of our model with the diffusers `FluxPipeline` and have uploaded the model weights to Hugging Face, so the model is easy to use. Simply run the inference script:

```shell
python infer_single.py
```
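For orientation, an inference script built on the diffusers `FluxPipeline` typically looks like the sketch below. This is an illustrative assumption only: the base-model repo ID, the LoRA path, the prompt, and the sampler settings are placeholders, and `infer_single.py` in the repository is the authoritative version. Running it requires a CUDA GPU and the downloaded weights.

```python
import torch
from diffusers import FluxPipeline

# Illustrative sketch -- repo ID and LoRA path are assumptions, not the
# actual contents of infer_single.py.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.load_lora_weights("path/to/relationadapter_lora")  # hypothetical path
pipe.to("cuda")

# Generate an edited image from an instruction-style prompt.
image = pipe(
    prompt="Instruction for the desired modification",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("output.png")
```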

### 3. Weights

You can download the trained RelationAdapter and LoRA checkpoints for inference. The available models are listed below.

Note that the RelationAdapter checkpoint must be loaded before the LoRA checkpoint can be fused into it.

| Model | Description |
| --- | --- |
| RelationAdapter | Additional parameters from the RelationAdapter module, trained on the Relation252K dataset |
| LoRA | LoRA parameters, trained on the Relation252K dataset |

### 4. Dataset

#### 4.1 Paired Dataset Format

The paired dataset is stored in a `.jsonl` file, where each entry contains an image file path and the corresponding text descriptions: a caption for the left (source) image, a caption for the right (target) image, and an edit instruction describing the transformation from source to target.

Example format:

```json
{
    "left_image_description": "Description of the left image",
    "right_image_description": "Description of the right image",
    "edit_instruction": "Instructions for the desired modifications",
    "img_name": "path/to/image_pair.jpg"
},
{
    "left_image_description": "Description of the left image2",
    "right_image_description": "Description of the right image2",
    "edit_instruction": "Another instruction",
    "img_name": "path/to/image_pair2.jpg"
}
```

We have uploaded our datasets to Hugging Face.
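A minimal sketch of reading such a paired `.jsonl` file, assuming one JSON object per line with the fields shown above (the function name and validation logic are our own illustration, not part of the repository):

```python
import json

# Fields each paired entry is expected to carry, per the example format.
REQUIRED_KEYS = {
    "left_image_description",
    "right_image_description",
    "edit_instruction",
    "img_name",
}

def load_paired_entries(path):
    """Read one JSON object per line and check the expected fields."""
    entries = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # tolerate blank lines
                continue
            entry = json.loads(line)
            missing = REQUIRED_KEYS - entry.keys()
            if missing:
                raise ValueError(f"entry missing fields: {sorted(missing)}")
            entries.append(entry)
    return entries
```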

#### 4.2 Run-Ready Dataset Generation

To prepare the dataset for relational learning tasks such as analogy-based instruction scenarios, use the provided script:

```shell
python dataset-All-2000-turn-5test.py
```

This script converts the original paired image dataset into a structured format with the following fields per entry. Example format:

```json
{
    "cond1": "path/to/prompt_image.jpg",
    "cond2": "path/to/reference_image.jpg",
    "source": "path/to/source_image.jpg",
    "target": "path/to/target_image.jpg",
    "text": "Instruction for the intended modifications"
},
{
    "cond1": "path/to/prompt_image2.jpg",
    "cond2": "path/to/reference_image2.jpg",
    "source": "path/to/source_image2.jpg",
    "target": "path/to/target_image2.jpg",
    "text": "Instruction for the second modification"
}
```
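Before training, the generated entries can be sanity-checked against this schema. The sketch below is our own illustration (the field names come from the example above; the validator itself is not part of the repository):

```python
# Expected fields of a run-ready entry, per the example format above.
RUN_READY_KEYS = ("cond1", "cond2", "source", "target", "text")

def validate_run_ready(entry):
    """Return the entry unchanged if every expected field is a non-empty string."""
    for key in RUN_READY_KEYS:
        value = entry.get(key)
        if not isinstance(value, str) or not value:
            raise ValueError(f"bad or missing field: {key!r}")
    return entry
```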

### 5. Results


## Citation

```bibtex
@misc{gong2025relationadapterlearningtransferringvisual,
      title={RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers},
      author={Yan Gong and Yiren Song and Yicheng Li and Chenglin Li and Yin Zhang},
      year={2025},
      eprint={2506.02528},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2506.02528},
}
```