---
license: apache-2.0
pipeline_tag: image-to-image
library_name: diffusers
---
# RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers

Yan Gong, Yiren Song, Yicheng Li, Chenglin Li, and Yin Zhang

Zhejiang University · National University of Singapore
## Quick Start

### 1. Configuration
#### 1.1 Environment setup

```shell
git clone git@github.com:gy8888/RelationAdapter.git
cd RelationAdapter
conda create -n RelationAdapter python=3.11.10
conda activate RelationAdapter
```
#### 1.2 Requirements installation

```shell
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
pip install --upgrade -r requirements.txt
```
### 2. Inference

We provide an integration of our model with the `FluxPipeline` pipeline and have uploaded the model weights to Hugging Face, so the model is easy to use. To run inference, simply execute the inference script:

```shell
python infer_single.py
```
### 3. Weights

You can download the trained RelationAdapter and LoRA checkpoints for inference; the available models are listed below. Note that the RelationAdapter checkpoint must be loaded before the LoRA checkpoint can be fused.
| Model | Description |
|---|---|
| RelationAdapter | Additional parameters of the RelationAdapter module, trained on the Relation252K dataset |
| LoRA | LoRA parameters, trained on the Relation252K dataset |
### 4. Dataset

#### 4.1 Paired Dataset Format
The paired dataset is stored in a `.jsonl` file, where each entry contains image file paths and the corresponding text descriptions: a caption for the source (left) image, a caption for the target (right) image, and an edit instruction describing the transformation from the source image to the target image.

Example format:
```json
{
  "left_image_description": "Description of the left image",
  "right_image_description": "Description of the right image",
  "edit_instruction": "Instructions for the desired modifications",
  "img_name": "path/to/image_pair.jpg"
},
{
  "left_image_description": "Description of the left image2",
  "right_image_description": "Description of the right image2",
  "edit_instruction": "Another instruction",
  "img_name": "path/to/image_pair2.jpg"
}
```
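As a minimal sketch, the paired entries can be loaded and validated like this (the field names come from the example above; the loader itself is illustrative and not part of the repository):

```python
import json

# Required fields of a paired entry, as shown in the example format above.
REQUIRED_KEYS = {
    "left_image_description",
    "right_image_description",
    "edit_instruction",
    "img_name",
}


def load_paired_entries(path):
    """Parse a .jsonl file of paired entries, checking required fields."""
    entries = []
    with open(path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            entry = json.loads(line)
            missing = REQUIRED_KEYS - entry.keys()
            if missing:
                raise ValueError(f"line {line_no}: missing fields {sorted(missing)}")
            entries.append(entry)
    return entries
```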
We have uploaded our datasets to Hugging Face.
#### 4.2 Run-Ready Dataset Generation

To prepare the dataset for relational learning tasks, such as analogy-based instruction scenarios, use the provided script:

```shell
python dataset-All-2000-turn-5test.py
```
This script takes the original paired-image dataset and converts it into a structured format. Example entry:
```json
{
  "cond1": "path/to/prompt_image.jpg",
  "cond2": "path/to/reference_image.jpg",
  "source": "path/to/source_image.jpg",
  "target": "path/to/target_image.jpg",
  "text": "Instruction for the intended modifications"
},
{
  "cond1": "path/to/prompt_image2.jpg",
  "cond2": "path/to/reference_image2.jpg",
  "source": "path/to/source_image2.jpg",
  "target": "path/to/target_image2.jpg",
  "text": "Instruction for the second modification"
}
```
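The conversion can be sketched as combining two paired entries into one run-ready example: one pair demonstrates the edit (the prompt/reference images), the other supplies the image to edit and its expected result. This is a hedged sketch of the mapping only; the `source_path`/`target_path` keys are hypothetical and not necessarily the exact logic of `dataset-All-2000-turn-5test.py`:

```python
def to_run_ready(prompt_pair, query_pair):
    """Combine two paired entries into one run-ready training example.

    `prompt_pair` demonstrates the edit: its before/after image paths become
    cond1/cond2. `query_pair` supplies the image to edit (source) and the
    expected edited result (target). The `source_path` and `target_path` keys
    are hypothetical; the actual script may instead split each `img_name`
    pair image into its left and right halves.
    """
    return {
        "cond1": prompt_pair["source_path"],    # prompt (before) image
        "cond2": prompt_pair["target_path"],    # reference (after) image
        "source": query_pair["source_path"],    # query image to edit
        "target": query_pair["target_path"],    # expected edited result
        "text": query_pair["edit_instruction"],  # shared edit instruction
    }
```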
### 5. Results

## Citation
```bibtex
@misc{gong2025relationadapterlearningtransferringvisual,
  title={RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers},
  author={Yan Gong and Yiren Song and Yicheng Li and Chenglin Li and Yin Zhang},
  year={2025},
  eprint={2506.02528},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2506.02528},
}
```
