# Multi-Level Conditioning by Pairing Localized Text and Sketch for Fashion Image Generation
This is the official implementation of the LOTS adapter from the paper "Multi-Level Conditioning by Pairing Localized Text and Sketch for Fashion Image Generation", an extension of our prior work presented at ICCV 2025.
To access the Sketchy dataset, refer to the HuggingFace repository.
## Roadmap
- Code release
- Weights release
- Platform release
## Repository Structure
`ckpts` folder
- Contains the pre-trained weights of the LOTS adapter.

`scripts` folder
- Contains all the scripts for training and inference with LOTS on Sketchy.

`src` folder
- Contains all the source code for the classes, models, and dataloaders used in the scripts.
## Installation
Clone the repository:

```shell
git clone https://huggingface.co/zyyyy/lots-extension
cd lots-extension
```
We advise creating a Conda environment as follows:

```shell
conda create -n lots python=3.12
conda activate lots
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
pip install -e .
```
## Training
We provide the script to train LOTS on our Sketchy dataset in `scripts/lots/train_lots.py`.
For an example of usage, check `run_train.sh`, which contains the default parameters used in our experiments.
## Inference
You can test our pre-trained model with the inference script in `scripts/lots/inference_lots.py`.
For an example, check `run_inference.sh`.
This script generates an image for each item in the test split of Sketchy and saves the images in a structured folder, with each item identified by its unique ID.
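For orientation, the item-ID-based output layout described above can be sketched as follows. This is an illustrative helper only: the `save_path` function and the `<item_id>.png` naming scheme are assumptions, and the actual layout produced by `scripts/lots/inference_lots.py` may differ.

```python
import os
import tempfile


def save_path(output_dir: str, item_id: str, ext: str = "png") -> str:
    """Build the output path for one generated image, keyed by its item ID.

    Hypothetical sketch: the real inference script may use a different
    naming scheme or nested per-item subfolders.
    """
    os.makedirs(output_dir, exist_ok=True)
    return os.path.join(output_dir, f"{item_id}.{ext}")


# Example: one file per test-split item, identified by its unique ID.
out_dir = tempfile.mkdtemp()
path = save_path(out_dir, "item_0421")
print(os.path.basename(path))  # item_0421.png
```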
## Citation

If you find our work useful, please cite:
```bibtex
@article{liu2026multi,
  title={Multi-Level Conditioning by Pairing Localized Text and Sketch for Fashion Image Generation},
  author={Liu, Ziyue and Talon, Davide and Girella, Federico and Ruan, Zanxi and Mondo, Mattia and Bazzani, Loris and Wang, Yiming and Cristani, Marco},
  journal={arXiv preprint arXiv:2602.18309},
  year={2026}
}
```