---
base_model:
- black-forest-labs/FLUX.1-Kontext-dev
datasets:
- handsomeWilliam/Relation252K
license: other
license_name: nvidia-license-non-commercial
license_link: LICENSE
pipeline_tag: image-to-image
---
# LoRWeB: Spanning the Visual Analogy Space with a Weight Basis of LoRAs
Visual analogy learning enables image manipulation through demonstration rather than textual description, allowing users to specify complex transformations that are difficult to articulate in words. Given a triplet $\{a, a', b\}$, the goal is to generate $b'$ such that $a : a' :: b : b'$.
LoRWeB specializes the model for each analogy task at inference time through dynamic composition of learned transformation primitives. It introduces a learnable basis of LoRA modules to span the space of different visual transformations and a lightweight encoder that dynamically selects and weighs these basis LoRAs based on the input analogy pair.
Given a prompt and an image triplet $\{a, a', b\}$ that visually describe a desired transformation, LoRWeB dynamically constructs a single LoRA from a learnable basis of LoRA modules, and produces an editing result $b'$ that applies the same analogy to the new image.
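The core composition idea can be sketched as a weighted sum over a basis of LoRA modules, where per-sample coefficients come from an encoder that reads the analogy pair. The sketch below is a minimal illustration under assumed names (`BasisLoRALinear`, `num_basis`, `rank` are hypothetical, not the official implementation):

```python
import torch
import torch.nn as nn


class BasisLoRALinear(nn.Module):
    """Hypothetical sketch: a linear layer whose low-rank update is a
    weighted combination of K basis LoRA modules. The per-sample basis
    weights would come from a lightweight encoder (not shown) that
    reads the analogy pair (a, a')."""

    def __init__(self, in_features, out_features, num_basis=8, rank=4):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        # K basis LoRAs: down-projections A_k and up-projections B_k.
        # B is zero-initialized so the update starts as the identity,
        # as is standard for LoRA.
        self.A = nn.Parameter(torch.randn(num_basis, rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_basis, out_features, rank))

    def forward(self, x, basis_weights):
        # basis_weights: (num_basis,) coefficients from the encoder.
        # Collapse the basis into a single low-rank update delta[o, i].
        delta = torch.einsum("k,kor,kri->oi", basis_weights, self.B, self.A)
        return self.base(x) + x @ delta.T


layer = BasisLoRALinear(16, 16)
w = torch.softmax(torch.randn(8), dim=0)  # placeholder for encoder output
y = layer(torch.randn(2, 16), w)
print(y.shape)  # torch.Size([2, 16])
```

Because the basis is combined into one effective LoRA before the forward pass, inference-time specialization adds no extra adapters to the backbone.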
## Sample Usage
To perform inference with the LoRWeB weights, use the `inference.py` script from the official GitHub repository:
```bash
python inference.py \
    -w "path/to/lorweb_model.safetensors" \
    -c "config/your_config.yaml" \
    -a "data/path_to_a_img.jpg" \
    -t "data/path_to_atag_img.jpg" \
    -b "data/path_to_b_img.jpg" \
    -o "outputs/generated_btag_img_path.jpg"
```
## Citation
If you use this model in your research, please cite:
```bibtex
@article{manor2026lorweb,
  title={Spanning the Visual Analogy Space with a Weight Basis of LoRAs},
  author={Manor, Hila and Gal, Rinon and Maron, Haggai and Michaeli, Tomer and Chechik, Gal},
  journal={arXiv preprint arXiv:2602.15727},
  year={2026}
}
```
## Acknowledgements
This project builds upon:
- FLUX.1-Kontext by Black Forest Labs
- Diffusers by Hugging Face
- PEFT by Hugging Face
- AI-Toolkit for training infrastructure