FTI4CIR: Fine-grained Textual Inversion Network for Zero-Shot Composed Image Retrieval (Model Weights)
Haoqiang Lin¹, Haokun Wen², Xuemeng Song¹*, Meng Liu³, Yupeng Hu¹, Liqiang Nie²
¹Shandong University, ²Harbin Institute of Technology (Shenzhen), ³Shandong Jianzhu University
This repository hosts the official pre-trained model weights for FTI4CIR, a fine-grained textual inversion framework for Zero-Shot Composed Image Retrieval (CIR). The model maps reference images into subject-oriented and attribute-oriented pseudo-word tokens, enabling zero-shot composed retrieval without any annotated training triplets.
Paper: SIGIR 2024
GitHub Repository: iLearn-Lab/ERASE
Model Information
1. Model Name
FTI4CIR (Fine-grained Textual Inversion for Composed Image Retrieval)
2. Task Type & Applicable Tasks
- Task Type: Multimodal Retrieval / Zero-Shot Composed Image Retrieval / Textual Inversion
- Applicable Tasks:
  - Zero-shot composed image retrieval (reference image + modification text → target image)
  - Text-image retrieval with fine-grained image decomposition
  - Open-domain composed retrieval on fashion, general objects, and real-world scenes
3. Model Overview
Existing CIR methods often rely on expensive annotated <image, text, target> triplets and use only coarse-grained image representations.
FTI4CIR instead decomposes each image into:
- Subject-oriented pseudo-word token for main entities
- Attribute-oriented pseudo-word tokens for appearance, style, background, etc.
The image is then represented as a natural sentence:
"a photo of [S*] with [A1*, A2*, ..., Ar*]"
Concatenating this sentence with the modification text yields a single text query, so CIR reduces to standard text-image retrieval, which gives the model strong zero-shot generalization.
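The reduction described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the FTI4CIR implementation: `compose_query`, `cosine`, and `rank_gallery` are hypothetical names, the pseudo-word tokens are shown as plain placeholder strings (in a textual inversion model they would presumably be learned embeddings injected into the text encoder), and the comma used to join the modification text is an assumption.

```python
import math
from typing import Dict, List, Sequence, Tuple


def compose_query(subject_token: str,
                  attribute_tokens: Sequence[str],
                  modification: str) -> str:
    """Build the pseudo-sentence for a reference image and append the modification text."""
    attributes = ", ".join(attribute_tokens)
    # Template taken from the overview; joining the modification text with ", " is an assumption.
    return f"a photo of {subject_token} with {attributes}, {modification}"


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_gallery(query_embedding: Sequence[float],
                 gallery: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Rank gallery images by similarity to the embedded text query, best first."""
    scored = [(name, cosine(query_embedding, emb)) for name, emb in gallery.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

In a real pipeline the string returned by `compose_query` would be encoded by the text encoder and the gallery embeddings would come from the image encoder; the toy vectors passed to `rank_gallery` stand in for both.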
Key designs:
- Fine-grained pseudo-word token mapping
- Dynamic local attribute feature extraction
- Tri-wise caption-based semantic regularization (subject / attribute / whole-image)
4. Training Data
The model is trained on unlabeled open-domain images (ImageNet) without any manually annotated CIR triplets. Evaluation is performed on standard benchmarks:
- FashionIQ
- CIRR
- CIRCO
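The list above does not say which metrics are reported. Retrieval benchmarks such as FashionIQ and CIRR are commonly scored with Recall@K, so a minimal sketch of that metric is given here for orientation. The function name `recall_at_k` and the dictionary layout are assumptions made for illustration, not part of the official evaluation code.

```python
from typing import Dict, Sequence


def recall_at_k(rankings: Dict[str, Sequence[str]],
                targets: Dict[str, str],
                k: int) -> float:
    """Fraction of queries whose target image appears among the top-k ranked results."""
    if not rankings:
        return 0.0
    hits = sum(
        1
        for query_id, ranked in rankings.items()
        if targets[query_id] in list(ranked)[:k]
    )
    return hits / len(rankings)
```

Here `rankings` maps each query id to its gallery image ids sorted from most to least similar, and `targets` maps each query id to the single ground-truth image id.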
Usage & Inference
These weights are intended for direct use with the official FTI4CIR codebase.
Step 1: Environment Setup
git clone https://github.com/ZiChao111/FTI4CIR.git
cd FTI4CIR
conda create -n fti4cir python=3.9 -y
conda activate fti4cir
pip install -r requirements.txt
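After installation, it can help to confirm that the packages listed in requirements.txt are actually present in the active environment. The following is a minimal sketch using only the Python standard library; the helper names `requirement_names` and `missing_packages` are hypothetical, and the parsing is deliberately simple (it skips option lines and URL-based requirements rather than handling every pip syntax).

```python
import re
from importlib import metadata
from pathlib import Path
from typing import List


def requirement_names(text: str) -> List[str]:
    """Extract distribution names from the contents of a requirements file."""
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-") or "://" in line:
            continue  # skip blanks, pip options (-r, -e, ...) and URL requirements
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names: List[str]) -> List[str]:
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    path = Path("requirements.txt")
    if path.exists():
        print("Missing packages:", missing_packages(requirement_names(path.read_text())))
```

Run from the repository root with the `fti4cir` environment activated, the script prints any listed package that is not installed; an empty list suggests the setup step completed.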