🔍 FTI4CIR: Fine-grained Textual Inversion Network for Zero-Shot Composed Image Retrieval (Model Weights)

Haoqiang Lin¹ Haokun Wen² Xuemeng Song¹* Meng Liu³ Yupeng Hu¹ Liqiang Nie²

¹Shandong University ²Harbin Institute of Technology (Shenzhen) ³Shandong Jianzhu University

This repository hosts the official pre-trained model weights for FTI4CIR, a fine-grained textual inversion framework for Zero-Shot Composed Image Retrieval (CIR). The model maps reference images into subject-oriented and attribute-oriented pseudo-word tokens, enabling zero-shot composed retrieval without any annotated training triplets.

🔗 Paper: SIGIR 2024 🔗 GitHub Repository: iLearn-Lab/ERASE

📌 Model Information

1. Model Name

FTI4CIR (Fine-grained Textual Inversion for Composed Image Retrieval)

2. Task Type & Applicable Tasks

Task Type: Multimodal Retrieval / Zero-Shot Composed Image Retrieval / Textual Inversion
Applicable Tasks:
- Zero-shot composed image retrieval (reference image + modification text → target image)
- Text-image retrieval with fine-grained image decomposition
- Open-domain composed retrieval on fashion, general objects, and real-world scenes

3. Model Overview

Existing CIR methods often rely on expensive annotated <image, text, target> triplets and use only coarse-grained image representations. FTI4CIR instead decomposes each image into:

- A subject-oriented pseudo-word token for the main entities
- Attribute-oriented pseudo-word tokens for appearance, style, background, etc.

The image is then represented as a natural sentence: "a photo of [S*] with [A1*, A2*, ..., Ar*]"

Concatenating this sentence with the modification text reduces CIR to standard text-image retrieval, which is what allows the model to generalize zero-shot.
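
The reduction to text-image retrieval can be illustrated with a minimal Python sketch. It is not the official implementation: `compose_query`, `rank_gallery`, and the `joiner` argument are hypothetical names, the bracketed strings stand in for the learned pseudo-word tokens, and the toy vectors stand in for embeddings that would come from the text and image encoders.

```python
import math
from typing import List, Sequence


def compose_query(num_attributes: int, modification: str, joiner: str = ", ") -> str:
    """Build the composed text query for one reference image.

    "[S*]" and "A1*".."Ar*" are placeholders for the learned pseudo-word
    tokens. `joiner` is an assumption about how the modification text is
    appended; the actual template may differ.
    """
    if num_attributes < 1:
        raise ValueError("this sketch expects at least one attribute token")
    attributes = ", ".join(f"A{i}*" for i in range(1, num_attributes + 1))
    image_sentence = f"a photo of [S*] with [{attributes}]"
    return f"{image_sentence}{joiner}{modification.strip()}"


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_gallery(query_vec: Sequence[float], gallery: Sequence[Sequence[float]]) -> List[int]:
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [cosine(query_vec, image_vec) for image_vec in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)
```

In a real pipeline the composed sentence would be encoded by the text encoder and compared against pre-computed image embeddings of the candidate set; the ranking step itself is plain text-to-image similarity search.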

Key designs:

- Fine-grained pseudo-word token mapping
- Dynamic local attribute feature extraction
- Tri-wise caption-based semantic regularization (subject / attribute / whole-image)

4. Training Data

The model is trained on unlabeled open-domain images (ImageNet) without any manually annotated CIR triplets. Evaluation is performed on standard benchmarks:

- FashionIQ
- CIRR
- CIRCO
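
FashionIQ and CIRR results are commonly reported as Recall@K, while CIRCO, which has several ground-truth targets per query, is usually reported as mAP@K. The sketch below covers only Recall@K with a single target per query. The function name and the input layout (one ranked list of image IDs per query) are assumptions for illustration, not the evaluation code of the official repository.

```python
from typing import Sequence


def recall_at_k(rankings: Sequence[Sequence[str]], targets: Sequence[str], k: int) -> float:
    """Percentage of queries whose target image appears in the top-k results.

    rankings[i] is the ranked list of candidate image IDs for query i,
    best match first; targets[i] is the ground-truth image ID for query i.
    """
    if len(rankings) != len(targets):
        raise ValueError("rankings and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(1 for ranked, target in zip(rankings, targets) if target in ranked[:k])
    return 100.0 * hits / len(targets)
```

For example, with two queries whose targets sit at ranks 2 and 3, Recall@1 is 0, Recall@2 is 50, and Recall@3 is 100.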

🚀 Usage & Inference

These weights are intended for direct use with the official FTI4CIR codebase.

Step 1: Environment Setup

git clone https://github.com/ZiChao111/FTI4CIR.git
cd FTI4CIR
conda create -n fti4cir python=3.9 -y
conda activate fti4cir
pip install -r requirements.txt
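
After the setup commands finish, a quick sanity check of the environment can save debugging time later. The helper below is a hypothetical sketch and not part of the official codebase. It only verifies the Python version requested in the `conda create` command and checks that a list of packages is importable. The default package name `torch` is an assumption, since the contents of `requirements.txt` are not shown here; adjust the list to match the file.

```python
import importlib.util
import sys
from typing import List, Sequence, Tuple


def check_environment(
    required: Sequence[str] = ("torch",),       # assumed dependency, edit as needed
    expected_python: Tuple[int, int] = (3, 9),  # matches `python=3.9` above
) -> List[str]:
    """Return a list of readable problems; an empty list means none were found."""
    problems = []
    current = tuple(sys.version_info[:2])
    if current != tuple(expected_python):
        problems.append(
            f"Python {current[0]}.{current[1]} found, "
            f"expected {expected_python[0]}.{expected_python[1]}"
        )
    for name in required:
        # find_spec returns None when a top-level package cannot be located
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing package: {name}")
    return problems
```

Calling `check_environment()` inside the activated `fti4cir` environment and printing the returned list shows at a glance whether the interpreter and the listed packages are in place.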
