πŸ” FTI4CIR: Fine-grained Textual Inversion Network for Zero-Shot Composed Image Retrieval (Model Weights)

Haoqiang Lin1  Haokun Wen2  Xuemeng Song1\*  Meng Liu3  Yupeng Hu1  Liqiang Nie2

1Shandong University   2Harbin Institute of Technology (Shenzhen)   3Shandong Jianzhu University

This repository hosts the official pre-trained model weights for FTI4CIR, a fine-grained textual inversion framework for Zero-Shot Composed Image Retrieval (CIR). The model maps reference images into subject-oriented and attribute-oriented pseudo-word tokens, enabling zero-shot composed retrieval without any annotated training triplets.

πŸ”— Paper: SIGIR 2024
πŸ”— GitHub Repository: ZiChao111/FTI4CIR


πŸ“Œ Model Information

1. Model Name

FTI4CIR (Fine-grained Textual Inversion for Composed Image Retrieval)

2. Task Type & Applicable Tasks

  • Task Type: Multimodal Retrieval / Zero-Shot Composed Image Retrieval / Textual Inversion
  • Applicable Tasks:
    • Zero-shot composed image retrieval (reference image + modification text β†’ target image)
    • Text-image retrieval with fine-grained image decomposition
    • Open-domain composed retrieval on fashion, general objects, and real-world scenes

3. Model Overview

Existing CIR methods often rely on expensive, manually annotated `<reference image, modification text, target image>` triplets and use only coarse-grained image representations. FTI4CIR instead decomposes each image into:

  • Subject-oriented pseudo-word token for main entities
  • Attribute-oriented pseudo-word tokens for appearance, style, background, etc.

The image is then represented as a natural-language sentence: "a photo of [S*] with [A1*, A2*, ..., Ar*]", where [S*] is the subject-oriented pseudo-word token and [A1*], ..., [Ar*] are the attribute-oriented ones.

By concatenating this sentence with the modification text, CIR reduces to standard text-to-image retrieval, achieving strong zero-shot generalization.
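As a minimal sketch of this reduction, the snippet below composes a pseudo-word sentence with the modification text and ranks a toy gallery by cosine similarity. The token strings, caption formatting, and embeddings here are illustrative assumptions; in the real model the pseudo-word tokens and all embeddings come from the frozen CLIP encoders.

```python
import math

def compose_query(subject_token, attribute_tokens, modification_text):
    """Build the textual query: pseudo-word image sentence + modification text.

    The exact caption template is an assumption for illustration.
    """
    attrs = ", ".join(attribute_tokens)
    image_sentence = f"a photo of {subject_token} with {attrs}"
    return f"{image_sentence}, {modification_text}"

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy stand-ins for CLIP text/image embeddings (assumption: the real
# pipeline encodes the query with CLIP's text encoder and the gallery
# with its image encoder).
query = compose_query("[S*]", ["[A1*]", "[A2*]"], "make it red")
query_emb = [0.9, 0.1, 0.2]
gallery = {"img_a": [0.88, 0.12, 0.18], "img_b": [0.1, 0.95, 0.0]}

# Standard text-to-image retrieval: rank gallery images by similarity.
best = max(gallery, key=lambda name: cosine(query_emb, gallery[name]))
```

Because the reference image is fully verbalized, no annotated triplet supervision is needed at query time: retrieval is ordinary text-to-image matching over the composed caption.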

Key designs:

  • Fine-grained pseudo-word token mapping
  • Dynamic local attribute feature extraction
  • Tri-wise caption-based semantic regularization (subject / attribute / whole-image)

4. Training Data

The model is trained on unlabeled open-domain images (ImageNet) without any manually annotated CIR triplets. Evaluation is performed on standard benchmarks:

  • FashionIQ
  • CIRR
  • CIRCO

πŸš€ Usage & Inference

These weights are intended for direct use with the official FTI4CIR codebase.

Step 1: Environment Setup

git clone https://github.com/ZiChao111/FTI4CIR.git
cd FTI4CIR
conda create -n fti4cir python=3.9 -y
conda activate fti4cir
pip install -r requirements.txt