	---
	license: apache-2.0
	language:
	- en
	- zh
	---
	# ReMatch: Boosting Representation through Matching for Multimodal Retrieval

	<p>
	<a href="https://arxiv.org/abs/2511.19278"><img src="https://img.shields.io/badge/arXiv-2511.19278-b31b1b.svg" alt="arXiv"></a>
	<a href="https://github.com/FireRedTeam/ReMatch"><img src="https://img.shields.io/badge/Code-ReMatch-green.svg" alt="Code"></a>
	<a href="https://huggingface.co/FireRedTeam/ReMatch-3B"><img src="https://img.shields.io/badge/Model-ReMatch--3B-yellow.svg" alt="Model"></a>
	</p>

	This repository contains the official implementation of ReMatch, accepted to CVPR 2026.

	ReMatch turns a multimodal large language model into a stronger multimodal retriever by adding a chat-style generative matching objective during training. The same MLLM learns to judge query-document relevance from both raw multimodal inputs and projected embeddings, complementing standard contrastive learning with instance-wise supervision on hard negatives. ReMatch also augments each input with multiple learnable representation tokens and fuses them into an efficient single-vector embedding for retrieval.
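As a rough illustration of how a contrastive term and an instance-wise matching term can complement each other, the sketch below computes an InfoNCE-style loss for one query against a positive and its hard negatives, plus a binary log-loss on a "yes"/"no" relevance judgment. This is a minimal sketch, not the repository's training code: the function names, the temperature of 0.02, and the `match_weight` of 0.1 are all assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(query, positive, negatives, temperature=0.02):
    """Contrastive loss for one query: negative log-softmax of the positive's score.

    The temperature value is an assumption, not taken from the ReMatch configs.
    """
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, neg) / temperature for neg in negatives]
    peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_z - logits[0]


def matching_loss(p_yes, is_match):
    """Binary log-loss on the probability that the model answers 'yes' for a pair."""
    p = min(max(p_yes, 1e-12), 1.0 - 1e-12)
    return -math.log(p) if is_match else -math.log(1.0 - p)


def combined_loss(contrastive, matching_terms, match_weight=0.1):
    """Contrastive loss plus a weighted mean of per-pair matching losses.

    `match_weight` is a hypothetical hyperparameter used only in this sketch.
    """
    return contrastive + match_weight * sum(matching_terms) / len(matching_terms)
```

The contrastive term ranks the positive against all candidates at once, while each matching term supervises a single query-document pair, which is where hard negatives receive their own individual signal.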

	## 👥 Authors

	[Qianying Liu](https://scholar.google.com/citations?hl=zh-TW&user=QnMV-uYAAAAJ&view_op=list_works&sortby=pubdate)\*, Xiao Liang\*, Zhiqiang Zhang#, Yibo Chen, Xu Tang, Zhongfei Qing, Fengfan Zhou, Yao Hu, Paul Henderson

	University of Glasgow, Xiaohongshu Inc., Huazhong University of Science and Technology

	\* Equal contribution. # Project leader.

	## 🔍 Method

	![ReMatch framework](assets/rematch_framework.png)

	ReMatch is built around two core ideas:

	- Query-Document Matching: an additional autoregressive matching stage that predicts relevance from the query, document, and their projected embeddings.
	- Learnable Multi-Token Embeddings: multiple learnable tokens capture fine-grained contextual signals; an orthogonality regularizer encourages complementary representations, and the fused output remains a standard dense embedding.
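To make the second idea more concrete, here is a minimal sketch of one way to fuse several token embeddings into a single vector and to penalize overlap between them. It assumes that "residual average fusion" means adding the mean of the learnable-token embeddings to a base embedding, and that the orthogonality regularizer is the mean squared cosine similarity over distinct token pairs. Both formulations, and all function names, are assumptions for illustration and may differ from the actual ReMatch implementation.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def residual_avg_fusion(base, tokens):
    """Add the mean of the token embeddings to the base embedding, then normalize.

    The result is still one dense vector, so it works with ordinary
    nearest-neighbor retrieval.
    """
    dim = len(base)
    mean = [sum(tok[i] for tok in tokens) / len(tokens) for i in range(dim)]
    return l2_normalize([b + m for b, m in zip(base, mean)])


def orthogonality_penalty(tokens):
    """Mean squared cosine similarity over distinct token pairs.

    Only pairs with i < j are used (the upper triangle of the Gram matrix),
    so the penalty is 0 when all tokens are mutually orthogonal.
    Requires at least two tokens.
    """
    unit = [l2_normalize(tok) for tok in tokens]
    squared = []
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            sim = sum(a * b for a, b in zip(unit[i], unit[j]))
            squared.append(sim * sim)
    return sum(squared) / len(squared)
```

Minimizing a penalty of this kind pushes the tokens toward complementary directions, while the fusion step keeps the retrieval-time cost at one vector per input.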

	## 🔥 News

	- 2026-05: ReMatch code, the ReMatch-3B checkpoint, and evaluation scripts are released.
	- 2026-02: ReMatch is accepted to CVPR 2026.
	- 2025-11: The ReMatch technical report is available on arXiv.

	## 🛠️ Installation

	```bash
	conda create -n rematch python=3.10 -y
	conda activate rematch
	pip install -r requirements.txt
	```

	`flash-attn` can be sensitive to CUDA, PyTorch, and compiler versions. If installation fails, install the wheel matching your environment by following the official FlashAttention release instructions, then rerun `pip install -r requirements.txt` to install the remaining dependencies.

	## 🤗 Checkpoints

	We release ReMatch-3B, a Qwen2.5-VL-3B based checkpoint trained with the ReMatch recipe:

	- [FireRedTeam/ReMatch-3B](https://huggingface.co/FireRedTeam/ReMatch-3B)

	For local checkpoints, pass the base model through `--model_name` and the adapter/full checkpoint through `--checkpoint_path` when evaluating.

	## 🚀 Training

	The public ReMatch-3B training entry point is:

	```bash
	bash experiments/public/rematch/train-rematch-itm.sh
	```

	Before training, download the mmE5 hard-negative MMEB training data from Hugging Face:

	- [intfloat/mmE5-MMEB-hardneg](https://huggingface.co/datasets/intfloat/mmE5-MMEB-hardneg)

	In addition to mmE5, please follow the original [VLM2Vec](https://github.com/TIGER-AI-Lab/VLM2Vec) data preparation instructions to download the corresponding MMEB training and evaluation data used by the public configs in this repository.

	Then edit [experiments/public/rematch/train_image_mme5_hardneg.yaml](experiments/public/rematch/train_image_mme5_hardneg.yaml) and replace every `DATASET_BASE_PATH` with the directory that contains your `mmE5/` folder. The expected layout is:

	```text
	DATASET_BASE_PATH/
	└── mmE5/
	    └── mmE5-MMEB-hardneg/
	```
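If you prefer not to edit the YAML by hand, the placeholder substitution can be scripted. The sketch below is a hypothetical helper (`set_dataset_base_path` is not part of this repository) that checks for the `mmE5/` folder under the chosen directory and then replaces every `DATASET_BASE_PATH` occurrence in the config file in place. It treats the config as plain text, on the assumption that the placeholder appears literally.

```python
from pathlib import Path


def set_dataset_base_path(config_path, base_dir, placeholder="DATASET_BASE_PATH"):
    """Replace every placeholder occurrence in the config; return how many were replaced."""
    base = Path(base_dir)
    # The README states that the base directory must contain an mmE5/ folder.
    if not (base / "mmE5").is_dir():
        raise FileNotFoundError(f"no mmE5/ folder found under {base}")

    config = Path(config_path)
    text = config.read_text(encoding="utf-8")
    count = text.count(placeholder)
    config.write_text(text.replace(placeholder, str(base)), encoding="utf-8")
    return count


if __name__ == "__main__":
    replaced = set_dataset_base_path(
        "experiments/public/rematch/train_image_mme5_hardneg.yaml",
        "/path/to/datasets",  # placeholder: the directory that contains mmE5/
    )
    print(f"replaced {replaced} occurrence(s)")
```

Because the file is rewritten in place, keep a copy of the original config (or rely on version control) if you expect to switch dataset locations.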

	The default script trains a Qwen2.5-VL-3B based ReMatch model with LoRA, 16 learnable query tokens, residual average fusion, orthogonal regularization, and the matching objective enabled. You can override common paths without editing the script:

	```bash
	EXP_DIR=/path/to/output \
	CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
	bash experiments/public/rematch/train-rematch-itm.sh
	```

	## 📊 Evaluation

	Evaluation configs are under [experiments/public/eval](experiments/public/eval):

	- [image.yaml](experiments/public/eval/image.yaml)

	Please prepare the MMEB evaluation data following the original [VLM2Vec](https://github.com/TIGER-AI-Lab/VLM2Vec) instructions, then set `DATA_BASEDIR` to the directory containing the downloaded evaluation files.

	> Note: Evaluation scores may vary slightly across environments, as different PyTorch, CUDA, and `flash-attn` versions can introduce small numerical differences.

	For checkpoints produced by this repository, we recommend using `eval_all.py`. It reads the experiment name and automatically selects the matching ReMatch evaluation configuration, including backbone type, target-side instruction prefix, chat template, learnable query tokens, and residual embedding fusion. For example, an experiment name containing `Qwen2.5vl`, `TgtInstruction`, `Queries16`, `ResidualAvg`, and `ChatTemplate` is evaluated with the `qwen2_5_vl` backbone, the target-side instruction prefix, 16 learnable tokens, average residual fusion, and the chat template enabled.
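The name-to-configuration mapping described above can be pictured roughly as follows. This is an illustrative sketch only: `infer_eval_flags` is a hypothetical function, not the actual logic in `eval_all.py`, and the substring rules are assumptions based on the example markers. The flag names mirror the `eval.py` arguments used elsewhere in this README.

```python
import re


def infer_eval_flags(experiment_name):
    """Map markers in an experiment name to eval.py-style settings (illustrative)."""
    flags = {}

    if "Qwen2.5vl" in experiment_name:
        flags["model_backbone"] = "qwen2_5_vl"

    flags["tgt_prefix_instruction"] = "TgtInstruction" in experiment_name
    flags["enable_chat_template"] = "ChatTemplate" in experiment_name

    # "Queries16" -> learnable query tokens enabled, with 16 of them.
    match = re.search(r"Queries(\d+)", experiment_name)
    flags["learnable_queries"] = match is not None
    if match:
        flags["num_queries"] = int(match.group(1))

    # "ResidualAvg" -> residual embedding fusion using the average method.
    flags["residual_embedding"] = "ResidualAvg" in experiment_name
    if flags["residual_embedding"]:
        flags["residual_embedding_method"] = "avg"

    return flags
```

Encoding the settings in the name means a checkpoint directory carries its own evaluation recipe, at the cost of requiring the naming scheme to stay consistent between training and evaluation.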

	Evaluate one experiment checkpoint:

	```bash
	DATA_BASEDIR=/path/to/vlm2vec_eval \
	MODEL_BASEDIR=/path/to/training/outputs \
	OUTPUT_BASEDIR=/path/to/eval/outputs \
	MODALITIES="image" \
	python eval_all.py \
	  --model_name Rematch_Qwen2.5vl_3B.image.autoresize.lora32.loraAlpha64.BS1024.IB64.GCq32p32NormTemp002.lr1e4.step3kwarm100.lrCosine.TgtInstruction.mmE5H1.Queries16.ResidualAvg.OrthTriu0.2.ChatTemplate.ITM.V1.Ratio0.1 \
	  --checkpoint_name checkpoint-2200
	```

	If no arguments are provided, `eval_all.py` scans `outputs/<model_name>/<checkpoint_name>/`, evaluates every checkpoint directory, and writes summaries to:

	```text
	outputs/evals/<model_name>/<checkpoint_name>/final_results.json
	```
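To compare several checkpoints after a scan, the summary files can be gathered with a few lines of Python. The sketch below assumes only the path pattern shown above and that each `final_results.json` is valid JSON. It makes no assumption about the keys inside the file, and `collect_summaries` is a hypothetical helper rather than part of this repository.

```python
import json
from pathlib import Path


def collect_summaries(eval_root="outputs/evals"):
    """Return {(model_name, checkpoint_name): parsed JSON} for every summary found."""
    summaries = {}
    for path in sorted(Path(eval_root).glob("*/*/final_results.json")):
        # Path layout: <eval_root>/<model_name>/<checkpoint_name>/final_results.json
        model_name, checkpoint_name = path.parts[-3], path.parts[-2]
        with path.open(encoding="utf-8") as f:
            summaries[(model_name, checkpoint_name)] = json.load(f)
    return summaries


if __name__ == "__main__":
    for (model, checkpoint), result in collect_summaries().items():
        print(model, checkpoint, result)
```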

	For the released ReMatch-3B checkpoint, use `eval.py` directly and pass the matching ReMatch configuration explicitly:

	```bash
	torchrun --nproc_per_node=8 --master_port=2277 eval.py \
	  --lora True \
	  --pooling eos \
	  --normalize true \
	  --tgt_prefix_instruction True \
	  --learnable_queries True \
	  --residual_embedding True \
	  --residual_embedding_method avg \
	  --enable_chat_template True \
	  --num_queries 16 \
	  --per_device_eval_batch_size 16 \
	  --model_backbone qwen2_5_vl \
	  --model_name ReMatch-3B-PATH \
	  --checkpoint_path ReMatch-3B-PATH \
	  --dataset_config experiments/public/eval/image.yaml \
	  --encode_output_path outputs/evals/ReMatch-3B/image \
	  --data_basedir /path/to/MMEB
	```

	## 🙏 Acknowledgements

	This codebase is built on top of [VLM2Vec](https://github.com/TIGER-AI-Lab/VLM2Vec). We sincerely thank the VLM2Vec authors for releasing their training and evaluation infrastructure for massive multimodal embedding tasks.

	We also thank the authors of Qwen2.5-VL, MMEB, and mmE5 for their open models, benchmarks, and data resources.

	## 📚 Citation

	```bibtex
	@article{liu2025rematch,
	title={ReMatch: Boosting Representation through Matching for Multimodal Retrieval},
	author={Liu, Qianying and Liang, Xiao and Zhang, Zhiqiang and Chen, Yibo and Tang, Xu and Qing, Zhongfei and Zhou, Fengfan and Hu, Yao and Henderson, Paul},
	journal={arXiv preprint arXiv:2511.19278},
	year={2025}
	}
	```