---
license: apache-2.0
language:
- en
- zh
---

# ReMatch: Boosting Representation through Matching for Multimodal Retrieval

This repository contains the official implementation of **ReMatch**, accepted to **CVPR 2026**.

ReMatch turns a multimodal large language model into a stronger multimodal retriever by adding a chat-style generative matching objective during training. The same MLLM learns to judge query-document relevance from both raw multimodal inputs and projected embeddings, complementing standard contrastive learning with instance-wise supervision on hard negatives. ReMatch also augments each input with multiple learnable representation tokens and fuses them into an efficient single-vector embedding for retrieval.

## 👥 Authors

[Qianying Liu](https://scholar.google.com/citations?hl=zh-TW&user=QnMV-uYAAAAJ&view_op=list_works&sortby=pubdate)\*, Xiao Liang\*, Zhiqiang Zhang#, Yibo Chen, Xu Tang, Zhongfei Qing, Fengfan Zhou, Yao Hu, Paul Henderson

University of Glasgow, Xiaohongshu Inc., Huazhong University of Science and Technology

\* Equal contribution. \# Project leader.

## 🔍 Method

ReMatch is built around two core ideas:

- **Query-Document Matching**: an additional autoregressive matching stage that predicts relevance from the query, document, and their projected embeddings.
- **Learnable Multi-Token Embeddings**: multiple learnable tokens capture fine-grained contextual signals; an orthogonality regularizer encourages complementary representations, and the fused output remains a standard dense embedding.

## 🔥 News

- **2026-05**: ReMatch code, the **ReMatch-3B** checkpoint, and evaluation scripts are released.
- **2026-02**: ReMatch is accepted to **CVPR 2026**.
- **2025-11**: The ReMatch technical report is available on arXiv.

## 🛠️ Installation

```bash
conda create -n rematch python=3.10 -y
conda activate rematch
pip install -r requirements.txt
```

`flash-attn` can be sensitive to CUDA, PyTorch, and compiler versions. If installation fails, install the wheel matching your environment by following the official FlashAttention release instructions, then rerun `pip install -r requirements.txt` to install the remaining dependencies.

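Because wheel compatibility depends on the Python, PyTorch, and CUDA versions, it can help to print them before choosing a `flash-attn` wheel. The following is a minimal sketch and not part of the repository: `report_env` is a hypothetical helper name, and it assumes `python3` or `python` is on your `PATH` inside the activated environment.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with ReMatch): print the versions
# that a prebuilt flash-attn wheel has to match.
report_env() {
  local py
  # Prefer python3, fall back to python; report and stop if neither exists.
  py="$(command -v python3 || command -v python)" || { echo "python: not found"; return 0; }
  "$py" -c 'import sys; print("python:", sys.version.split()[0])'
  # torch.version.cuda is the CUDA version PyTorch was built against
  # (None on CPU-only builds).
  "$py" -c 'import torch; print("torch:", torch.__version__); print("torch cuda:", torch.version.cuda)' 2>/dev/null \
    || echo "torch: not importable"
  # nvcc reports the local CUDA toolkit, which matters when building from source.
  if command -v nvcc >/dev/null 2>&1; then
    nvcc --version | tail -n 1
  else
    echo "nvcc: not found"
  fi
}

report_env
```

Compare the printed versions with the wheel names listed in the FlashAttention release instructions before installing.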
## 🤗 Checkpoints

We release **ReMatch-3B**, a Qwen2.5-VL-3B-based checkpoint trained with the ReMatch recipe:

- [FireRedTeam/ReMatch-3B](https://huggingface.co/FireRedTeam/ReMatch-3B)

For local checkpoints, pass the base model through `--model_name` and the adapter/full checkpoint through `--checkpoint_path` when evaluating.

## 🚀 Training

The public ReMatch-3B training entry point is:

```bash
bash experiments/public/rematch/train-rematch-itm.sh
```

Before training, download the mmE5 hard-negative MMEB training data from Hugging Face:

- [intfloat/mmE5-MMEB-hardneg](https://huggingface.co/datasets/intfloat/mmE5-MMEB-hardneg)

In addition to mmE5, please follow the original [VLM2Vec](https://github.com/TIGER-AI-Lab/VLM2Vec) data preparation instructions to download the corresponding MMEB training and evaluation data used by the public configs in this repository.

Then edit [experiments/public/rematch/train_image_mme5_hardneg.yaml](experiments/public/rematch/train_image_mme5_hardneg.yaml) and replace every `DATASET_BASE_PATH` with the directory that contains your `mmE5/` folder. The expected layout is:

```text
DATASET_BASE_PATH/
└── mmE5/
    └── mmE5-MMEB-hardneg/
```

The default script trains a Qwen2.5-VL-3B-based ReMatch model with LoRA, 16 learnable query tokens, residual average fusion, orthogonal regularization, and the matching objective enabled.

You can override common paths without editing the script:

```bash
EXP_DIR=/path/to/output \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
bash experiments/public/rematch/train-rematch-itm.sh
```

## 📊 Evaluation

Evaluation configs are under [experiments/public/eval](experiments/public/eval):

- [image.yaml](experiments/public/eval/image.yaml)

Please prepare the MMEB evaluation data following the original [VLM2Vec](https://github.com/TIGER-AI-Lab/VLM2Vec) instructions, then set `DATA_BASEDIR` to the directory containing the downloaded evaluation files.

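Before launching a long job, a quick pre-flight check of the two data directories may save a failed run. The sketch below is not part of the repository: `check_rematch_paths` is a hypothetical helper name, and it assumes that `mmE5-MMEB-hardneg/` sits inside `mmE5/` as in the training layout above. It only checks that `DATA_BASEDIR` exists, since its contents are defined by the VLM2Vec instructions.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not shipped with ReMatch).
# $1: directory used to replace DATASET_BASE_PATH in the training config
# $2: directory you plan to pass as DATA_BASEDIR for evaluation
check_rematch_paths() {
  local dataset_base="$1" eval_base="$2" status=0
  # Assumed training layout: DATASET_BASE_PATH/mmE5/mmE5-MMEB-hardneg/
  if [ ! -d "${dataset_base}/mmE5/mmE5-MMEB-hardneg" ]; then
    echo "missing: ${dataset_base}/mmE5/mmE5-MMEB-hardneg" >&2
    status=1
  fi
  # Only existence is checked here; the file layout inside follows VLM2Vec.
  if [ ! -d "${eval_base}" ]; then
    echo "missing: ${eval_base}" >&2
    status=1
  fi
  return "${status}"
}
```

For example, `check_rematch_paths /path/to/datasets /path/to/vlm2vec_eval && echo "paths look fine"` prints the confirmation only when both directories are present.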
> **Note:** Evaluation scores may vary slightly across environments, as different PyTorch, CUDA, and `flash-attn` versions can introduce small numerical differences.

For checkpoints produced by this repository, we recommend using `eval_all.py`. It reads the experiment name and automatically matches the evaluation configuration used by ReMatch, including backbone type, target-side instruction prefix, chat template, learnable query tokens, and residual embedding fusion.

For example, an experiment name containing `Qwen2.5vl`, `TgtInstruction`, `Queries16`, `ResidualAvg`, and `ChatTemplate` will be evaluated with the corresponding `qwen2_5_vl`, target instruction, 16 learnable tokens, average residual fusion, and chat-template settings.

Evaluate one experiment checkpoint:

```bash
DATA_BASEDIR=/path/to/vlm2vec_eval \
MODEL_BASEDIR=/path/to/training/outputs \
OUTPUT_BASEDIR=/path/to/eval/outputs \
MODALITIES="image" \
python eval_all.py \
  --model_name Rematch_Qwen2.5vl_3B.image.autoresize.lora32.loraAlpha64.BS1024.IB64.GCq32p32NormTemp002.lr1e4.step3kwarm100.lrCosine.TgtInstruction.mmE5H1.Queries16.ResidualAvg.OrthTriu0.2.ChatTemplate.ITM.V1.Ratio0.1 \
  --checkpoint_name checkpoint-2200
```

If no arguments are provided, `eval_all.py` scans `outputs/