Update README.md
# Lychee-rerank-mm

`Lychee-rerank-mm` is the latest generalist multimodal reranking model developed based on the `Qwen2.5-VL-Instruct` foundation model. It is designed for reranking tasks in image-text multimodal retrieval scenarios.

`Lychee-rerank-mm` is jointly developed by the NLP Team of Harbin Institute of Technology, Shenzhen, and the 7B-parameter version is released as open source.


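A reranker like this typically sits in the second stage of a retrieval pipeline: a fast retriever fetches candidates, and the reranker rescores each (query, candidate) pair and re-sorts them. A minimal sketch of that flow (the model itself is not called here; `toy_score` is an illustrative stand-in for the model's scoring function, and all names are hypothetical):

```python
# Minimal two-stage retrieval sketch. `rerank_score` stands in for a model
# that scores a (query, candidate) pair; with a multimodal reranker the
# candidate could be text, an image, or both.

def rerank(query, candidates, rerank_score, top_k=3):
    """Sort first-stage candidates by reranker score, best first."""
    scored = [(rerank_score(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]

# Toy scoring function: word overlap between query and candidate text.
def toy_score(query, doc):
    return len(set(query.lower().split()) & set(doc.lower().split()))

docs = ["a photo of a red apple", "a diagram of a CPU", "red apple pie recipe"]
print(rerank("red apple", docs, toy_score))
```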
- Model Type: Multimodal Reranking
- Language Support: en
- Param Size: 7B
- Model Precision: BF16

For more details, please refer to our paper.
| Model Type           | Models                                                             | Size  | Instruction Aware |
|----------------------|--------------------------------------------------------------------|-------|-------------------|
| Multimodal Reranking | [lychee-rerank-mm](https://huggingface.co/vec-ai/lychee-rerank-mm) | 8.29B | Yes               |

> **Note**:
> - `Instruction Aware` notes whether the reranking model supports customizing the input instruction according to different tasks.
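"Instruction aware" means the task description is supplied as part of the model input, so the same checkpoint can rank for different tasks. The actual prompt template ships with the model's usage code and is not shown in this section; the helper below is purely illustrative of the idea:

```python
# Illustrative only: the real prompt template is defined in the model's
# usage code. This hypothetical helper just shows what "instruction aware"
# means in practice -- the task description travels with every query.

def build_input(query, instruction=None):
    if instruction is None:
        instruction = "Given a query, judge whether the document answers it."
    return f"Instruct: {instruction}\nQuery: {query}"

print(build_input("red apple", "Find the product photo matching the query."))
```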
## Evaluation

| Model | Param | ALL (40) | T→T (14) | I→I (1) | T→I (4) | T→VD (5) | I→T (5) | T→IT (2) | IT→T (4) | IT→I (2) | IT→IT (3) |
|-------------------------|-------|----------|----------|---------|---------|----------|---------|----------|----------|----------|-----------|
| GME-2B                  | 2.21B | 52.54    | 49.59    | 30.75   | 48.46   | 66.39    | 52.62   | 77.02    | 39.88    | 36.70    | 66.89     |
| Qwen3-Reranker          | 4.02B | --       | 60.49    | --      | --      | --       | --      | --       | --       | --       | --        |
| Jina-rerank-m0          | 2.21B | 54.36    | 55.36    | 27.50   | 59.46   | 73.13    | 55.43   | 74.95    | 27.82    | 37.65    | 51.54     |
| MonoQwen2-VL-v0.1       | 2.21B | 44.20    | 48.89    | 12.59   | 58.73   | 71.29    | 19.62   | 76.46    | 14.35    | 31.75    | 35.83     |
| **lychee-rerank-mm-3B** | 3.75B | 61.40    | 59.22    | 29.76   | 58.85   | 72.38    | 63.06   | 81.96    | 48.81    | 43.97    | 79.08     |
| **lychee-rerank-mm-7B** | 8.29B | 63.85    | 61.08    | 32.83   | 61.18   | 72.94    | 66.61   | 84.55    | 53.29    | 47.39    | 82.19     |

For more details, please refer to our paper.
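The `ALL (40)` column is consistent with a task-count-weighted mean of the nine per-group scores, using the group sizes given in parentheses in the header. Recomputing two rows as a check:

```python
# Recompute the "ALL (40)" column as the task-count-weighted mean of the
# nine per-group scores; the weights are the group sizes in the header.
counts = [14, 1, 4, 5, 5, 2, 4, 2, 3]  # T->T, I->I, T->I, T->VD, I->T, T->IT, IT->T, IT->I, IT->IT

def weighted_mean(scores):
    return round(sum(c * s for c, s in zip(counts, scores)) / sum(counts), 2)

gme_2b    = [49.59, 30.75, 48.46, 66.39, 52.62, 77.02, 39.88, 36.70, 66.89]
lychee_7b = [61.08, 32.83, 61.18, 72.94, 66.61, 84.55, 53.29, 47.39, 82.19]

print(weighted_mean(gme_2b))     # -> 52.54
print(weighted_mean(lychee_7b))  # -> 63.85
```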
## Citation

If you find our work helpful, feel free to give us a cite.

```bibtex
@misc{dai2025supervisedfinetuningcontrastivelearning,
      title={Supervised Fine-Tuning or Contrastive Learning? Towards Better Multimodal LLM Reranking},
      author={Ziqi Dai and Xin Zhang and Mingxin Li and Yanzhao Zhang and Dingkun Long and Pengjun Xie and Meishan Zhang and Wenjie Li and Min Zhang},
      year={2025},
      eprint={2510.14824},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.14824},
}
```