Update README.md
# Lychee-rerank-mm

`Lychee-rerank-mm` is the latest generalist multimodal reranking model developed based on the `Qwen2.5-VL-Instruct` foundation model. It is designed for reranking tasks in image-text multimodal retrieval scenarios.

`Lychee-rerank-mm` is jointly developed by the NLP Team of Harbin Institute of Technology, Shenzhen, and the 7B-parameter version is released as open source.


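A reranker like this typically sits in the second stage of a retrieval pipeline: a fast retriever fetches candidates, and the reranker rescores each (query, candidate) pair and re-sorts them. A minimal sketch of that flow (the model itself is not called here; `toy_score` is an illustrative stand-in for the model's scoring function, and all names are hypothetical):

```python
# Minimal two-stage retrieval sketch. `rerank_score` stands in for a model
# that scores a (query, candidate) pair; with a multimodal reranker the
# candidate could be text, an image, or both.

def rerank(query, candidates, rerank_score, top_k=3):
    """Sort first-stage candidates by reranker score, best first."""
    scored = [(rerank_score(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]

# Toy scoring function: word overlap between query and candidate text.
def toy_score(query, doc):
    return len(set(query.lower().split()) & set(doc.lower().split()))

docs = ["a photo of a red apple", "a diagram of a CPU", "red apple pie recipe"]
print(rerank("red apple", docs, toy_score))
```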
- Model Type: Multimodal Reranking
- Language Support: en
- Param Size: 7B
- Model Precision: BF16

For more details, please refer to our paper.
| Model Type           | Models                                                             | Size  | Instruction Aware |
|----------------------|--------------------------------------------------------------------|-------|-------------------|
| Multimodal Reranking | [lychee-rerank-mm](https://huggingface.co/vec-ai/lychee-rerank-mm) | 8.29B | Yes               |

> **Note**:
> - `Instruction Aware` notes whether the reranking model supports customizing the input instruction according to different tasks.
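"Instruction aware" means the task description is supplied as part of the model input, so the same checkpoint can rank for different tasks. The actual prompt template ships with the model's usage code and is not shown in this section; the helper below is purely illustrative of the idea:

```python
# Illustrative only: the real prompt template is defined in the model's
# usage code. This hypothetical helper just shows what "instruction aware"
# means in practice -- the task description travels with every query.

def build_input(query, instruction=None):
    if instruction is None:
        instruction = "Given a query, judge whether the document answers it."
    return f"Instruct: {instruction}\nQuery: {query}"

print(build_input("red apple", "Find the product photo matching the query."))
```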
## Evaluation

| Model | Param | ALL (40) | T→T (14) | I→I (1) | T→I (4) | T→VD (5) | I→T (5) | T→IT (2) | IT→T (4) | IT→I (2) | IT→IT (3) |
|-------------------------|-------|----------|----------|---------|---------|----------|---------|----------|----------|----------|-----------|
| GME-2B                  | 2.21B | 52.54    | 49.59    | 30.75   | 48.46   | 66.39    | 52.62   | 77.02    | 39.88    | 36.70    | 66.89     |
| Qwen3-Reranker          | 4.02B | --       | 60.49    | --      | --      | --       | --      | --       | --       | --       | --        |
| Jina-rerank-m0          | 2.21B | 54.36    | 55.36    | 27.50   | 59.46   | 73.13    | 55.43   | 74.95    | 27.82    | 37.65    | 51.54     |
| MonoQwen2-VL-v0.1       | 2.21B | 44.20    | 48.89    | 12.59   | 58.73   | 71.29    | 19.62   | 76.46    | 14.35    | 31.75    | 35.83     |
| **lychee-rerank-mm-3B** | 3.75B | 61.40    | 59.22    | 29.76   | 58.85   | 72.38    | 63.06   | 81.96    | 48.81    | 43.97    | 79.08     |
| **lychee-rerank-mm-7B** | 8.29B | 63.85    | 61.08    | 32.83   | 61.18   | 72.94    | 66.61   | 84.55    | 53.29    | 47.39    | 82.19     |

For more details, please refer to our paper.
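The `ALL (40)` column is consistent with a task-count-weighted mean of the nine per-group scores, using the group sizes given in parentheses in the header. Recomputing two rows as a check:

```python
# Recompute the "ALL (40)" column as the task-count-weighted mean of the
# nine per-group scores; the weights are the group sizes in the header.
counts = [14, 1, 4, 5, 5, 2, 4, 2, 3]  # T->T, I->I, T->I, T->VD, I->T, T->IT, IT->T, IT->I, IT->IT

def weighted_mean(scores):
    return round(sum(c * s for c, s in zip(counts, scores)) / sum(counts), 2)

gme_2b    = [49.59, 30.75, 48.46, 66.39, 52.62, 77.02, 39.88, 36.70, 66.89]
lychee_7b = [61.08, 32.83, 61.18, 72.94, 66.61, 84.55, 53.29, 47.39, 82.19]

print(weighted_mean(gme_2b))     # -> 52.54
print(weighted_mean(lychee_7b))  # -> 63.85
```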
## Citation

If you find our work helpful, feel free to give us a cite.

```bibtex
@misc{dai2025supervisedfinetuningcontrastivelearning,
      title={Supervised Fine-Tuning or Contrastive Learning? Towards Better Multimodal LLM Reranking},
      author={Ziqi Dai and Xin Zhang and Mingxin Li and Yanzhao Zhang and Dingkun Long and Pengjun Xie and Meishan Zhang and Wenjie Li and Min Zhang},
      year={2025},
      eprint={2510.14824},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.14824},
}
```