update figures and README

Files changed (6)

.gitattributes +4 -0
README.md +9 -5
figures/global_pool.png +3 -0
figures/local_pool.png +3 -0
figures/zero_shot_image.png +3 -0
figures/zero_shot_video.png +3 -0

.gitattributes CHANGED

@@ -37,3 +37,7 @@ figures/figure1.png filter=lfs diff=lfs merge=lfs -text
 figures/figure2.png filter=lfs diff=lfs merge=lfs -text
 figures/figure3.png filter=lfs diff=lfs merge=lfs -text
 figures/figure4.png filter=lfs diff=lfs merge=lfs -text
+figures/global_pool.png filter=lfs diff=lfs merge=lfs -text
+figures/local_pool.png filter=lfs diff=lfs merge=lfs -text
+figures/zero_shot_image.png filter=lfs diff=lfs merge=lfs -text
+figures/zero_shot_video.png filter=lfs diff=lfs merge=lfs -text
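
The four added lines have the form that `git lfs track <path>` writes to `.gitattributes`. As a minimal sketch, the check below confirms that each new figure has an LFS rule. The helper name `lfs_tracked_paths` is hypothetical, and it only handles literal paths without spaces, not glob patterns.

```python
def lfs_tracked_paths(gitattributes_text):
    """Return the literal paths whose attribute list includes filter=lfs."""
    tracked = set()
    for line in gitattributes_text.splitlines():
        parts = line.split()
        # Skip blank lines and comments.
        if not parts or parts[0].startswith("#"):
            continue
        if "filter=lfs" in parts[1:]:
            tracked.add(parts[0])
    return tracked


# The rules added in this commit.
ATTRS = """\
figures/global_pool.png filter=lfs diff=lfs merge=lfs -text
figures/local_pool.png filter=lfs diff=lfs merge=lfs -text
figures/zero_shot_image.png filter=lfs diff=lfs merge=lfs -text
figures/zero_shot_video.png filter=lfs diff=lfs merge=lfs -text
"""

NEW_FIGURES = [
    "figures/global_pool.png",
    "figures/local_pool.png",
    "figures/zero_shot_image.png",
    "figures/zero_shot_video.png",
]

tracked = lfs_tracked_paths(ATTRS)
missing = [path for path in NEW_FIGURES if path not in tracked]
print(missing)  # an empty list means every new figure is tracked
```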

README.md CHANGED

@@ -26,11 +26,13 @@ Building on these insights, we introduce U-MARVEL (Universal Multimodal Retrieva
 ├── checkpoints
 │   ├── hf_models
 │   │   └── Qwen2-VL-7B-Instruct
+│   │   └── Qwen3-VL-4B-Instruct
 │   └── U-MARVEL-Qwen2VL-7B-Instruct
+│   └── U-MARVEL-Qwen3VL-4B-Instruct
 ```
 - [U-MARVEL-Qwen2VL-7B-Instruct](https://huggingface.co/TencentBAC/U-MARVEL-Qwen2VL-7B-Instruct) 🤗
-- [Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct)
+- [U-MARVEL-Qwen3VL-4B-Instruct](https://huggingface.co/TencentBAC/U-MARVEL-Qwen3VL-4B-Instruct) 🤗
 - Inference code available at: [U-MARVEL-inference](https://github.com/chaxjli/U-MARVEL)
 ## Requirements
@@ -44,6 +46,8 @@ pip install -r requirements.txt
 Download Qwen2-VL-7B and place it in `./checkpoints/hf_models/Qwen2-VL-7B-Instruct`
+Download Qwen3-VL-4B and place it in `./checkpoints/hf_models/Qwen3-VL-4B-Instruct`
 For NLI dataset, please refer to [link](https://huggingface.co/datasets/princeton-nlp/datasets-for-simcse)
 For multimodal instruction tuning datset, please refer to [M-BEIR](https://huggingface.co/datasets/TIGER-Lab/M-BEIR)
@@ -84,13 +88,13 @@ sh scripts/eval_zeroshot.sh                # Evaluate zero-shot
 The proposed U-MARVEL framework establishes new state-of-the-art performance across both
 single-model architectures and recall-then-rerank approaches on M-BEIR benchmark.
-<img src="./figures/figure1.png" alt="M-BEIR-Local" width="700" height="auto">
-<img src="./figures/figure2.png" alt="M-BEIR-Global" width="700" height="auto">
-<img src="./figures/figure3.png" alt="M-BEIR-Zero-shot" width="700" height="auto">
-<img src="./figures/figure4.png" alt="M-BEIR-Zero-shot" width="700" height="auto">
+<img src="./figures/local_pool.png" alt="M-BEIR-Local" width="700" height="auto">
+<img src="./figures/global_pool.png" alt="M-BEIR-Global" width="700" height="auto">
+<img src="./figures/zero_shot_image.png" alt="M-BEIR-Zero-shot_image" width="700" height="auto">
+<img src="./figures/zero_shot_video.png" alt="M-BEIR-Zero-shot_video" width="700" height="auto">
 ## Acknowledgements
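
The directory tree and download notes in this README change imply a mapping from Hugging Face repositories to local folders. The sketch below is one way to write that mapping down, assuming the `huggingface_hub` package is installed. The repo ID `Qwen/Qwen3-VL-4B-Instruct` is an assumption by analogy with the Qwen2-VL link, since the README does not link it, and `TARGETS` and `download_all` are hypothetical names, not part of the U-MARVEL code.

```python
from pathlib import Path

CHECKPOINT_ROOT = Path("checkpoints")

# Hugging Face repo ID -> local directory, following the README tree.
TARGETS = {
    "Qwen/Qwen2-VL-7B-Instruct":
        CHECKPOINT_ROOT / "hf_models" / "Qwen2-VL-7B-Instruct",
    # Assumed repo ID; the README does not link this model.
    "Qwen/Qwen3-VL-4B-Instruct":
        CHECKPOINT_ROOT / "hf_models" / "Qwen3-VL-4B-Instruct",
    "TencentBAC/U-MARVEL-Qwen2VL-7B-Instruct":
        CHECKPOINT_ROOT / "U-MARVEL-Qwen2VL-7B-Instruct",
    "TencentBAC/U-MARVEL-Qwen3VL-4B-Instruct":
        CHECKPOINT_ROOT / "U-MARVEL-Qwen3VL-4B-Instruct",
}


def download_all(targets=TARGETS):
    """Download each repository snapshot into its target directory."""
    # Imported lazily so the mapping can be inspected without the package.
    from huggingface_hub import snapshot_download

    for repo_id, local_dir in targets.items():
        local_dir.mkdir(parents=True, exist_ok=True)
        snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
```

Calling `download_all()` from the repository root would fetch all four snapshots; passing a smaller dictionary limits the download to the models actually needed.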

figures/global_pool.png ADDED

Git LFS Details

SHA256: 874e20812c2cf30356b21038fad7c69624d236d09f851ac7d78193799f63f851
Pointer size: 131 Bytes
Size of remote file: 249 kB

figures/local_pool.png ADDED

Git LFS Details

SHA256: 82f173b1c69a5354579e2ab3573274279a95d24ef969df7cd2c3ae2ab8e1930f
Pointer size: 131 Bytes
Size of remote file: 261 kB

figures/zero_shot_image.png ADDED

Git LFS Details

SHA256: 92b2ed30bba7e219bec916cbec903563a60f594bccfbc3446bb3737bc17a7e10
Pointer size: 131 Bytes
Size of remote file: 270 kB

figures/zero_shot_video.png ADDED

Git LFS Details

SHA256: f7e5b410d5012ad60f557892bbaa85210803bb6cd523398f2ab805187b58ee03
Pointer size: 131 Bytes
Size of remote file: 214 kB
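
Each "Pointer size: 131 Bytes" entry refers to the small text file that Git stores in place of the image. As a rough sketch, the pointer can be rebuilt from the Git LFS v1 pointer format (a `version` line, an `oid` line, and a `size` line). The byte size used below is a hypothetical stand-in, because the page only reports a rounded "249 kB"; any six-digit size gives the same pointer length.

```python
def lfs_pointer(sha256_hex, size_bytes):
    """Build the text of a Git LFS pointer file (spec v1)."""
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{sha256_hex}\n"
        f"size {size_bytes}\n"
    )


# SHA256 listed for figures/global_pool.png.
oid = "874e20812c2cf30356b21038fad7c69624d236d09f851ac7d78193799f63f851"

# Hypothetical exact size: the page only shows the rounded value "249 kB".
pointer = lfs_pointer(oid, 249000)

print(pointer)
print(len(pointer.encode("utf-8")))  # 131
```

The length is 43 bytes for the version line, 76 for the oid line, and 12 for a six-digit size line, which accounts for the 131 bytes reported for all four figures.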