Haon-Chen committed
Commit 89b0e5a · 1 Parent(s): 4361421

update README

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -17,7 +17,7 @@ pipeline_tag: zero-shot-image-classification
 This repo presents the `MoCa-Qwen25VL` series of **multimodal embedding models**.
 The model is trained based on [Qwen2.5-3B-VL-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-VL-Instruct).
 
-[🏠 Homepage](https://haon-chen.github.io/MoCa/) | [💻 Code](https://github.com/haon-chen/MoCa) | [🤖 MoCa-Qwen25VL-7B](https://huggingface.co/moca-embed/MoCa-Qwen25VL-7B) | [🤖 MoCa-Qwen25VL-3B](https://huggingface.co/moca-embed/MoCa-Qwen25VL-3B) | [📚 Datasets](https://huggingface.co/moca-embed/datasets) | [📄 Paper]()
+[🏠 Homepage](https://haon-chen.github.io/MoCa/) | [💻 Code](https://github.com/haon-chen/MoCa) | [🤖 MoCa-Qwen25VL-7B](https://huggingface.co/moca-embed/MoCa-Qwen25VL-7B) | [🤖 MoCa-Qwen25VL-3B](https://huggingface.co/moca-embed/MoCa-Qwen25VL-3B) | [📚 Datasets](https://huggingface.co/moca-embed/datasets) | [📄 Paper](https://arxiv.org/abs/2506.23115)
 
 **Highlights**
 - SOTA performance on MMEB (General Multimodal) and surpassing many strong baselines on ViDoRe-v2 (Document Retrieval).
@@ -123,10 +123,10 @@ print(string, '=', compute_similarity(qry_output, tgt_output))
 ## Citation
 If you use this model in your research, please cite the associated paper.
 ```bibtex
-@article{xxx,
+@article{chen2025moca,
 title={MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings},
 author={Chen, Haonan and Liu, Hong and Luo, Yuping and Wang, Liang and Yang, Nan and Wei, Furu and Dou, Zhicheng},
-journal={arXiv preprint arXiv:250xxxx},
+journal={arXiv preprint arXiv:2506.23115},
 year={2025}
 }
 ```
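The second hunk's header shows that the README's usage snippet ends with `print(string, '=', compute_similarity(qry_output, tgt_output))`. As a hedged illustration only (the README's actual helper may differ), a `compute_similarity` for embedding models of this kind is typically cosine similarity between L2-normalized query and target vectors:

```python
import numpy as np

def compute_similarity(qry_output: np.ndarray, tgt_output: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the README's helper: cosine similarity
    between query embeddings (rows of qry_output) and target embeddings
    (rows of tgt_output), returned as a [num_queries, num_targets] matrix."""
    qry = qry_output / np.linalg.norm(qry_output, axis=-1, keepdims=True)
    tgt = tgt_output / np.linalg.norm(tgt_output, axis=-1, keepdims=True)
    return qry @ tgt.T

# Toy example with random vectors standing in for real model outputs.
rng = np.random.default_rng(0)
qry_output = rng.normal(size=(2, 8))   # 2 query embeddings, dim 8
tgt_output = rng.normal(size=(3, 8))   # 3 target embeddings, dim 8
scores = compute_similarity(qry_output, tgt_output)
print(scores.shape)  # (2, 3); every score lies in [-1, 1]
```

In real use, `qry_output` and `tgt_output` would come from encoding text/image inputs with the model; the highest-scoring target per query row is the retrieval result.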