---
license: llama2
---

# v-MLLM Model Card

## Model details
**Model type:**
v-MLLM is an open-source MLLM trained on the Visual-Modality Instruction (VIM) corpus; it can robustly follow both text-modality instructions and visual-modality instructions (i.e., instructions printed in the image pixels).

**Model date:**
v-MLLM-7B was trained in January 2024.
**GitHub for more information:**
https://github.com/VIM-Bench/VIM_TOOL
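
Below is a minimal inference sketch, assuming the released checkpoint exposes a LLaVA-style interface loadable with Hugging Face `transformers`; the repository id `VIM-Bench/v-MLLM-7B` and the prompt template are illustrative assumptions, so please refer to the GitHub repository above for the officially supported loading code.

```python
# Hypothetical usage sketch: assumes a LLaVA-style checkpoint that transformers
# can load directly. See the GitHub repository for the supported pipeline.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "VIM-Bench/v-MLLM-7B"  # assumed repository id
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# For visual-modality instruction following, the instruction is printed inside
# the image itself, so the text prompt can stay minimal.
image = Image.open("image_with_embedded_instruction.png")
prompt = "USER: <image>\nFollow the instruction in the image. ASSISTANT:"  # assumed template

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```
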
## License
v-MLLM is licensed under the LLAMA 2 Community License,
Copyright (c) Meta Platforms, Inc. All Rights Reserved.
## Intended use
**Primary intended uses:**
The primary use of v-MLLM is research on multimodal large language models.

**Primary intended users:**
The primary intended users of the model are researchers in computer vision, natural language processing, machine learning, and artificial intelligence.
## Training dataset
- 846k VIM instruction-following samples built from the LVIS-Instruct4V corpus.
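
To illustrate what a visual-modality instruction sample looks like, the sketch below renders an instruction string into the image pixels with Pillow. The exact rendering used to build the VIM corpus (font, placement, layout) is described in the papers cited below; treat this as an illustrative approximation only.

```python
# Illustrative only: embeds an instruction into the image so a model must read
# it from pixels rather than from the text prompt. The actual VIM corpus
# construction may use a different font, placement, and layout.
from PIL import Image, ImageDraw, ImageFont

def embed_instruction(image_path: str, instruction: str, bar_height: int = 60) -> Image.Image:
    image = Image.open(image_path).convert("RGB")
    # Add a white bar below the image and print the instruction into it.
    canvas = Image.new("RGB", (image.width, image.height + bar_height), "white")
    canvas.paste(image, (0, 0))
    draw = ImageDraw.Draw(canvas)
    font = ImageFont.load_default()
    draw.text((10, image.height + 10), instruction, fill="black", font=font)
    return canvas

vim_image = embed_instruction("example.jpg", "Describe the objects in this image.")
vim_image.save("example_with_instruction.png")
```
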
# Citation

Please kindly cite our papers if you find our resources useful:
```
@misc{li2024text,
      title={Text as Images: Can Multimodal Large Language Models Follow Printed Instructions in Pixels?},
      author={Xiujun Li and Yujie Lu and Zhe Gan and Jianfeng Gao and William Yang Wang and Yejin Choi},
      year={2024},
      eprint={2311.17647},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
@misc{lu2023vim,
      title={VIM: Probing Multimodal Large Language Models for Visual Embedded Instruction Following},
      author={Yujie Lu and Xiujun Li and William Yang Wang and Yejin Choi},
      year={2023},
      eprint={2311.17647},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```