---
license: apache-2.0
tags:
- MobileVLM V2
---
## Model Summary
MobileVLM V2 is a family of vision language models that significantly improve upon MobileVLM, demonstrating that a careful orchestration of novel architectural design, an improved training scheme tailored for mobile VLMs, and rich, high-quality dataset curation can substantially benefit VLM performance. Specifically, MobileVLM V2 1.7B achieves better or on-par performance on standard VLM benchmarks compared with much larger VLMs at the 3B scale. Notably, the MobileVLM_V2-3B model outperforms a large variety of VLMs at the 7B+ scale.

MobileVLM_V2-7B was built on [Vicuna-7B-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5) to facilitate off-the-shelf deployment.
## Model Sources
- Repository: https://github.com/Meituan-AutoML/MobileVLM
- Paper: [MobileVLM V2: Faster and Stronger Baseline for Vision Language Model](https://arxiv.org/abs/2402.03766)
## How to Get Started with the Model
Inference examples can be found on [GitHub](https://github.com/Meituan-AutoML/MobileVLM).
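Since MobileVLM_V2-7B is built on Vicuna-7B-v1.5, queries are typically formatted with a Vicuna-v1-style conversation template and an image placeholder token, as in other LLaVA-family models. The exact template and token are assumptions here; consult the repository's inference code for the authoritative version. A minimal prompt-building sketch:

```python
def build_prompt(question: str) -> str:
    """Build a Vicuna-v1-style prompt with an image placeholder.

    The system message and the "<image>" token are assumed from the
    LLaVA/Vicuna convention; verify against the MobileVLM repository.
    """
    system = ("A chat between a curious user and an artificial intelligence "
              "assistant. The assistant gives helpful, detailed, and polite "
              "answers to the user's questions.")
    # The image token marks where the vision encoder's features are spliced in.
    return f"{system} USER: <image>\n{question} ASSISTANT:"


print(build_prompt("Who is the author of this book?"))
```

The model's generated text is everything after the trailing `ASSISTANT:` marker.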