openPangu-VL-7B
中文 | English | Technical Report
1. Introduction
openPangu-VL-7B is an efficient multimodal model built for the Ascend NPU, combining the openPangu-Embedded-7B-V1.1 base language model with the openPangu-ViT-600M vision encoder. It was trained on approximately 3T tokens and supports general visual question answering, chart and document understanding, object grounding and counting, video understanding, and advanced visual reasoning. The model is designed for fast-thinking mode.
2. Model Architecture
| openPangu-VL-7B | |
|---|---|
| LLM | |
| Architecture | Dense |
| Parameters (Non-Embedding) | 7B |
| Number of Layers | 34 |
| Hidden Dimension | 12800 |
| Attention Mechanism | GQA |
| Number of Attention Heads | 32 for Q, 8 for KV |
| Vocabulary Size | 153k |
| Context Length (Natively) | 128k |
| Vision Encoder | |
| Architecture | 22 Window Attention + 4 Full Attention |
| Number of Layers | 26 |
| Attention Hidden Size | 1536 |
| FFN Hidden Size | 4608 |
| Number of Attention Heads | 16 |
| Parameters | 615M |
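The GQA setting above (32 query heads sharing 8 KV heads, i.e. 4 query heads per KV head) can be sketched in NumPy. This is a minimal illustration of the mechanism, not the model's actual implementation:

```python
import numpy as np

def gqa_attention(q, k, v, n_q_heads=32, n_kv_heads=8):
    """Minimal grouped-query attention: each group of query heads
    shares one KV head. Shapes: q (n_q_heads, seq, d), k/v (n_kv_heads, seq, d)."""
    group = n_q_heads // n_kv_heads          # 4 query heads per KV head
    k = np.repeat(k, group, axis=0)          # broadcast KV heads -> (n_q_heads, seq, d)
    v = np.repeat(v, group, axis=0)
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)        # (n_q_heads, seq, seq)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)             # softmax over keys
    return weights @ v                                    # (n_q_heads, seq, d)

rng = np.random.default_rng(0)
seq, d = 5, 16
q = rng.standard_normal((32, seq, d))
k = rng.standard_normal((8, seq, d))
v = rng.standard_normal((8, seq, d))
out = gqa_attention(q, k, v)
print(out.shape)  # (32, 5, 16)
```

Sharing KV heads across query groups is what shrinks the KV cache relative to full multi-head attention, which matters for the 128k native context length.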
3. Results
| Benchmark | openPangu-VL-7B |
|---|---|
| General VQA | |
| MMBench V1.1 DEV | 86.5 |
| MMStar | 70.1 |
| RealWorldQA | 76.1 |
| AI2D | 84.7 |
| OCR & Chart/Document Understanding | |
| OCRBench | 907 |
| TextVQA | 85.1 |
| DocVQA | 96.0 |
| ChartQA | 88.3 |
| CharXiv dq/rq | 83.9/54.3 |
| STEM | |
| MMMU | 65.2 |
| MMMU-Pro | 52.6 |
| MathVista | 75.0 |
| Object Grounding/Counting | |
| RefCOCO-avg | 90.6 |
| ODinW-13 | 51.5 |
| CountBench | 96.1 |
| Point-Bench | 65.4 |
| Multi-Image | |
| BLINK | 63.3 |
| MUIRBench | 61.6 |
| Video Understanding | |
| MVBench | 74.0 |
| VideoMME (w/o sub) | 68.0 |
| MLVU | 76.9 |
| Text-Centric Benchmark | |
| MMLU-Pro | 78.2 |
| MMLU-Redux | 87.3 |
| GPQA-Diamond | 65.2 |
| C-Eval | 83.2 |
| AIME25 | 36.5 |
| Math-500 | 89.4 |
| LiveCodeBenchV6 | 24.6 |
| MBPP+ | 68.5 |
| IFEval | 83.0 |
Note: Evaluation is conducted with a vllm-ascend deployment and an empty system prompt. In general, setting the minimum resolution to 2304\*28\*28 yields the best evaluation results (except for the extremely small images in OCRBench, where a resolution of no more than 64\*28\*28 is recommended). Detailed settings for each benchmark can be found in the Technical Report.
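The resolution figures in the note are pixel budgets expressed in units of 28×28 patches (a convention assumed here from similar vision-language processors). A quick check of what they translate to:

```python
PATCH = 28  # assumed ViT patch size implied by the N*28*28 notation

def pixel_budget(n_patches: int) -> int:
    """Total pixel budget for n_patches patches of PATCH x PATCH pixels."""
    return n_patches * PATCH * PATCH

min_pixels = pixel_budget(2304)  # recommended minimum for most benchmarks
ocr_pixels = pixel_budget(64)    # cap suggested for tiny-image OCR in OCRBench
print(min_pixels)  # 1806336 pixels, roughly a 1344x1344 image
print(ocr_pixels)  # 50176 pixels, roughly a 224x224 image
```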
4. Deployment
vllm-ascend deploy (recommended)
vllm-ascend: please refer to [vllm_ascend_for_openpangu_vl_7b] to deploy the inference service.
After deployment finishes, you can test the API with the following script:
cd inference/vllm_ascend/examples; python quick_start.py
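vLLM serves an OpenAI-compatible chat API, so a request can also be assembled by hand. The sketch below only builds the JSON payload; the endpoint path and served model name are assumptions to be checked against your deployment:

```python
import json

def build_vqa_request(image_url: str, question: str,
                      model: str = "openPangu-VL-7B") -> dict:
    """Build an OpenAI-style chat payload with one image and one question,
    in the shape accepted by vLLM's /v1/chat/completions endpoint."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": question},
            ],
        }],
    }

payload = build_vqa_request("https://example.com/cat.png", "What is in the image?")
print(json.dumps(payload, indent=2))
# POST this to http://<host>:<port>/v1/chat/completions once the service is up.
```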
Direct inference
Environment:
- python==3.10
- CANN==8.1.RC1
cd inference; pip install -r requirements.txt
Inference:
cd inference; python generate.py
Model abilities
- For more examples and demonstrations of model abilities, please refer to cookbooks.
5. Model License
Unless otherwise noted, the openPangu-VL-7B model is licensed under the terms and conditions of OPENPANGU MODEL LICENSE AGREEMENT VERSION 1.0, which is intended to be used permissively and enable the further development of artificial intelligence technologies. Please refer to the LICENSE file located in the root directory of the model repository for details.
6. Disclaimer
Due to the technical limitations inherent in the technology on which the openPangu-VL-7B (“Model”) relies and the fact that the artificial intelligence generated content is automatically produced by Model, Huawei cannot make any guarantees regarding the following matters:
- The output of this Model is automatically generated by AI algorithms; some of the information may be flawed, unreasonable, or cause discomfort, and the generated content does not represent Huawei's attitude or standpoint;
- There is no guarantee that this Model is 100% accurate, reliable, functional, timely, secure, safe, error-free, uninterrupted, continuously stable, or free of any faults;
- The output of this Model does not constitute any advice or decision for you, and there is no guarantee of the authenticity, completeness, accuracy, timeliness, legality, functionality, or practicality of the generated content. The generated content cannot replace professionals in medical, legal, and other fields in answering your questions. It is for your reference only and does not represent any attitude, standpoint, or position of Huawei. You need to make independent judgments based on your actual situation, and Huawei does not assume any responsibilities.
7. Contact
If you have any questions, please raise an issue or contact us at openPangu@huawei.com.