---
license: apache-2.0
datasets:
- lmms-lab/LLaVA-OneVision-Data
language:
- en
- zh
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
pipeline_tag: image-text-to-text
library_name: transformers
---

# Introduction

We are excited to introduce **HawkVL**, a series of lightweight and efficient multimodal large language models (MLLMs).
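
Since the model is packaged for `transformers` with the `image-text-to-text` pipeline tag, inference should follow the usual processor-plus-model pattern. Below is a minimal sketch; the repository id `HawkVL/HawkVL-2B` is a placeholder, and `trust_remote_code=True` is an assumption about how the custom architecture is shipped.

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Placeholder repository id -- substitute the actual HawkVL checkpoint.
MODEL_ID = "HawkVL/HawkVL-2B"

# trust_remote_code is an assumption: custom MLLM architectures on the Hub
# usually ship their modeling code alongside the weights.
processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, trust_remote_code=True
).eval()

image = Image.open("example.jpg")
inputs = processor(text="Describe this image.", images=image, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```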

**Architecture**:

- ViT: Qwen-ViT
- Projector: 2-layer MLP with pixel unshuffle (see the sketch after this list)
- LLM: Qwen2.5-1.5B
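
The projector compresses the ViT token grid before it reaches the LLM: a pixel-unshuffle step folds each 2x2 neighborhood of patch tokens into the channel dimension (a 4x token reduction), and a 2-layer MLP maps the result into the LLM embedding space. The PyTorch sketch below illustrates the idea; the dimensions and the 2x2 ratio are illustrative assumptions, not HawkVL's released configuration.

```python
import torch
import torch.nn as nn


class Projector(nn.Module):
    """Pixel-unshuffle + 2-layer MLP projector (illustrative sketch).

    vit_dim, llm_dim, and the 2x2 unshuffle ratio are assumptions,
    not HawkVL's released configuration.
    """

    def __init__(self, vit_dim: int = 1152, llm_dim: int = 1536, ratio: int = 2):
        super().__init__()
        self.ratio = ratio
        # After the unshuffle, each token carries ratio**2 stacked patches.
        self.mlp = nn.Sequential(
            nn.Linear(vit_dim * ratio**2, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, h*w, vit_dim) -- a square grid of ViT patch tokens.
        b, n, c = x.shape
        h = w = int(n**0.5)
        # (b, n, c) -> (b, c, h, w) so pixel_unshuffle's layout applies.
        x = x.transpose(1, 2).reshape(b, c, h, w)
        x = nn.functional.pixel_unshuffle(x, self.ratio)  # (b, c*r^2, h/r, w/r)
        x = x.flatten(2).transpose(1, 2)  # (b, n/r^2, c*r^2)
        return self.mlp(x)  # (b, n/r^2, llm_dim)


# Example: a 24x24 grid of 576 tokens becomes 144 tokens of width llm_dim.
tokens = torch.randn(1, 576, 1152)
print(Projector()(tokens).shape)  # torch.Size([1, 144, 1536])
```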

## Evaluation

We evaluate on the eight benchmarks tracked by the [OpenCompass](https://rank.opencompass.org.cn/leaderboard-multimodal) multimodal leaderboard, using [VLMEvalKit](https://github.com/open-compass/VLMEvalKit):

`MMBench_TEST_EN/CN_V11, MMStar, MMMU_DEV_VAL, MathVista_MINI, HallusionBench, AI2D_TEST, OCRBench, MMVet`
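
For reference, VLMEvalKit is driven through its `run.py` entry point, which accepts benchmark and model names. A minimal sketch of launching the full suite; the registry name `HawkVL-2B` is a hypothetical placeholder, since a model must first be registered in VLMEvalKit's model config before it can be referenced.

```python
# Minimal sketch of driving VLMEvalKit's run.py for the full benchmark suite.
# "HawkVL-2B" is a hypothetical registry name, not an existing entry.
import subprocess

BENCHMARKS = [
    "MMBench_TEST_EN_V11", "MMBench_TEST_CN_V11", "MMStar", "MMMU_DEV_VAL",
    "MathVista_MINI", "HallusionBench", "AI2D_TEST", "OCRBench", "MMVet",
]

subprocess.run(
    ["python", "run.py", "--data", *BENCHMARKS, "--model", "HawkVL-2B"],
    check=True,
)
```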

The results are as follows:

| Benchmark        | HawkVL-2B |
|------------------|-----------|
| MMBench-TEST-avg | 64.9      |
| MMStar           | 48.2      |
| MMMU-VAL         | 43.9      |
| MathVista_MINI   | 44.1      |
| HallusionBench   | 58.5      |
| AI2D_TEST        | 67.4      |
| OCRBench         | 74.9      |
| MMVet            | 36.6      |
| Avg              | 54.8      |
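
As a quick sanity check, the reported average is the unweighted mean of the eight per-benchmark scores:

```python
# Unweighted mean of the eight benchmark scores reported above.
scores = [64.9, 48.2, 43.9, 44.1, 58.5, 67.4, 74.9, 36.6]
print(round(sum(scores) / len(scores), 1))  # 54.8
```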

## License Agreement

All of our open-source models are licensed under the Apache-2.0 license. |