---
license: apache-2.0
datasets:
- lmms-lab/LLaVA-OneVision-Data
language:
- en
- zh
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
pipeline_tag: image-text-to-text
library_name: transformers
---
# Introduction
We are excited to introduce **HawkVL**, a series of lightweight and efficient multimodal large language models (MLLMs).
**Architecture**:
- ViT: Qwen-ViT
- Projector: 2-layer MLP with pixel unshuffle
- LLM: Qwen2.5-1.5B
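
A minimal loading sketch is shown below. The repository ID, the use of the generic `Auto*` classes with `trust_remote_code=True`, and the prompt format are illustrative assumptions rather than confirmed details of this model's interface:

```python
# Minimal usage sketch. Assumptions (not confirmed by this card): the repo ID,
# that the model ships remote code loadable via the generic Auto* classes, and
# that the processor accepts text plus images in the standard fashion.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "xjtupanda/HawkVL-2B"  # assumed repository ID
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
).eval()
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg")
inputs = processor(text="Describe this image.", images=image, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```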
### Evaluation
We evaluate on the eight benchmarks used by the [OpenCompass](https://rank.opencompass.org.cn/leaderboard-multimodal) multimodal leaderboard, using [VLMEvalKit](https://github.com/open-compass/VLMEvalKit):
`MMBench_TEST_EN/CN_V11, MMStar, MMMU_DEV_VAL, MathVista_MINI, HallusionBench, AI2D_TEST, OCRBench, MMVet`
The results are as follows:
| Benchmark | HawkVL-2B |
|------------------|-----------|
| MMBench-TEST-avg | 64.9 |
| MMStar | 48.2 |
| MMMU-VAL | 43.9 |
| MathVista_MINI | 44.1 |
| HallusionBench | 58.5 |
| AI2D_TEST | 67.4 |
| OCRBench | 74.9 |
| MMVet | 36.6 |
| Avg | 54.8 |
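
For reproduction, the sketch below drives VLMEvalKit's `run.py` entry point from Python. The benchmark identifiers come from the list above; the model name `HawkVL-2B` is a placeholder that assumes the model has been registered in the kit's configs:

```python
# Reproduction sketch via VLMEvalKit's run.py CLI (run from the VLMEvalKit repo root).
# Assumption: HawkVL-2B is registered in the kit under the name "HawkVL-2B";
# substitute the actual registered model name.
import subprocess

benchmarks = [
    "MMBench_TEST_EN_V11", "MMBench_TEST_CN_V11", "MMStar", "MMMU_DEV_VAL",
    "MathVista_MINI", "HallusionBench", "AI2D_TEST", "OCRBench", "MMVet",
]
subprocess.run(
    ["python", "run.py", "--data", *benchmarks, "--model", "HawkVL-2B", "--verbose"],
    check=True,
)
```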
## License Agreement
All of our open-source models are licensed under the Apache-2.0 license.