cooper_robot committed
Commit 48eeeff · Parent(s): fd151db
Add release note for v1.1.0
README.md CHANGED

@@ -8,7 +8,7 @@ LLaVA-OneVision is a multimodal vision-language model that integrates a pretrain
 
 Original paper: [LLaVA-OneVision: Easy Visual Task Transfer](https://arxiv.org/abs/2408.03326)
 
-#LLaVA-OneVision-Qwen2-7B
+# LLaVA-OneVision-Qwen2-7B
 
 This model uses LLaVA-OneVision with Qwen-2 as the language backbone, allowing rich multimodal reasoning and generation capabilities. It is well suited for applications such as image-grounded question answering, multimodal dialogue, and tasks requiring aligned understanding of visual and textual information.
 
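For readers landing on this commit, here is a minimal inference sketch for the kind of image-grounded question answering the README describes. It assumes the `transformers` LLaVA-OneVision integration and the `llava-hf/llava-onevision-qwen2-7b-ov-hf` Hub checkpoint; that checkpoint id, the image URL, and the prompt are illustrative assumptions, not part of this commit.

```python
# Sketch: image-grounded QA with a LLaVA-OneVision (Qwen2-7B backbone) model.
# Assumes transformers >= 4.45; the checkpoint id below is an assumption —
# substitute whichever LLaVA-OneVision checkpoint you actually use.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

model_id = "llava-hf/llava-onevision-qwen2-7b-ov-hf"  # illustrative checkpoint id
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Build a chat-style prompt with one image placeholder and one question.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What is shown in this image?"},
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

# Any RGB image works; this URL is only an example.
url = "https://llava-vl.github.io/static/images/view.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    model.device, torch.float16
)
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```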