Update README.md
#1
by jiangchengchengNLP - opened
README.md
CHANGED
@@ -8,7 +8,7 @@ base_model:
 ---
 # Visual Language Model Based on Qwen and CLIP

-This is a visual language multimodal model built upon the Qwen series language models and the CLIP visual encoder. It has been trained for 10 epochs on the LLaVA pre-training dataset and nearly 800K examples (150K instruction fine-tuning and 665K instruction mixed fine-tuning). However, due to data size is larger
+This is a visual language multimodal model built on the Qwen series language models and the CLIP vision encoder. It was trained for 10 epochs on the LLaVA pre-training dataset, nearly 800K examples in total (150K instruction fine-tuning and 665K mixed instruction fine-tuning). However, given the size of the data relative to the model, it can only perform simple question-answering tasks on images, and it currently supports only English.
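The card's architecture (a CLIP vision encoder feeding a Qwen language model) follows the LLaVA recipe: a projection layer maps CLIP patch features into the language model's embedding space, and the projected "visual tokens" are prepended to the text-token embeddings. The sketch below illustrates only that data flow; all dimensions, the `matmul` helper, and the random stand-in tensors are hypothetical, not the real model's weights or sizes.

```python
import random

def matmul(x, w):
    """Multiply an (n x d_in) list-of-lists by a (d_in x d_out) weight matrix."""
    return [[sum(xi[k] * w[k][j] for k in range(len(w)))
             for j in range(len(w[0]))] for xi in x]

random.seed(0)
d_vision, d_model = 8, 4            # toy sizes; real CLIP/Qwen dims are far larger
num_patches, num_text_tokens = 3, 5

# Stand-ins for CLIP patch features and Qwen text-token embeddings.
vision_feats = [[random.random() for _ in range(d_vision)] for _ in range(num_patches)]
text_embeds = [[random.random() for _ in range(d_model)] for _ in range(num_text_tokens)]

# The trainable projector: maps vision features into the LM embedding space.
projector = [[random.random() for _ in range(d_model)] for _ in range(d_vision)]
visual_tokens = matmul(vision_feats, projector)

# Visual tokens are prepended to the text sequence before the LM forward pass.
lm_input = visual_tokens + text_embeds
print(len(lm_input), len(lm_input[0]))  # 8 tokens, each of width d_model
```

In the real model the projector is what the LLaVA pre-training stage learns, while the vision encoder stays frozen; the instruction fine-tuning stages then update the language model on the combined sequence.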

 ## Training Details