# Llama-3.2-MAAL-11B-Vision-v0.1

**Llama-3.2-MAAL-11B-Vision-v0.1** is a bilingual multimodal model trained for text and visual understanding in Korean and English. We are releasing a [model](https://huggingface.co/maum-ai/Llama-3.2-MAAL-11B-Vision-v0.1), a subset of the [training dataset](https://huggingface.co/datasets/maum-ai/General-Evol-VQA), and a [leaderboard](https://huggingface.co/spaces/maum-ai/KOFFVQA-Leaderboard) to promote and accelerate the development of Korean Vision-Language Models (VLMs).
- **Developed by:** [maum.ai Brain NLP](https://maum-ai.github.io). Jaeyoon Jung, Yoonshik Kim, Yekyung Nah
- **Language(s) (NLP):** Korean, English (currently bilingual)
## Model Description

Version 0.1 is fine-tuned on English and Korean VQA datasets, together with other datasets (OCR, math, etc.).

- We trained this model on 8 H100-80G GPUs for 2 days on an image-text-pair multimodal fine-tuning dataset.
- [maum-ai/General-Evol-VQA](https://huggingface.co/datasets/maum-ai/General-Evol-VQA) is one of the datasets that we used for fine-tuning.
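
Since the checkpoint follows the Llama 3.2 Vision architecture, it should be loadable with Hugging Face `transformers`. A minimal inference sketch, assuming `transformers >= 4.45` (which adds `MllamaForConditionalGeneration`); the image path and question below are illustrative placeholders, not part of this release:

```python
"""Minimal inference sketch for Llama-3.2-MAAL-11B-Vision-v0.1.

Assumes transformers >= 4.45 (Mllama support). The image path and
question passed to `answer` are placeholders for your own inputs.
"""

MODEL_ID = "maum-ai/Llama-3.2-MAAL-11B-Vision-v0.1"


def build_messages(question: str) -> list:
    """Build a single-turn chat with one image slot in the Llama 3.2 Vision chat format."""
    return [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": question},
        ],
    }]


def answer(image_path: str, question: str, max_new_tokens: int = 256) -> str:
    """Load the model and answer one visual question (requires a large GPU)."""
    import torch
    from PIL import Image
    from transformers import AutoProcessor, MllamaForConditionalGeneration

    model = MllamaForConditionalGeneration.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(MODEL_ID)

    image = Image.open(image_path)
    prompt = processor.apply_chat_template(
        build_messages(question), add_generation_prompt=True
    )
    inputs = processor(image, prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return processor.decode(
        output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
```

For example, `answer("receipt.jpg", "이 영수증의 총액은 얼마인가요?")` ("What is the total on this receipt?") exercises the Korean OCR-style VQA capability described above.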
|InternVL2-8b|8.1b|32.76|
|MiniCPM-V-2_6|8.1b|32.69|
Our model achieves a **20%** performance improvement over the base model.
You can check more results on [this Leaderboard](https://huggingface.co/spaces/maum-ai/KOFFVQA-Leaderboard).
### We will release an enhanced model, v0.2, soon