# Llama-3.2-MAAL-11B-Vision-v0.1

**Llama-3.2-MAAL-11B-Vision-v0.1** is a bilingual multimodal model trained for text and visual understanding in Korean and English. We are releasing a [model](https://huggingface.co/maum-ai/Llama-3.2-MAAL-11B-Vision-v0.1), a subset of the [training dataset](https://huggingface.co/datasets/maum-ai/General-Evol-VQA), and a [leaderboard](https://huggingface.co/spaces/maum-ai/KOFFVQA-Leaderboard) to promote and accelerate the development of Korean Vision-Language Models (VLMs).
- **Developed by:** [maum.ai Brain NLP](https://maum-ai.github.io). Jaeyoon Jung, Yoonshik Kim, Yekyung Nah
- **Language(s) (NLP):** Korean, English (currently bilingual)
## Model Description

Version 0.1 is fine-tuned on English and Korean VQA datasets, together with other datasets (OCR, math, etc.).

- We trained this model on 8 H100-80G GPUs for 2 days on an image-text-pair multimodal fine-tuning dataset.
- [maum-ai/General-Evol-VQA](https://huggingface.co/datasets/maum-ai/General-Evol-VQA) is one of the datasets that we used for fine-tuning.
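
Since the checkpoint follows the Llama 3.2 Vision architecture, it should be loadable with Hugging Face `transformers`. A minimal inference sketch, assuming `transformers >= 4.45` (which adds `MllamaForConditionalGeneration`); the image path and question below are illustrative placeholders, not part of this release:

```python
"""Minimal inference sketch for Llama-3.2-MAAL-11B-Vision-v0.1.

Assumes transformers >= 4.45 (Mllama support). The image path and
question passed to `answer` are placeholders for your own inputs.
"""

MODEL_ID = "maum-ai/Llama-3.2-MAAL-11B-Vision-v0.1"


def build_messages(question: str) -> list:
    """Build a single-turn chat with one image slot in the Llama 3.2 Vision chat format."""
    return [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": question},
        ],
    }]


def answer(image_path: str, question: str, max_new_tokens: int = 256) -> str:
    """Load the model and answer one visual question (requires a large GPU)."""
    import torch
    from PIL import Image
    from transformers import AutoProcessor, MllamaForConditionalGeneration

    model = MllamaForConditionalGeneration.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(MODEL_ID)

    image = Image.open(image_path)
    prompt = processor.apply_chat_template(
        build_messages(question), add_generation_prompt=True
    )
    inputs = processor(image, prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return processor.decode(
        output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
```

For example, `answer("receipt.jpg", "이 영수증의 총액은 얼마인가요?")` ("What is the total on this receipt?") exercises the Korean OCR-style VQA capability described above.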
|InternVL2-8b|8.1b|32.76|
|MiniCPM-V-2_6|8.1b|32.69|
Our model achieves a **20%** performance improvement over the base model.
You can check more results on [this Leaderboard](https://huggingface.co/spaces/maum-ai/KOFFVQA-Leaderboard).
### We will release an enhanced model, v0.2, soon