NASK-PIB/LLaVA-PLLuM-12b-nc-instruct

@@ -139,8 +139,8 @@ To create high-quality Polish multimodal data from English sources, a rigorous t
 ### Summary
 The model demonstrates a significant advancement in Polish multimodal capabilities:
-* **MMBench-PL:** Achieved **73.89%**, marking a **+5.6% improvement** over LLaVA-1.6-Vicuna-13B, while maintaining comparable English performance.
-* **Captioning Quality:** consistently preferred by the LLM judge over open-source competitors (95.2% win-rate vs. PaliGemma-3B, 62.7% vs. Qwen2.5-VL-7B).
+* **MMBench-PL:** Achieved **79.35%**, marking a **+9.55% improvement** over LLaVA-1.6-Vicuna-13B, while maintaining comparable English performance.
+* **Captioning Quality:** Outperforms PaliGemma-3B (65.28% win-rate), slightly outperforms LLaVA-1.6-Mistral-7B and LLaVA-1.6-Vicuna-13B, and is competitive with, though slightly behind, Qwen2.5-VL-7B and Pixtral-12B.
 * **Qualitative Analysis:** The model shows superior handling of Polish grammar/morphology and correctly identifies Polish cultural elements (e.g., specific landmarks like the Palace of Culture and Science, regional food like Toruń gingerbread) where generic models often fail.
 ## Societal Impact Assessment
@@ -167,27 +167,6 @@ The model demonstrates a significant advancement in Polish multimodal capabiliti
 We gratefully acknowledge Polish high-performance computing infrastructure PLGrid (HPC Centers: ACK Cyfronet AGH) for providing computer facilities and support within computational grant no. PLG/2025/018129
-# Citation
-If you use this model, please cite the following paper:
-```bibtex
-@inproceedings{statkiewicz2026annotation,
-  title     = {Annotation-Efficient Vision-Language Model Adaptation to the Polish Language Using the LLaVA Framework},
-  author    = {Statkiewicz, Grzegorz and
-               Dobrzeniecka, Alicja and
-               Seweryn, Karolina and
-               Krasnod{\k e}bska, Aleksandra and
-               Piosek, Karolina and
-               Bogusz, Katarzyna and
-               Cygert, Sebastian and
-               Kusa, Wojciech},
-  booktitle = {Proceedings of the Student Workshop at the 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2026)},
-  year      = {2026},
-  publisher = {Association for Computational Linguistics}
-}
-```
 # Model Card Contact
 For questions or contributions, please reach out via: nlp@nask.pl
@@ -281,3 +260,25 @@ output = llm.generate(
 print(output[0].outputs[0].text)
 ```
+# Citation
+If you use this model, please cite the following paper:
+```bibtex
+@inproceedings{statkiewicz2026annotation,
+  title     = {Annotation-Efficient Vision-Language Model Adaptation to the Polish Language Using the LLaVA Framework},
+  author    = {Statkiewicz, Grzegorz and
+               Dobrzeniecka, Alicja and
+               Seweryn, Karolina and
+               Krasnod{\k e}bska, Aleksandra and
+               Piosek, Karolina and
+               Bogusz, Katarzyna and
+               Cygert, Sebastian and
+               Kusa, Wojciech},
+  booktitle = {Proceedings of the Student Workshop at the 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2026)},
+  year      = {2026},
+  publisher = {Association for Computational Linguistics}
+}
+```