AdaptLLM
/

Adapt-MLLM-to-Domains

Model card Files Files and versions

AdaptLLM commited on Dec 5, 2024

Commit

be3c08f

·

verified ·

1 Parent(s): 898e89b

Update README.md

Files changed (1) hide show

README.md +3 -0

README.md CHANGED Viewed

@@ -6,6 +6,9 @@ language:
 This repository provides an implementation preview of our paper: [On Domain-Specific Post-Training for Multimodal Large Language Models](https://huggingface.co/papers/2411.19930).
 We investigate domain adaptation of MLLMs through post-training, focusing on data synthesis, training pipelines, and task evaluation.
 **(1) Data Synthesis**: Using open-source models, we develop a visual instruction synthesizer that effectively generates diverse visual instruction tasks from domain-specific image-caption pairs. **Our synthetic tasks surpass those generated by manual rules, GPT-4, and GPT-4V in enhancing the domain-specific performance of MLLMs.**
 **(2) Training Pipeline**: While the two-stage training--initially on image-caption pairs followed by visual instruction tasks--is commonly adopted for developing general MLLMs, we apply a single-stage training pipeline to enhance task diversity for domain-specific post-training.

 This repository provides an implementation preview of our paper: [On Domain-Specific Post-Training for Multimodal Large Language Models](https://huggingface.co/papers/2411.19930).
+Our code will be available at [https://github.com/bigai-ai/QA-Synthesizer](https://github.com/bigai-ai/QA-Synthesizer)
 We investigate domain adaptation of MLLMs through post-training, focusing on data synthesis, training pipelines, and task evaluation.
 **(1) Data Synthesis**: Using open-source models, we develop a visual instruction synthesizer that effectively generates diverse visual instruction tasks from domain-specific image-caption pairs. **Our synthetic tasks surpass those generated by manual rules, GPT-4, and GPT-4V in enhancing the domain-specific performance of MLLMs.**
 **(2) Training Pipeline**: While the two-stage training--initially on image-caption pairs followed by visual instruction tasks--is commonly adopted for developing general MLLMs, we apply a single-stage training pipeline to enhance task diversity for domain-specific post-training.