| #### Overview | |
| BioMed-VITAL is a multimodal foundation model specifically tuned for biomedical applications. It leverages visual and textual data to improve understanding and reasoning within the biomedical domain. | |
| #### Model Training | |
| The training of BioMed-VITAL involved two key stages, both incorporating clinician preferences to ensure the relevance and quality of the training data: | |
| 1. **Data Generation:** The GPT-4V generator was prompted with a diverse set of clinician-selected demonstrations. This produced domain-specific, preference-aligned data candidates that reflect real-world clinical scenarios. | |
| 2. **Data Selection:** A separate selection model was trained to explicitly incorporate clinician- and policy-guided preferences. This model used a rating function to score the generated candidates and select the highest-quality data for tuning BioMed-VITAL, so that only the most relevant and accurate instruction data was used. | |
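The selection stage described above can be pictured as "score every candidate, keep the best ones". The sketch below is only an illustration of that idea, not the BioMed-VITAL implementation: `Candidate`, `select_top`, `keep_fraction`, and `toy_rate` are hypothetical names, and the toy rating function (answer length in words) merely stands in for the trained selection model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    """One generated instruction sample (hypothetical structure)."""
    question: str
    answer: str


def select_top(
    candidates: List[Candidate],
    rate: Callable[[Candidate], float],
    keep_fraction: float = 0.5,
) -> List[Candidate]:
    """Rank candidates by rating (highest first) and keep the top fraction."""
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not candidates:
        return []
    ranked = sorted(candidates, key=rate, reverse=True)
    # Always keep at least one sample when any candidates exist.
    keep = max(1, round(len(ranked) * keep_fraction))
    return ranked[:keep]


def toy_rate(candidate: Candidate) -> float:
    """Placeholder rating: word count of the answer (assumption, not the real model)."""
    return float(len(candidate.answer.split()))


if __name__ == "__main__":
    pool = [
        Candidate("What is shown?", "A chest X-ray."),
        Candidate("What is shown?", "A frontal chest X-ray with a right lower lobe opacity."),
        Candidate("Any fracture?", "No."),
        Candidate("Any fracture?", "No acute fracture is visible in this view."),
    ]
    for kept in select_top(pool, toy_rate, keep_fraction=0.5):
        print(kept.answer)
```

In practice the `rate` callable would wrap whatever scoring model is available; keeping it as a parameter makes the selection logic independent of how ratings are produced.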
| #### Performance and Evaluation | |
| The effectiveness of BioMed-VITAL was demonstrated through significant improvements in two key areas: | |
| - **Open Visual Chat:** The model showed a relative improvement of 18.5%, indicating a stronger ability to hold visual dialogues in biomedical contexts. | |
| - **Medical Visual Question Answering (VQA):** BioMed-VITAL achieved a win rate of up to 81.73%, reflecting stronger performance in interpreting and answering questions about complex medical imagery. |
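For readers unfamiliar with the two metrics reported above, here is a minimal sketch of how a relative improvement and a win rate are commonly computed. The exact evaluation protocol (baseline model, judge, handling of ties) is not given in this section, so the function names and the convention that ties count as non-wins are assumptions made for illustration only.

```python
from typing import Sequence


def relative_improvement(baseline: float, improved: float) -> float:
    """Percentage change of `improved` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return 100 * (improved - baseline) / baseline


def win_rate(outcomes: Sequence[str]) -> float:
    """Percentage of pairwise comparisons labelled 'win'.

    Any other label (e.g. 'loss', 'tie') counts as a non-win; this is an
    assumed convention, not necessarily the one used in the evaluation.
    """
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    wins = sum(1 for outcome in outcomes if outcome == "win")
    return 100 * wins / len(outcomes)


if __name__ == "__main__":
    # Illustrative numbers only, not results from the model.
    print(relative_improvement(50.0, 60.0))
    print(win_rate(["win", "win", "loss", "win"]))
```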