# LLaVa3-Med

We train our model in three stages:

1. Pretraining: We use a dataset of 600k image-text pairs from PMC and 60k medical references based on Mayo Clinic guidelines.
2. Instruction fine-tuning: We perform instruction tuning on 60k LLaVA_Med instruction-following examples and the PMC-VQA dataset.
3. Fine-tuning: Our model is then fine-tuned on various downstream VQA datasets.
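The three stages above form a sequential schedule in which each stage continues from the previous stage's checkpoint. As a rough illustration only — the stage names, dataset labels, and epoch counts below are placeholders, not the repository's actual configuration — the schedule could be sketched as:

```python
# Illustrative sketch of the three-stage training schedule described above.
# Dataset labels and epoch counts are hypothetical placeholders.
STAGES = [
    {"name": "pretraining", "data": ["pmc_600k_pairs", "mayo_60k_refs"], "epochs": 1},
    {"name": "instruction_finetuning", "data": ["llava_med_60k", "pmc_vqa"], "epochs": 3},
    {"name": "finetuning", "data": ["downstream_vqa"], "epochs": 3},
]

def run_schedule(stages, train_fn):
    """Run each stage in order, threading the checkpoint from one stage into the next."""
    checkpoint = None
    for stage in stages:
        checkpoint = train_fn(stage, checkpoint)
    return checkpoint
```

`train_fn` stands in for whatever training entry point the codebase exposes; the point is only that later stages initialize from the earlier stage's output rather than from scratch.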
# Inference

```shell
CUDA_VISIBLE_DEVICES=0 python -m evaluation \
    --model-path model_path \
    --question-file data_path \
    --image-folder image_path \
    --answers-file result.jsonl \
    --temperature 0.7 \
    --conv-mode llama3
```
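The `--answers-file` output is a JSONL file: one JSON object per line. A minimal reader for such a file might look like the sketch below — note the field names in any given record (e.g. `question_id`, `text`) are assumptions; check the evaluation script for the exact schema:

```python
import json

def read_answers(path):
    """Load a JSONL answers file: parse one JSON object per non-empty line."""
    answers = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines defensively
                answers.append(json.loads(line))
    return answers
```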
# Results

| Dataset   | Metric   | Med-Gemini | Med-PaLM-540B | LLaVa3-Med |
|-----------|----------|------------|---------------|------------|
| Slake-VQA | Token F1 | 87.5       | 89.3          | 89.8†      |
| Path-VQA  | Token F1 | 64.7       | 62.7          | 64.9†      |
Table 1 | Multimodal evaluation. Performance comparison of LLaVa3-Med versus state-of-the-art (SoTA) methods.
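
The Token F1 metric in Table 1 is a token-overlap score between a predicted answer and the reference answer. A common formulation (the harmonic mean of token precision and recall, as in SQuAD-style evaluation; this is a sketch and not necessarily the exact implementation used here) is:

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1: harmonic mean of precision and recall over overlapping tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most min(pred, ref) times.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `token_f1("left lung", "the left lung")` gives precision 1.0 and recall 2/3, so F1 = 0.8.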