# LLaVa3-Med

We train our model in three stages:

1. Pretraining: We use a dataset of 600k image-text pairs from PMC and 60k medical references based on Mayo Clinic guidelines.
2. Instruction fine-tuning: We perform instruction tuning on 60k LLaVA_Med instruction-following examples and the PMC-VQA dataset.
3. Fine-tuning: Our model is then fine-tuned on various downstream VQA datasets.
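The three stages above form a sequential schedule in which each stage continues from the previous stage's checkpoint. As a rough illustration only — the stage names, dataset labels, and epoch counts below are placeholders, not the repository's actual configuration — the schedule could be sketched as:

```python
# Illustrative sketch of the three-stage training schedule described above.
# Dataset labels and epoch counts are hypothetical placeholders.
STAGES = [
    {"name": "pretraining", "data": ["pmc_600k_pairs", "mayo_60k_refs"], "epochs": 1},
    {"name": "instruction_finetuning", "data": ["llava_med_60k", "pmc_vqa"], "epochs": 3},
    {"name": "finetuning", "data": ["downstream_vqa"], "epochs": 3},
]

def run_schedule(stages, train_fn):
    """Run each stage in order, threading the checkpoint from one stage into the next."""
    checkpoint = None
    for stage in stages:
        checkpoint = train_fn(stage, checkpoint)
    return checkpoint
```

`train_fn` stands in for whatever training entry point the codebase exposes; the point is only that later stages initialize from the earlier stage's output rather than from scratch.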
# Inference

```shell
CUDA_VISIBLE_DEVICES=0 python -m evaluation \
    --model-path model_path \
    --question-file data_path \
    --image-folder image_path \
    --answers-file result.jsonl \
    --temperature 0.7 \
    --conv-mode llama3
```
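The `--answers-file` output is a JSONL file: one JSON object per line. A minimal reader for such a file might look like the sketch below — note the field names in any given record (e.g. `question_id`, `text`) are assumptions; check the evaluation script for the exact schema:

```python
import json

def read_answers(path):
    """Load a JSONL answers file: parse one JSON object per non-empty line."""
    answers = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines defensively
                answers.append(json.loads(line))
    return answers
```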
# Results

| Dataset   | Metric   | Med-Gemini | Med-PaLM-540B | LLaVa3-Med |
|-----------|----------|------------|---------------|------------|
| Slake-VQA | Token F1 | 87.5       | 89.3          | 89.8†      |
| Path-VQA  | Token F1 | 64.7       | 62.7          | 64.9†      |
Table 1 | Multimodal evaluation. Performance comparison of LLaVa3-Med versus state-of-the-art (SoTA) methods.
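
The Token F1 metric in Table 1 is a token-overlap score between a predicted answer and the reference answer. A common formulation (the harmonic mean of token precision and recall, as in SQuAD-style evaluation; this is a sketch and not necessarily the exact implementation used here) is:

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1: harmonic mean of precision and recall over overlapping tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most min(pred, ref) times.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `token_f1("left lung", "the left lung")` gives precision 1.0 and recall 2/3, so F1 = 0.8.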