Update README.md
# Results
Because GPT-4 has not been fine-tuned on these VQA tasks, the answers it generates for open-ended questions differ substantially in style from the reference answers. We therefore used a few-shot prompting approach to rewrite GPT-4's answers so that they match the style of the reference answers.
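A few-shot prompt for this kind of style normalization can be sketched as follows. This is an illustrative assumption, not the exact prompt used here; the example Q/A pairs and the `build_style_prompt` helper are hypothetical.

```python
# Hypothetical sketch: a few-shot prompt that asks the model to restate a
# verbose free-form answer in the terse style of the reference answers.
STYLE_EXAMPLES = [
    ("Is there evidence of pneumonia in this image?",
     "Based on the opacity in the left lower lobe, pneumonia is likely present.",
     "yes"),
    ("Which organ is shown in this image?",
     "The image appears to depict the liver in an axial CT slice.",
     "liver"),
]

def build_style_prompt(question: str, verbose_answer: str) -> str:
    # Prepend the in-context examples, then leave the final short answer blank
    # for the model to complete.
    parts = ["Rewrite each answer in the short style of the reference answers.\n"]
    for q, verbose, short in STYLE_EXAMPLES:
        parts.append(f"Question: {q}\nAnswer: {verbose}\nShort answer: {short}\n")
    parts.append(f"Question: {question}\nAnswer: {verbose_answer}\nShort answer:")
    return "\n".join(parts)

print(build_style_prompt(
    "Is the lesion benign?",
    "The lesion most likely represents a benign cyst.",
))
```

The normalized short answers can then be scored against the references with the same token-level metric as the fine-tuned models.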

| Dataset | Metric | Med-Gemini | Med-PaLM-540B | GPT-4V | LLaVa3-Med |
|-----------------------|----------|------------|------|------|----------------------|
| Slake-VQA | Token F1 | 87.5 | 89.3 | 76.8 | 89.8† |
| Path-VQA | Token F1 | 64.7 | 62.7 | 57.7 | 64.9† |

Table 1 | Multimodal evaluation. Performance comparison of LLaVa3-Med versus state-of-the-art (SoTA) methods.
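For reference, Token F1 is conventionally computed SQuAD-style: token-level precision and recall over the bag-of-tokens overlap between prediction and reference. A minimal sketch, assuming whitespace tokenization and lowercasing (the exact normalization used by the evaluation script may differ):

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 over the multiset overlap of prediction and reference tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most min(count) times.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Partial overlap: precision 2/3, recall 1.0 -> F1 = 0.8
print(token_f1("left lower lobe", "lower lobe"))  # → 0.8
```

For short closed-ended answers (e.g. "yes"/"no") this reduces to exact match, while open-ended answers receive partial credit for overlapping tokens.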