Update README.md
# Results
Because GPT-4 has not been fine-tuned on these VQA tasks, the answers it generates for open-ended questions differ substantially in style from the reference answers. We therefore used a few-shot prompting approach to rewrite GPT-4's answers so that they match the style of the reference answers.
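A few-shot prompt for this kind of style normalization can be sketched as follows. This is an illustrative assumption, not the exact prompt used here; the example Q/A pairs and the `build_style_prompt` helper are hypothetical.

```python
# Hypothetical sketch: a few-shot prompt that asks the model to restate a
# verbose free-form answer in the terse style of the reference answers.
STYLE_EXAMPLES = [
    ("Is there evidence of pneumonia in this image?",
     "Based on the opacity in the left lower lobe, pneumonia is likely present.",
     "yes"),
    ("Which organ is shown in this image?",
     "The image appears to depict the liver in an axial CT slice.",
     "liver"),
]

def build_style_prompt(question: str, verbose_answer: str) -> str:
    # Prepend the in-context examples, then leave the final short answer blank
    # for the model to complete.
    parts = ["Rewrite each answer in the short style of the reference answers.\n"]
    for q, verbose, short in STYLE_EXAMPLES:
        parts.append(f"Question: {q}\nAnswer: {verbose}\nShort answer: {short}\n")
    parts.append(f"Question: {question}\nAnswer: {verbose_answer}\nShort answer:")
    return "\n".join(parts)

print(build_style_prompt(
    "Is the lesion benign?",
    "The lesion most likely represents a benign cyst.",
))
```

The normalized short answers can then be scored against the references with the same token-level metric as the fine-tuned models.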

| Dataset | Metric | Med-Gemini | Med-PaLM-540B | GPT-4V | LLaVa3-Med |
|-----------------------|----------|------------|------|------|----------------------|
| Slake-VQA | Token F1 | 87.5 | 89.3 | 76.8 | 89.8† |
| Path-VQA | Token F1 | 64.7 | 62.7 | 57.7 | 64.9† |

Table 1 | Multimodal evaluation. Performance comparison of LLaVa3-Med versus state-of-the-art (SoTA) methods.
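For reference, Token F1 is conventionally computed SQuAD-style: token-level precision and recall over the bag-of-tokens overlap between prediction and reference. A minimal sketch, assuming whitespace tokenization and lowercasing (the exact normalization used by the evaluation script may differ):

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 over the multiset overlap of prediction and reference tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most min(count) times.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Partial overlap: precision 2/3, recall 1.0 -> F1 = 0.8
print(token_f1("left lower lobe", "lower lobe"))  # → 0.8
```

For short closed-ended answers (e.g. "yes"/"no") this reduces to exact match, while open-ended answers receive partial credit for overlapping tokens.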