akemiH committed on de38fb7 (verified) · 1 Parent(s): c4e4674

Update README.md

Files changed (1): README.md +7 −4
--- a/README.md
+++ b/README.md
@@ -21,10 +21,13 @@ CUDA_VISIBLE_DEVICES=0 python -m evaluation \

 # Results

-| Dataset   | Metric   | Med-Gemini | Med-PaLM-540B | LLaVa3-Med |
-|-----------|----------|------------|---------------|------------|
-| Slake-VQA | Token F1 | 87.5       | 89.3          | 89.8†      |
-| Path-VQA  | Token F1 | 64.7       | 62.7          | 64.9†      |
+Because GPT-4 has not been fine-tuned on these VQA tasks, the answers it generates for open-ended questions differ significantly in style from the reference answers. We therefore employed a few-shot approach, prompting GPT-4 to reformat its answers to match the style of the reference answers.
+
+| Dataset   | Metric   | Med-Gemini | Med-PaLM-540B | GPT-4V | LLaVa3-Med |
+|-----------|----------|------------|---------------|--------|------------|
+| Slake-VQA | Token F1 | 87.5       | 89.3          | 76.8   | 89.8†      |
+| Path-VQA  | Token F1 | 64.7       | 62.7          | 57.7   | 64.9†      |

 Table 1 | Multimodal evaluation. Performance comparison of LLaVa3-Med versus state-of-the-art (SoTA) methods.
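The table reports Token F1, which scores the token overlap between a generated answer and the reference answer. The exact tokenization and normalization used by this repo's evaluation script are not shown in the diff, so the following is only a minimal sketch of the common SQuAD-style token F1 (whitespace tokenization, lowercasing), not necessarily the implementation used here:

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """SQuAD-style token F1: harmonic mean of precision and recall
    over the multiset of tokens shared by prediction and reference."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most min(count) times.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `token_f1("left lower lobe", "the left lower lobe")` gives precision 1.0 and recall 0.75, hence F1 ≈ 0.857 — which illustrates why stylistic mismatches (extra or missing filler tokens) depress the score and why GPT-4's answers were restyled before scoring.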