Upload Visual-Riddles-Leaderboard.tsv
Visual-Riddles-Leaderboard.tsv CHANGED +14 -12
@@ -1,12 +1,14 @@
-Model
-Humans
-
-
-
-
-
-
-
-
-
-
+Model Open Ended VQA: % Human Rating Multiple Choice VQA: % Accuracy Hints-Multiple Choice VQA: % Accuracy Attributions-Multiple Choice VQA: % Accuracy Refernce Based-Automatic Evaluation: Accuracy of Judge Prediction Compared to Human Ratings Refernce Free-Automatic Evaluation: Accuracy of Judge Prediction Compared to Human Ratings Automatic Evaluation: % Auto-Rater Ratings Hints-Automatic Evaluation: % Auto-Rater Ratings Attributions-Automatic Evaluation: % Auto-Rater Ratings
+Humans 82 78
+Gemini Pro 1.5 40 38 66 72 87 52 53 62 29
+Gemini Pro Vision 30 41 62 75 38 34 47
+GPT4 34 45 69 82 86 51 38 61 25
+LlaVA-1.6-34B 15 24 30 76 43 21 16
+LlaVA-1.5-7B 13 17 29 70 35 19 30
+InstructBlip 13 20 28
+Gemini Pro 1.5 Caption _ Gemini Pro 1.5 23
+Human (Oracle) Caption _ Gemini Pro 1.5 50
+Claude 3.5 Sonnet 46 45 39
+GPT4o 55 83 50
+Qwen-VL-Max 35 53 26
+Molmo-7B 34 42 36