Visual-Riddles-Leaderboard


nitzanguetta committed on Nov 27, 2024

Commit 4dd5ec2 (verified)

1 parent: 318fc5f

Upload Visual-Riddles-Leaderboard.tsv

Files changed (1)

Visual-Riddles-Leaderboard.tsv CHANGED

@@ -1,12 +1,14 @@
-Model	Image Captioning	Visual Question Answering	Image-Text Matching	Human Metric - Explanation of Violation	Auto Metric - Explanation of Violation	Identify - Explanation of Violation
-Humans				95		92
-Ground-truth Caption _ GPT3 (Oracle)				68	62	74
-BLIP2 FlanT5-XXL (Fine-tuned)	177	57	84	27	24	73
-BLIP2 FlanT5-XL (Fine-tuned)	174	55	81	15	18	60
-Predicted Caption _ GPT3				33	42	59
-BLIP2 FlanT5-XXL (Zero-shot)	120	55	71	0	0	50
-CLIP ViT-L/14 (Zero-shot)			70
-OFA Large (Zero-shot)	0	38
-CoCa ViT-L-14 MSCOCO (Zero-shot)	102		72
-BLIP Large (Zero-shot)	65	39	77
-BLIP2 FlanT5-XXL (Text only FT)	2	24	94

+Model	Open Ended VQA: % Human Rating	Multiple Choice VQA: % Accuracy	Hints-Multiple Choice VQA: % Accuracy	Attributions-Multiple Choice VQA: % Accuracy	Reference Based-Automatic Evaluation: Accuracy of Judge Prediction Compared to Human Ratings	Reference Free-Automatic Evaluation: Accuracy of Judge Prediction Compared to Human Ratings	Automatic Evaluation: % Auto-Rater Ratings	Hints-Automatic Evaluation: % Auto-Rater Ratings	Attributions-Automatic Evaluation: % Auto-Rater Ratings
+Humans	82						78
+Gemini Pro 1.5	40	38	66	72	87	52	53	62	29
+Gemini Pro Vision	30	41	62		75	38	34	47
+GPT4	34	45	69	82	86	51	38	61	25
+LlaVA-1.6-34B	15	24	30		76	43	21	16
+LlaVA-1.5-7B	13	17	29		70	35	19	30
+InstructBlip	13						20	28
+Gemini Pro 1.5 Caption _ Gemini Pro 1.5	23
+Human (Oracle) Caption _ Gemini Pro 1.5	50
+Claude 3.5 Sonnet		46	45				39
+GPT4o		55	83				50
+Qwen-VL-Max		35	53				26
+Molmo-7B		34	42				36