model_name,score,VSI [66],SITE [57],MMSI [68],OmniSpatial [23],MindCube ∗ [69],STARE [32],CoreCognition [33],SpatialViz [55],source_title,source_url,notes
Human,79.2,79.2,67.5,97.2,92.63,94.55,96.50,86.98,82.46,EASI,https://arxiv.org/pdf/2508.13142.pdf,None
Qwen3-8B-Instruct [65],57.9,57.90,45.83,31.10,45.73,29.42,39.76,69.67,17.54 †,EASI,https://arxiv.org/pdf/2508.13142.pdf,"† indicates cases where generations were truncated due to overlong chains of thought, yielding no final answer; such instances are counted as incorrect, which depresses the score."
InternVL3.5-8B [56],56.05,56.05,43.79,27.30,46.71,42.50,40.18,66.40,23.98,EASI,https://arxiv.org/pdf/2508.13142.pdf,None
GPT-5-2025-08-07 [45],55.03,55.03,61.88,41.80,59.90,56.30,54.59,84.37,51.27,EASI,https://arxiv.org/pdf/2508.13142.pdf,None
Gemini-2.5-pro-2025-06 [52],53.57,53.57,57.06,38.00,55.38,57.60,49.14,76.70,42.71,EASI,https://arxiv.org/pdf/2508.13142.pdf,None
Seed-1.6-2025-06-15 [51],49.91,49.91,54.61,38.30,49.32,48.75,46.06,77.17,34.58,EASI,https://arxiv.org/pdf/2508.13142.pdf,None
GPT-5-mini-2025-08-07 [45],48.67,48.67,52.47,34.10,55.52,56.69,52.51,77.77,44.66,EASI,https://arxiv.org/pdf/2508.13142.pdf,None
Grok-4-2025-07-09 [62],47.92,47.92,47.01,37.80,46.84,63.56,26.90,79.27,19.40 †,EASI,https://arxiv.org/pdf/2508.13142.pdf,"† indicates cases where generations were truncated due to overlong chains of thought, yielding no final answer; such instances are counted as incorrect, which depresses the score."
InternVL3-78B [79],47.55,47.55,52.72,30.50,50.95,49.52,42.00,71.16,31.10,EASI,https://arxiv.org/pdf/2508.13142.pdf,None
GPT-5-nano-2025-08-07 [45],43.22,43.22,35.81,28.90,47.81,41.48,46.05,67.92,35.59,EASI,https://arxiv.org/pdf/2508.13142.pdf,None
InternVL3-8B [79],42.14,42.14,41.15,28.00,46.25,41.54,41.36,60.92,30.00,EASI,https://arxiv.org/pdf/2508.13142.pdf,None
Qwen2.5-VL-72B-Instruct [1],35.77,35.77,47.41,32.50,47.81,42.40,38.37,69.22,32.54,EASI,https://arxiv.org/pdf/2508.13142.pdf,None
Random Choice,34.0,34.00,0.0,25.00,24.98,32.35,34.80,33.93,25.08,EASI,https://arxiv.org/pdf/2508.13142.pdf,VSI random choice here is the chance level (Frequency).
Qwen2.5-VL-7B-Instruct [1],32.3,32.30,37.64,26.80,39.07,36.05,35.03,62.16,26.78,EASI,https://arxiv.org/pdf/2508.13142.pdf,None
Qwen2.5-VL-3B-Instruct [1],27.0,27.00,33.14,28.60,42.47,37.60,37.83,60.19,21.86,EASI,https://arxiv.org/pdf/2508.13142.pdf,None