EASI / leaderboard.md

| model_name | score | VSI [66] | SITE [57] | MMSI [68] | OmniSpatial [23] | MindCube ∗ [69] | STARE [32] | CoreCognition [33] | SpatialViz [55] | source_title | source_url | notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Human | 79.2 | 79.2 | 67.5 | 97.2 | 92.63 | 94.55 | 96.5 | 86.98 | 82.46 | EASI | https://arxiv.org/pdf/2508.13142.pdf | None |
| Qwen3-8B-Instruct [65] | 57.9 | 57.9 | 45.83 | 31.1 | 45.73 | 29.42 | 39.76 | 69.67 | 17.54 † | EASI | https://arxiv.org/pdf/2508.13142.pdf | † indicates cases where generations were truncated due to overlong chains of thought, yielding no final answer; such instances are counted as incorrect, which depresses the score. |
| InternVL3.5-8B [56] | 56.05 | 56.05 | 43.79 | 27.3 | 46.71 | 42.5 | 40.18 | 66.4 | 23.98 | EASI | https://arxiv.org/pdf/2508.13142.pdf | None |
| GPT-5-2025-08-07 [45] | 55.03 | 55.03 | 61.88 | 41.8 | 59.9 | 56.3 | 54.59 | 84.37 | 51.27 | EASI | https://arxiv.org/pdf/2508.13142.pdf | None |
| Gemini-2.5-pro-2025-06 [52] | 53.57 | 53.57 | 57.06 | 38 | 55.38 | 57.6 | 49.14 | 76.7 | 42.71 | EASI | https://arxiv.org/pdf/2508.13142.pdf | None |
| Seed-1.6-2025-06-15 [51] | 49.91 | 49.91 | 54.61 | 38.3 | 49.32 | 48.75 | 46.06 | 77.17 | 34.58 | EASI | https://arxiv.org/pdf/2508.13142.pdf | None |
| GPT-5-mini-2025-08-07 [45] | 48.67 | 48.67 | 52.47 | 34.1 | 55.52 | 56.69 | 52.51 | 77.77 | 44.66 | EASI | https://arxiv.org/pdf/2508.13142.pdf | None |
| Grok-4-2025-07-09 [62] | 47.92 | 47.92 | 47.01 | 37.8 | 46.84 | 63.56 | 26.9 | 79.27 | 19.40 † | EASI | https://arxiv.org/pdf/2508.13142.pdf | † indicates cases where generations were truncated due to overlong chains of thought, yielding no final answer; such instances are counted as incorrect, which depresses the score. |
| InternVL3-78B [79] | 47.55 | 47.55 | 52.72 | 30.5 | 50.95 | 49.52 | 42 | 71.16 | 31.10 | EASI | https://arxiv.org/pdf/2508.13142.pdf | None |
| GPT-5-nano-2025-08-07 [45] | 43.22 | 43.22 | 35.81 | 28.9 | 47.81 | 41.48 | 46.05 | 67.92 | 35.59 | EASI | https://arxiv.org/pdf/2508.13142.pdf | None |
| InternVL3-8B [79] | 42.14 | 42.14 | 41.15 | 28 | 46.25 | 41.54 | 41.36 | 60.92 | 30.00 | EASI | https://arxiv.org/pdf/2508.13142.pdf | None |
| Qwen2.5-VL-72B-Instruct [1] | 35.77 | 35.77 | 47.41 | 32.5 | 47.81 | 42.4 | 38.37 | 69.22 | 32.54 | EASI | https://arxiv.org/pdf/2508.13142.pdf | None |
| Random Choice | 34 | 34 | 0 | 25 | 24.98 | 32.35 | 34.8 | 33.93 | 25.08 | EASI | https://arxiv.org/pdf/2508.13142.pdf | VSI random choice here is chance level (Frequency). |
| Qwen2.5-VL-7B-Instruct [1] | 32.3 | 32.3 | 37.64 | 26.8 | 39.07 | 36.05 | 35.03 | 62.16 | 26.78 | EASI | https://arxiv.org/pdf/2508.13142.pdf | None |
| Qwen2.5-VL-3B-Instruct [1] | 27 | 27 | 33.14 | 28.6 | 42.47 | 37.6 | 37.83 | 60.19 | 21.86 | EASI | https://arxiv.org/pdf/2508.13142.pdf | None |
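
The † note says that truncated generations with no final answer are counted as incorrect. The sketch below illustrates how that rule can lower an accuracy figure. It is not the EASI evaluation code: the function name `accuracy_percent`, the use of `None` to mark a missing final answer, and the sample data are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def accuracy_percent(
    predictions: Sequence[Optional[str]], answers: Sequence[str]
) -> float:
    """Percent accuracy; a prediction of None (no final answer) counts as incorrect."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one item is required")
    correct = sum(
        1 for pred, gold in zip(predictions, answers)
        if pred is not None and pred == gold
    )
    # Every item stays in the denominator, including truncated ones.
    return 100.0 * correct / len(answers)


# Hypothetical data: the second generation was truncated (no final answer).
answers = ["A", "B", "C", "D"]
predictions = ["A", None, "C", "A"]

score_with_rule = accuracy_percent(predictions, answers)

# For comparison only: accuracy over the items that did produce an answer.
answered = [(p, a) for p, a in zip(predictions, answers) if p is not None]
score_answered_only = accuracy_percent(
    [p for p, _ in answered], [a for _, a in answered]
)

print(score_with_rule)                 # 2 correct out of 4 items
print(round(score_answered_only, 1))   # 2 correct out of 3 answered items
```

In this toy case the score under the rule is lower than the score over answered items alone, which is the effect the † note describes for the marked SpatialViz entries.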