---
title: "LudoBench: Board Game Reasoning Benchmark"
emoji: "\U0001F3B2"
colorFrom: blue
colorTo: purple
sdk: static
pinned: false
license: mit
---

# LudoBench

A multimodal board-game benchmark that evaluates LLM/VLM reasoning across 5 strategy games and 3 difficulty tiers.

- 638 annotated QA pairs
- 5 games: Kingdomino, Res Arcana, Pax Renaissance, Carcassonne, Catan
- 3 tiers: Environment Perception, Rules Integration, Short-Horizon Optimization
- 9 models benchmarked across 3 modalities (None, Text, Image)
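
The shape of the evaluation grid described above can be sketched in a few lines of Python. This is only an illustration, not part of the benchmark's code: the variable names are hypothetical, and it assumes that every game has questions in every tier and that each model is run under each modality, which the summary above does not state explicitly.

```python
from itertools import product

# Names taken from the summary list above.
GAMES = ["Kingdomino", "Res Arcana", "Pax Renaissance", "Carcassonne", "Catan"]
TIERS = ["Environment Perception", "Rules Integration", "Short-Horizon Optimization"]
MODALITIES = ["None", "Text", "Image"]

# Assumption: every game appears in every tier, giving 5 * 3 game/tier cells.
cells = list(product(GAMES, TIERS))

# Assumption: each model is evaluated under every modality for every cell.
conditions = list(product(GAMES, TIERS, MODALITIES))

print(f"{len(cells)} game/tier cells, {len(conditions)} conditions per model")
```

How the 638 QA pairs are distributed over these cells is not given here, so the sketch deliberately leaves per-cell counts out.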