Demystifying the oracle: A "20 Questions" game to promote AI ethics and literacy
As Generative AI becomes a key component in physics education, a significant ethical challenge has emerged: the tendency of students to anthropomorphize Large Language Models (LLMs), treating them as authoritative "oracles" that retrieve fixed facts from an internal database. However, LLMs operate fundamentally as probabilistic engines. This paper describes the design and implementation of a didactic activity, a reduced version of the "20 Questions" game, aimed at making this stochastic nature directly observable. Unlike a human player who fixes a target object at the start of the game, students discover that the model generates answers based solely on local coherence with the interaction history. By utilizing functionalities such as re-sampling and history rewinding, students act as experimenters, observing how identical interaction histories can yield diverging narrative paths. We discuss how mapping these behaviors to familiar physics concepts provides the epistemic scaffolding necessary to promote informed skepticism, framing the verification of AI outputs not merely as a compliance rule, but as a technical necessity derived from the system's probabilistic nature.
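The re-sampling behavior described in the abstract can be illustrated with a toy sketch. This is not the activity's actual implementation: the function names, the example question, and the probabilities below are all hypothetical, and a lookup table stands in for a real LLM. The only point is that an identical interaction history can yield different answers when each answer is drawn from a probability distribution.

```python
import random

def answer_distribution(history):
    """Hypothetical stand-in for a model: maps an interaction history
    to a probability distribution over possible answers.
    The probabilities are invented for illustration only."""
    if history == ("Is it an animal?",):
        return {"Yes": 0.6, "No": 0.4}
    return {"Yes": 0.5, "No": 0.5}

def sample_answer(history, rng):
    """Draw one answer at random according to the distribution."""
    dist = answer_distribution(history)
    answers = list(dist)
    weights = [dist[a] for a in answers]
    return rng.choices(answers, weights=weights, k=1)[0]

def resample(history, n, seed=0):
    """Re-sample the answer n times for the SAME history."""
    rng = random.Random(seed)
    return [sample_answer(history, rng) for _ in range(n)]

if __name__ == "__main__":
    history = ("Is it an animal?",)
    draws = resample(history, 200)
    # Same history, different answers across draws.
    print({a: draws.count(a) for a in set(draws)})
```

Fixing the seed makes a run repeatable, which is a convenient way to separate the randomness of the sampling step from the fixed distribution behind it.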
Can Large Vision Language Models Read Maps Like a Human?
In this paper, we introduce MapBench, the first dataset specifically designed for human-readable, pixel-based outdoor map navigation, curated from complex path finding scenarios. MapBench comprises over 1600 pixel space map path finding problems from 100 diverse maps. In MapBench, LVLMs generate language-based navigation instructions given a map image and a query with beginning and end landmarks. For each map, MapBench provides a Map Space Scene Graph (MSSG) as an indexing data structure to convert between natural language and evaluate LVLM-generated results. We demonstrate that MapBench significantly challenges state-of-the-art LVLMs under both zero-shot prompting and a Chain-of-Thought (CoT) augmented reasoning framework that decomposes map navigation into sequential cognitive processes. Our evaluation of both open-source and closed-source LVLMs underscores the substantial difficulty posed by MapBench, revealing critical limitations in their spatial reasoning and structured decision-making capabilities. We release all the code and the dataset at https://github.com/taco-group/MapBench.
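To make the idea of a graph used as a map index more concrete, here is a generic sketch. The abstract does not describe the internal structure of the MSSG, so everything below is an assumption: the landmark names, the adjacency-list layout, and the use of breadth-first search are hypothetical and are not taken from MapBench. The sketch only shows how a landmark graph can yield a reference route between a beginning and an end landmark.

```python
from collections import deque

# Hypothetical landmark graph: each landmark lists the landmarks
# directly reachable from it. Not MapBench data.
GRAPH = {
    "Entrance": ["Fountain"],
    "Fountain": ["Entrance", "Cafe", "Museum"],
    "Cafe": ["Fountain"],
    "Museum": ["Fountain"],
}

def shortest_landmark_path(graph, start, goal):
    """Breadth-first search returning a path with the fewest hops,
    or None if the goal cannot be reached from the start."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

if __name__ == "__main__":
    print(shortest_landmark_path(GRAPH, "Entrance", "Museum"))
```

A route found this way could serve as a baseline against which generated navigation instructions are compared, although the benchmark's actual evaluation procedure may differ.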
Empowering Robotics with Large Language Models: osmAG Map Comprehension with LLMs
Recently, Large Language Models (LLMs) have demonstrated great potential in robotic applications by providing essential general knowledge for situations that cannot be pre-programmed. Generally speaking, mobile robots need to understand maps to execute tasks such as localization or navigation. In this letter, we address the problem of enabling LLMs to comprehend Area Graph, a text-based map representation, in order to enhance their applicability in the field of mobile robotics. Area Graph is a hierarchical, topometric semantic map representation that uses polygons to demarcate areas such as rooms, corridors or buildings. In contrast to commonly used map representations, such as occupancy grid maps or point clouds, osmAG (Area Graph in OpenStreetMap format) is stored in an XML textual format naturally readable by LLMs. Furthermore, conventional robotic algorithms such as localization and path planning are compatible with osmAG, making this map representation comprehensible to LLMs, traditional robotic algorithms and humans. Our experiments show that with a proper map representation, LLMs possess the capability to understand maps and answer queries based on that understanding. After simple fine-tuning, LLaMA2 models surpassed ChatGPT-3.5 in tasks involving topology and hierarchy understanding. Our dataset, dataset generation code, and fine-tuned LoRA adapters can be accessed at https://github.com/xiefujing/LLM-osmAG-Comprehension.
