Failing to Explore: Language Models on Interactive Tasks
Abstract
Language models explore interactive environments poorly; budget allocation strategies and periodic summarization of the interaction history improve their performance.
We evaluate language models on their ability to explore interactive environments under a limited interaction budget. We introduce three parametric tasks with controllable exploration difficulty, spanning continuous and discrete environments. Across state-of-the-art models, we find systematic under-exploration and suboptimal solutions, with performance often significantly worse than simple explore-exploit heuristic baselines and scaling weakly as the budget increases. Finally, we study two lightweight interventions: splitting a fixed budget into parallel executions, which surprisingly improves performance despite a no-gain theoretical result for our tasks, and periodically summarizing the interaction history, which preserves key discoveries and further improves exploration.
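The abstract compares models against simple explore-exploit heuristic baselines and against splitting a fixed budget into parallel executions. The paper's actual tasks and baselines are not described here, so the following is only a hypothetical sketch under stated assumptions: a one-dimensional integer grid `0..n-1` stands in for an environment, each evaluation of an unseen point with the objective `f` costs one unit of budget, and the names `explore_then_exploit` and `split_budget` are illustrative, not taken from the paper. The baseline spends part of the budget on uniformly random probes and the rest on greedy local search around the best point found. The split variant runs `k` independent copies with `budget // k` queries each and keeps the best result.

```python
import random
from typing import Callable


def explore_then_exploit(
    f: Callable[[int], float],
    n: int,
    budget: int,
    explore_frac: float = 0.5,
    seed: int = 0,
) -> tuple[int, float]:
    """Hypothetical explore-then-exploit baseline on the integer grid 0..n-1.

    Spends about `explore_frac` of the budget on uniformly random probes, then
    does greedy local search around the best point seen. Returns the best
    (x, f(x)) observed. Not the paper's baseline; an illustrative assumption.
    """
    if n < 1 or budget < 1:
        raise ValueError("need n >= 1 and budget >= 1")
    rng = random.Random(seed)
    seen: dict[int, float] = {}

    def query(x: int) -> None:
        # Only unseen points are evaluated; each evaluation costs one unit of budget.
        if x not in seen and len(seen) < budget:
            seen[x] = f(x)

    # Explore: distinct, uniformly random probes.
    n_explore = max(1, int(budget * explore_frac))
    for x in rng.sample(range(n), min(n_explore, n)):
        query(x)

    # Exploit: probe unseen neighbours of the current best point.
    while len(seen) < min(budget, n):
        best = max(seen, key=seen.__getitem__)
        frontier = [x for x in (best - 1, best + 1) if 0 <= x < n and x not in seen]
        if not frontier:
            # Both neighbours already probed: fall back to a random unseen point.
            frontier = [rng.choice([x for x in range(n) if x not in seen])]
        query(frontier[0])

    best = max(seen, key=seen.__getitem__)
    return best, seen[best]


def split_budget(f, n, budget, k, explore_frac=0.5):
    """Hypothetical parallel split: k independent runs with budget // k each; keep the best."""
    runs = [
        explore_then_exploit(f, n, budget // k, explore_frac, seed=i) for i in range(k)
    ]
    return max(runs, key=lambda r: r[1])
```

For example, with `f = lambda x: -(x - 70) ** 2` and `n = 100`, calling `explore_then_exploit(f, 100, 100)` evaluates every point and returns `(70, 0)`. With a smaller budget, whether `split_budget` helps depends on the objective; this sketch makes no claim about the paper's tasks, for which the abstract reports a no-gain theoretical result.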
Community
LLMs fail to explore.
arXivLens breakdown of this paper: https://arxivlens.com/PaperView/Details/failing-to-explore-language-models-on-interactive-tasks-6658-5ba055fc
- Executive Summary
- Detailed Breakdown
- Practical Applications