arxiv:2601.13142

TVWorld: Foundations for Remote-Control TV Agents

Published on Jan 19

Abstract

TVWorld is a graph-based abstraction of TV navigation with two benchmarks, TVWorld-N and TVWorld-G, that reveal limited topology awareness in existing vision-language models. To address this, the authors develop TVTheseus, a foundation model specialized for TV navigation that reaches a 68.3% success rate on TVWorld-N.

AI-generated summary

Recent large vision-language models (LVLMs) have demonstrated strong potential for device control. However, existing research has primarily focused on point-and-click (PnC) interaction, while remote-control (RC) interaction commonly encountered in everyday TV usage remains largely underexplored. To fill this gap, we introduce TVWorld, an offline graph-based abstraction of real-world TV navigation that enables reproducible and deployment-free evaluation. On this basis, we derive two complementary benchmarks that comprehensively assess TV-use capabilities: TVWorld-N for topology-aware navigation and TVWorld-G for focus-aware grounding. These benchmarks expose a key limitation of existing agents: insufficient topology awareness for focus-based, long-horizon TV navigation. Motivated by this finding, we propose a Topology-Aware Training framework that injects topology awareness into LVLMs. Using this framework, we develop TVTheseus, a foundation model specialized for TV navigation. TVTheseus achieves a success rate of 68.3% on TVWorld-N, surpassing strong closed-source baselines such as Gemini 3 Flash and establishing state-of-the-art (SOTA) performance. Additional analyses further provide valuable insights into the development of effective TV-use agents.
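
To make the idea of a graph-based abstraction of remote-control navigation concrete, here is a minimal sketch in Python. It is not the paper's data format or method: the toy graph, the state names, the key names, and the helper functions `shortest_key_sequence` and `replay` are all hypothetical, chosen only for illustration. Each node stands for a focus state (which UI element is highlighted), each edge stands for one key press, and a breadth-first search returns the shortest key sequence an agent with full knowledge of the topology could follow.

```python
from collections import deque

# Hypothetical toy graph (not taken from TVWorld): each node is a focus
# state, each outgoing edge is one remote-control key press.
TOY_GRAPH = {
    "home:apps": {"RIGHT": "home:settings", "DOWN": "home:movies", "OK": "apps:grid"},
    "home:settings": {"LEFT": "home:apps", "OK": "settings:menu"},
    "home:movies": {"UP": "home:apps", "OK": "movies:list"},
    "apps:grid": {"BACK": "home:apps"},
    "settings:menu": {"BACK": "home:settings", "DOWN": "settings:network"},
    "settings:network": {"UP": "settings:menu", "BACK": "home:settings"},
    "movies:list": {"BACK": "home:movies"},
}


def shortest_key_sequence(graph, start, goal):
    """Breadth-first search for the fewest key presses from start to goal."""
    if start == goal:
        return []
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, keys = queue.popleft()
        for key, nxt in graph.get(state, {}).items():
            if nxt in visited:
                continue
            if nxt == goal:
                return keys + [key]
            visited.add(nxt)
            queue.append((nxt, keys + [key]))
    return None  # goal is unreachable from start


def replay(graph, start, keys):
    """Follow a key sequence offline and return the final focus state."""
    state = start
    for key in keys:
        # Assumption: a key with no edge leaves the focus unchanged.
        state = graph[state].get(key, state)
    return state


if __name__ == "__main__":
    plan = shortest_key_sequence(TOY_GRAPH, "home:apps", "settings:network")
    print(plan)
    print(replay(TOY_GRAPH, "home:apps", plan))
```

Because the graph is stored offline, a candidate key sequence can be replayed and scored without a physical TV, which is the kind of reproducible, deployment-free evaluation the abstract describes. An agent in the benchmark setting would presumably have to infer this topology from observations rather than read it from a dictionary.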
