Papers
arxiv:2603.03447

Proact-VL: A Proactive VideoLLM for Real-Time AI Companions

Published on Mar 3 · Submitted by taesiri on Mar 5

Abstract

Proact-VL is a multimodal framework that enables real-time interactive AI companions for gaming scenarios with low-latency responses and strong video understanding capabilities.

AI-generated summary

Proactive and real-time interactive experiences are essential for human-like AI companions, yet face three key challenges: (1) achieving low-latency inference under continuous streaming inputs, (2) autonomously deciding when to respond, and (3) controlling both quality and quantity of generated content to meet real-time constraints. In this work, we instantiate AI companions through two gaming scenarios, commentator and guide, selected for their suitability for automatic evaluation. We introduce the Live Gaming Benchmark, a large-scale dataset with three representative scenarios: solo commentary, co-commentary, and user guidance, and present Proact-VL, a general framework that shapes multimodal language models into proactive, real-time interactive agents capable of human-like environment perception and interaction. Extensive experiments show Proact-VL achieves superior response latency and quality while maintaining strong video understanding capabilities, demonstrating its practicality for real-time interactive applications.

Community

Paper submitter

Proact-VL is a proactive, real-time VideoLLM framework that enables low-latency multimodal agents with autonomous response control, evaluated on the Live Gaming Benchmark across commentary and user-guidance scenarios.

arXivLens breakdown of this paper: https://arxivlens.com/PaperView/Details/proact-vl-a-proactive-videollm-for-real-time-ai-companions-3017-c22c5bad

  • Executive Summary
  • Detailed Breakdown
  • Practical Applications
