arxiv:2604.04184

AURA: Always-On Understanding and Real-Time Assistance via Video Streams

Published on Apr 5 · Submitted by Jinpeng Chen on Apr 7

Abstract

AI-generated summary: AURA is an end-to-end streaming visual interaction framework that enables continuous video stream processing with real-time question answering and proactive responses through integrated context management and optimized deployment.

Video Large Language Models (VideoLLMs) have achieved strong performance on many video understanding tasks, but most existing systems remain offline and are not well-suited for live video streams that require continuous observation and timely response. Recent streaming VideoLLMs have made progress, yet current approaches often rely on decoupled trigger-response pipelines or are limited to captioning-style narration, reducing their effectiveness for open-ended question answering and long-horizon interaction. We propose AURA (Always-On Understanding and Real-Time Assistance), an end-to-end streaming visual interaction framework that enables a unified VideoLLM to continuously process video streams and support both real-time question answering and proactive responses. AURA integrates context management, data construction, training objectives, and deployment optimization for stable long-horizon streaming interaction. It achieves state-of-the-art performance on streaming benchmarks and supports a real-time demo system with ASR and TTS running at 2 FPS on two 80G accelerators. We release the AURA model together with a real-time inference framework to facilitate future research.
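
The abstract describes a unified model that continuously ingests a video stream while answering user questions in real time and occasionally responding proactively. The sketch below is a minimal, hypothetical illustration of such an always-on loop; it is not code from the AURA release, and the DummyStreamingVLM stub, the run_stream function, and the fps and window parameters are assumptions made purely for illustration.

import time
from collections import deque
from queue import Queue, Empty


class DummyStreamingVLM:
    """Stand-in for a streaming VideoLLM; replace with a real model."""

    def answer(self, frames, question):
        return f"[answer to {question!r} using {len(frames)} frames of context]"

    def maybe_proact(self, frames):
        # A real system would decide from the visual context whether to
        # volunteer a response; this stub never does.
        return None


def run_stream(frame_source, model, question_queue, fps=2.0, window=64):
    """Poll frames at a fixed rate, keep a rolling context, and respond."""
    context = deque(maxlen=window)            # rolling window of recent frames
    period = 1.0 / fps
    while True:
        start = time.time()
        frame = frame_source()                # grab the newest frame
        if frame is None:                     # stream ended
            break
        context.append(frame)

        try:
            question = question_queue.get_nowait()   # pending user question?
            print(model.answer(list(context), question))
        except Empty:
            proactive = model.maybe_proact(list(context))
            if proactive:                     # otherwise maybe speak up
                print(proactive)

        time.sleep(max(0.0, period - (time.time() - start)))


if __name__ == "__main__":
    frames = iter(range(10))                  # fake 10-frame stream
    questions = Queue()
    questions.put("What is happening right now?")
    run_stream(lambda: next(frames, None), DummyStreamingVLM(), questions)

A real deployment would replace the stub with a streaming VideoLLM and drive frame_source from a live camera or video feed, in the spirit of the paper's 2 FPS demo setup.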

Community

Paper submitter: 🔥 We've open-sourced the model weights and demo code. Feel free to try them out!

Get this paper in your agent:

hf papers read 2604.04184
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper: 1

Collections including this paper: 2