AURA: Always-On Understanding and Real-Time Assistance via Video Streams

AURA is a real-time multimodal streaming system, powered by our AURA model, that supports continuous video understanding with speech interaction.

For demo deployment instructions and full source code, please refer to our GitHub repository.

Citation

@article{aura2026,
  title={AURA: Always-On Understanding and Real-Time Assistance via Video Streams},
  author={Lu, Xudong and Bo, Yang and Chen, Jinpeng and Li, Shuhan and Guo, Xintong and Guan, Huankang and Liu, Fang and Xu, Dunyuan and Sun, Peiwen and Sun, Heyang and Liu, Rui and Li, Hongsheng},
  journal={arXiv preprint arXiv:2604.04184},
  year={2026}
}

Format: Safetensors
Model size: 770k params
Tensor type: BF16

Task: Video-Text-to-Text
Model tree for aurateam/AURA

Base model: Qwen/Qwen3-VL-8B-Instruct (this model is a fine-tune of it)
Paper for aurateam/AURA: AURA: Always-On Understanding and Real-Time Assistance via Video Streams (arXiv:2604.04184, published Apr 5)