Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously
Paper | Project Page | Code | Training Data
This is the 3B variant of Video Streaming Thinking (VST), a new paradigm for streaming video understanding that interleaves active reasoning with continuous video consumption, enabling amortized test-time scaling with real-time responsiveness.
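The idea of reasoning while the video is still arriving can be illustrated with a toy loop. This is a minimal conceptual sketch, not the VST implementation or its API. The names `StreamingThinker`, `think`, `answer`, and `chunk`, the clip size of 2 frames, and the string-based "thoughts" are all hypothetical stand-ins for model calls. The sketch shows one "think" step per incoming clip, with the resulting thoughts accumulating in memory. When a question arrives, only a single answer step runs over those thoughts, so the cost of reasoning is spread across the stream rather than paid after the query.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StreamingThinker:
    """Hypothetical watch-and-think loop (illustrative only, not the VST API)."""

    # think(clip_frames, prior_thoughts) -> new thought; stands in for a model call
    think: Callable[[List[str], List[str]], str]
    # answer(question, thoughts) -> answer; stands in for a model call
    answer: Callable[[str, List[str]], str]
    thoughts: List[str] = field(default_factory=list)
    think_calls: int = 0

    def watch(self, clip: List[str]) -> None:
        # Reason over each clip as soon as it arrives, conditioning on earlier
        # thoughts, so reasoning cost is spread over the stream.
        self.thoughts.append(self.think(clip, self.thoughts))
        self.think_calls += 1

    def ask(self, question: str) -> str:
        # At query time only one answer step runs over the accumulated thoughts.
        return self.answer(question, self.thoughts)


def toy_think(clip: List[str], prior: List[str]) -> str:
    # Hypothetical placeholder: number the thought and list the frames it saw.
    return f"thought {len(prior) + 1}: saw {', '.join(clip)}"


def toy_answer(question: str, thoughts: List[str]) -> str:
    # Hypothetical placeholder: report how many thoughts the answer draws on.
    return f"{question} -> based on {len(thoughts)} thoughts"


def chunk(frames: List[str], size: int) -> List[List[str]]:
    # Split a frame sequence into consecutive clips of at most `size` frames.
    return [frames[i:i + size] for i in range(0, len(frames), size)]


agent = StreamingThinker(toy_think, toy_answer)
frames = [f"f{i}" for i in range(6)]
for clip in chunk(frames, 2):
    agent.watch(clip)

print(agent.ask("What happened?"))  # What happened? -> based on 3 thoughts
```

In a real streaming model, `think` and `answer` would be generation calls, and the thought memory would need to be bounded for long videos. The sketch omits both concerns.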
| Model | OVO-Bench | StreamingBench | VideoMME | LongVideoBench | VideoHolmes |
|---|---|---|---|---|---|
| VST-3B | 56.2 | 75.5 | 59.5 | 54.1 | 36.1 |
```bibtex
@article{guan2026videostreamingthinking,
  title={Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously},
  author={Yiran Guan and Liang Yin and Dingkang Liang and Jianzhong Ju and Zhenbo Luo and Jian Luan and Yuliang Liu and Xiang Bai},
  journal={arXiv preprint arXiv:2603.12262},
  year={2026},
}
```