Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously
Paper • arXiv:2603.12262
Paper | Project Page | Code | Training Data
This is the 32B variant of Video Streaming Thinking (VST), a new paradigm for streaming video understanding that interleaves active reasoning with continuous video consumption, enabling amortized test-time scaling with real-time responsiveness.
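
To make the "watch and think simultaneously" idea concrete, here is a toy Python sketch of interleaving a reasoning step with clip arrival. It is not the VST implementation: `StreamingThinker`, `toy_think`, and the clip strings are hypothetical names invented for illustration, and the "reasoning" is a plain string function standing in for a model call. The only point is the control flow, where the thinking cost is paid incrementally as clips arrive rather than all at once when a question is asked.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StreamingThinker:
    """Toy illustration only: keeps running notes while clips arrive."""

    # Hypothetical per-clip reasoning step (a real system would call a model here).
    think: Callable[[str, List[str]], str]
    notes: List[str] = field(default_factory=list)

    def watch(self, clip: str) -> None:
        # Reason as each clip arrives, so the cost is spread over the stream.
        self.notes.append(self.think(clip, self.notes))

    def answer(self, question: str) -> str:
        # At query time only the accumulated notes are used; no clip is re-processed.
        return f"{question} -> " + "; ".join(self.notes)


def toy_think(clip: str, notes: List[str]) -> str:
    # Stand-in for model reasoning: records the clip and its position in the stream.
    return f"note {len(notes) + 1}: saw {clip}"


thinker = StreamingThinker(think=toy_think)
for clip in ["clip_a", "clip_b", "clip_c"]:
    thinker.watch(clip)

result = thinker.answer("What happened?")
print(result)
```

In this sketch the answer step does no per-clip work, which is the sense in which the reasoning is amortized over the stream; how VST actually stores and uses its intermediate thoughts is described in the paper, not here.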
| Model | OVO-Bench | StreamingBench | VideoMME | LongVideoBench | VideoHolmes |
|---|---|---|---|---|---|
| VST-32B | 63.5 | 80.7 | 67.2 | 60.7 | 45.1 |
@article{guan2026videostreamingthinking,
  title={Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously},
  author={Yiran Guan and Liang Yin and Dingkang Liang and Jianzhong Ju and Zhenbo Luo and Jian Luan and Yuliang Liu and Xiang Bai},
  journal={arXiv preprint arXiv:2603.12262},
  year={2026},
}