VST-32B

Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously

📄 Paper | 🌐 Project Page | 💻 Code | 🤗 Training Data

This is the 32B variant of Video Streaming Thinking (VST), a new paradigm for streaming video understanding that interleaves active reasoning with continuous video consumption, enabling amortized test-time scaling with real-time responsiveness.
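The idea of interleaving reasoning with video consumption can be illustrated with a toy loop. This is a minimal sketch, not the VST implementation or API: `StreamingThinker`, `summarize`, and `run_stream` are hypothetical names, clips are plain strings standing in for video segments, and the "reasoning" is a placeholder function. It only shows the control flow: notes are produced while clips arrive, so answering a question reads cached notes instead of re-processing the stream.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class StreamingThinker:
    """Toy stand-in for a streaming VideoLLM (hypothetical, not the VST API)."""
    # think_fn maps (new clip, prior thoughts) -> a short note about the clip.
    think_fn: Callable[[str, List[str]], str]
    thoughts: List[str] = field(default_factory=list)

    def watch(self, clip: str) -> None:
        # Reason about each clip as it arrives, spreading compute over the stream.
        self.thoughts.append(self.think_fn(clip, self.thoughts))

    def answer(self, question: str) -> str:
        # At query time only the cached notes are read; no clip is re-processed.
        context = " ".join(self.thoughts)
        return f"Q: {question} | notes: {context}"


def summarize(clip: str, prior: List[str]) -> str:
    # Placeholder "reasoning": number the clip by how many notes precede it.
    return f"[{len(prior) + 1}] {clip}"


def run_stream(clips: Iterable[str], question: str) -> str:
    thinker = StreamingThinker(think_fn=summarize)
    for clip in clips:
        thinker.watch(clip)
    return thinker.answer(question)


if __name__ == "__main__":
    print(run_stream(["door opens", "person enters"], "Who entered?"))
```

In a real system the placeholder `summarize` would be a model call, and the cost of that call is what gets amortized across the stream rather than paid when the question arrives.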

Performance

| Model | OVO-Bench | StreamingBench | VideoMME | LongVideoBench | VideoHolmes |
|---|---|---|---|---|---|
| VST-32B | 63.5 | 80.7 | 67.2 | 60.7 | 45.1 |

Citation

@article{guan2026videostreamingthinking,
      title={Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously},
      author={Yiran Guan and Liang Yin and Dingkang Liang and Jianzhong Ju and Zhenbo Luo and Jian Luan and Yuliang Liu and Xiang Bai},
      journal={arXiv preprint arXiv:2603.12262},
      year={2026},
}