video-SALMONN 2 is an audio-visual large language model (LLM) that generates high-quality video captions covering both audio and visual content.
https://www.ee.tsinghua.edu.cn/en/