video-SALMONN 2
Collection
video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions. • 11 items
Official model release of video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models
Base model: Qwen/Qwen2-7B