video-SALMONN 2 Collection video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions. • 11 items • Updated 15 days ago • 1
JAEGER: Joint 3D Audio-Visual Grounding and Reasoning in Simulated Physical Environments Paper • 2602.18527 • Published Feb 20 • 2