See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models • Paper • 2512.02231 • Published Dec 1, 2025 • 9
X-Fusion: Introducing New Modality to Frozen Large Language Models • Paper • 2504.20996 • Published Apr 29, 2025 • 13
Visual Instruction Inversion: Image Editing via Visual Prompting • Paper • 2307.14331 • Published Jul 26, 2023 • 1