Can VLMs Recall Factual Associations From Visual References? Paper • 2508.18297 • Published Aug 22, 2025
Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning Paper • 2504.07198 • Published Apr 9, 2025
AVERE: Improving Audiovisual Emotion Reasoning with Preference Optimization Paper • 2602.07054 • Published Feb 4 • 1
Do Audio LLMs Listen or Read? Analyzing and Mitigating Paralinguistic Failures with VoxParadox Paper • 2605.27772 • Published 8 days ago
ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions Paper • 2506.03107 • Published Jun 3, 2025 • 2
DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion Paper • 2504.04010 • Published Apr 5, 2025 • 9