Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks Paper • 2602.23898 • Published 4 days ago • 7
Fine-T2I: An Open, Large-Scale, and Diverse Dataset for High-Quality T2I Fine-Tuning Paper • 2602.09439 • Published 21 days ago • 13
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Paper • 2409.18042 • Published Sep 26, 2024 • 39