Audio-Visual Intelligence in Large Foundation Models Paper • 2605.04045 • Published 8 days ago • 30
CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models Paper • 2412.12932 • Published Dec 17, 2024 • 2
Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining Paper • 2412.10342 • Published Dec 13, 2024
Towards Multimodal Empathetic Response Generation: A Rich Text-Speech-Vision Avatar-based Benchmark Paper • 2502.04976 • Published Feb 7, 2025
Derm1M: A Million-scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology Paper • 2503.14911 • Published Mar 19, 2025 • 3
Unveiling the Cognitive Compass: Theory-of-Mind-Guided Multimodal Emotion Reasoning Paper • 2602.00971 • Published Feb 28
UniM: A Unified Any-to-Any Interleaved Multimodal Benchmark Paper • 2603.05075 • Published Mar 5 • 1
SOAR: Self-Correction for Optimal Alignment and Refinement in Diffusion Models Paper • 2604.12617 • Published 29 days ago • 6
Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment Paper • 2604.19548 • Published 22 days ago • 16
Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction Paper • 2604.27221 • Published 14 days ago • 38
Reasoning Implicit Sentiment with Chain-of-Thought Prompting Paper • 2305.11255 • Published May 18, 2023 • 2
CMNER: A Chinese Multimodal NER Dataset based on Social Media Paper • 2402.13693 • Published Feb 21, 2024
PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis Paper • 2408.09481 • Published Aug 18, 2024 • 1
LasUIE: Unifying Information Extraction with Latent Adaptive Structure-aware Generative Language Model Paper • 2304.06248 • Published Apr 13, 2023
NUS-Emo at SemEval-2024 Task 3: Instruction-Tuning LLM for Multimodal Emotion-Cause Analysis in Conversations Paper • 2501.17261 • Published Aug 22, 2024
On Path to Multimodal Generalist: General-Level and General-Bench Paper • 2505.04620 • Published May 7, 2025 • 83