LoMo: Local Modality Substitution for Deeper Vision-Language Fusion Paper • 2605.30265 • Published May 28 • 23
EchoingPixels: Cross-Modal Adaptive Token Reduction for Efficient Audio-Visual LLMs Paper • 2512.10324 • Published Dec 11, 2025 • 1
UniREditBench: A Unified Reasoning-based Image Editing Benchmark Paper • 2511.01295 • Published Nov 3, 2025 • 39