ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder Paper • 2510.18795 • Published Oct 21 • 10
InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue Paper • 2510.13747 • Published Oct 15 • 29