VITA-E: Natural Embodied Interaction with Concurrent Seeing, Hearing, Speaking, and Acting • Paper • 2510.21817 • Published Oct 21 • 41
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction • Paper • 2501.01957 • Published Jan 3 • 47