MoshiVis v0.1 Collection MoshiVis is a Vision Speech Model built as a perceptually-augmented version of Moshi v0.1 for conversing about image inputs • 9 items • Updated 2 days ago • 22
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency Paper • 2409.02634 • Published Sep 4, 2024 • 97
view article Article PaliGemma – Google's Cutting-Edge Open Vision Language Model +1 May 14, 2024 • 278