microsoft/VibeVoice-1.5B
Text-to-Speech • 3B • 237k • 2.41k
Track, rank and evaluate open LLMs and chatbots
VLMEvalKit Evaluation Results Collection
Identify objects in images using text queries
Track objects in video with custom labels