How can we get the position of text in the generated audio?

#12

by maifeeulasad - opened Dec 8, 2025


It's really cool that we can now generate audio in real time with microsoft/VibeVoice-Realtime-0.5B. I was thinking about integrating it into my application, and I ran into a critical UX requirement: it would be great if we could highlight the text that corresponds to the audio currently being played.

Does VibeVoice support this?

Opened an issue: https://github.com/microsoft/VibeVoice/issues/144

stonewh1

Dec 9, 2025 (edited Dec 9, 2025)

Thank you for your interest. Currently, the model cannot provide alignment information between generated speech and text.
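
Since the model does not expose alignment, one rough workaround is to estimate word timings on the application side. The sketch below is not part of VibeVoice and does not use any VibeVoice API; it simply assumes you know the total duration of the generated audio and spreads it across the words in proportion to their character count. The names `WordSpan`, `estimate_word_spans`, and `word_at` are hypothetical helpers made up for illustration. The result is only approximate, because real speech has pauses and uneven speaking rates.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class WordSpan:
    word: str
    start_char: int  # offset of the word in the original text
    end_char: int    # exclusive end offset in the original text
    start_s: float   # estimated start time in seconds (a guess, not a real alignment)
    end_s: float     # estimated end time in seconds


def estimate_word_spans(text: str, audio_duration_s: float) -> list[WordSpan]:
    """Spread the audio duration across words in proportion to their length."""
    matches = list(re.finditer(r"\S+", text))
    total_chars = sum(len(m.group()) for m in matches)
    spans: list[WordSpan] = []
    seen_chars = 0
    for m in matches:
        length = len(m.group())
        # Use cumulative character counts so the last word ends exactly
        # at audio_duration_s.
        start_s = audio_duration_s * seen_chars / total_chars
        end_s = audio_duration_s * (seen_chars + length) / total_chars
        spans.append(WordSpan(m.group(), m.start(), m.end(), start_s, end_s))
        seen_chars += length
    return spans


def word_at(spans: list[WordSpan], playback_s: float) -> WordSpan | None:
    """Return the word estimated to be spoken at playback_s, if any."""
    for span in spans:
        if span.start_s <= playback_s < span.end_s:
            return span
    return None
```

In a player, you would call `word_at` with the current playback position and highlight `text[span.start_char:span.end_char]`. If you need accurate timings, a separate forced-alignment or speech-recognition step over the generated audio would be a better fit than this heuristic.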
