Is it limited to producing a single track?

#21

by BigDeeper - opened Aug 31, 2025

Aug 31, 2025

It's not clear from the model card whether it is possible to produce separate tracks for different speakers, with appropriate silences to allow "others" to speak.

YaoyaoChang

Sep 1, 2025

Currently this isn’t supported. All speakers are rendered into a single audio track, rather than separate tracks.

PsiPi

Sep 3, 2025


edited Sep 4, 2025

like that?

YaoyaoChang

Sep 3, 2025

Yes, all voices are currently mixed into a single track. If you’d like to separate them, we recommend post-processing techniques such as voice activity detection (VAD) and speaker diarization to split the generated audio manually.
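A minimal sketch of that post-processing in plain Python, under loud assumptions: the audio is a mono list of float samples in [-1, 1], a simple energy threshold stands in for a real VAD model, and the speaker label for each segment comes from a separate diarization step (not shown). `energy_vad` and `split_by_speaker` are hypothetical helpers, not part of VibeVoice. Each output track keeps the full length of the input, so one speaker's turns line up with silence in the other tracks.

```python
import math
from typing import Dict, List, Tuple


def energy_vad(samples: List[float], sample_rate: int, frame_ms: float = 30.0,
               threshold_db: float = -40.0,
               min_silence_frames: int = 10) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) spans where frame RMS exceeds threshold_db (dBFS).

    Runs of fewer than min_silence_frames quiet frames are bridged, so short
    pauses inside one speaker's turn do not split the turn.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    voiced = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        level_db = 20 * math.log10(rms) if rms > 0 else float("-inf")
        voiced.append(level_db > threshold_db)

    frame_spans = []  # (first voiced frame, one past the last voiced frame)
    seg_start = None
    silence_run = 0
    for i, is_voiced in enumerate(voiced):
        if is_voiced:
            if seg_start is None:
                seg_start = i
            silence_run = 0
        elif seg_start is not None:
            silence_run += 1
            if silence_run >= min_silence_frames:
                frame_spans.append((seg_start, i - silence_run + 1))
                seg_start, silence_run = None, 0
    if seg_start is not None:
        frame_spans.append((seg_start, len(voiced) - silence_run))

    n = len(samples)
    return [(s * frame_len / sample_rate, min(e * frame_len, n) / sample_rate)
            for s, e in frame_spans]


def split_by_speaker(samples: List[float], sample_rate: int,
                     labeled_segments: List[Tuple[float, float, str]]
                     ) -> Dict[str, List[float]]:
    """Build one full-length track per speaker, silent outside that speaker's turns.

    Overlapping speech cannot be unmixed this way: both speakers' tracks
    receive the same mixed samples for the overlapping span.
    """
    n = len(samples)
    tracks: Dict[str, List[float]] = {}
    for start_sec, end_sec, speaker in labeled_segments:
        track = tracks.setdefault(speaker, [0.0] * n)
        lo = max(0, int(round(start_sec * sample_rate)))
        hi = min(n, int(round(end_sec * sample_rate)))
        if lo < hi:
            track[lo:hi] = samples[lo:hi]
    return tracks
```

For example, `energy_vad(audio, sample_rate)` returns `(start, end)` spans in seconds; pair each span with a speaker label from your diarization tool and pass the list to `split_by_speaker` to get one track per speaker. An energy threshold alone cannot tell speakers apart, which is why the labels have to come from diarization.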

PsiPi

Sep 4, 2025


edited Sep 4, 2025

https://github.com/paperwave/VibeVoice/pull/1/files

Single-pass, activation-based voice separation

PsiPi

Sep 4, 2025

> Currently this isn’t supported. All speakers are rendered into a single audio track, rather than separate tracks.

I noticed that Layer 0/1, dimension 609 seems to be the relevant activation.
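Reading that comment literally, one might treat dimension 609 as a per-frame speaker indicator. The sketch below is a hypothetical illustration of that idea, not the code in the linked PR: `activation_frame_labels` and `split_by_frame_labels` are made-up names, and it assumes you have already captured one hidden-state vector per generated frame from a single layer (for example with a PyTorch forward hook), that the activation's sign separates exactly two speakers at a threshold of 0, and that you know how many output samples each frame covers (`samples_per_frame`).

```python
from typing import List, Sequence, Tuple


def activation_frame_labels(hidden_states: Sequence[Sequence[float]],
                            dim: int = 609,
                            threshold: float = 0.0) -> List[int]:
    """Label a frame 1 if its activation at `dim` exceeds `threshold`, else 0.

    Hypothetical: assumes hidden_states holds one hidden vector per generated
    frame from one layer; the layer, threshold and sign are all guesses.
    """
    return [1 if frame[dim] > threshold else 0 for frame in hidden_states]


def split_by_frame_labels(samples: List[float], frame_labels: List[int],
                          samples_per_frame: int
                          ) -> Tuple[List[float], List[float]]:
    """Route each frame's samples to track 0 or track 1; the other track stays silent."""
    n = len(samples)
    tracks = ([0.0] * n, [0.0] * n)
    for f, label in enumerate(frame_labels):
        lo = f * samples_per_frame
        if lo >= n:
            break
        hi = min(n, lo + samples_per_frame)
        tracks[label][lo:hi] = samples[lo:hi]
    return tracks
```

Because it only reads a single scalar per frame, this approach fits a single generation pass, but it can only distinguish two speakers and it will misroute frames wherever the activation does not cleanly track the active speaker.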
