How can we access the acoustic encoder and semantic encoder?

#20

by hebangwen - opened Dec 17, 2025

Dec 17, 2025

The acoustic encoder and semantic encoder are available in VibeVoice-1.5B. However, these two encoders are missing from VibeVoice-Realtime-0.5B, where each predefined voice is instead stored as a KV cache. Could we do zero-shot voice cloning if we had these two encoders?

sailorjs0804

Dec 23, 2025

Same question. We tried to reproduce the acoustic encoder, but it only performs well for the first 6 seconds.
