Instructions to use microsoft/VibeVoice-Realtime-0.5B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use microsoft/VibeVoice-Realtime-0.5B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-to-speech", model="microsoft/VibeVoice-Realtime-0.5B")

# Load model directly
from transformers import VibeVoiceStreamingForConditionalGenerationInference

model = VibeVoiceStreamingForConditionalGenerationInference.from_pretrained("microsoft/VibeVoice-Realtime-0.5B", dtype="auto")
- Notebooks
- Google Colab
- Kaggle
Terrible Quality!
#16
by qpqpqpqpqpqp - opened
I tried all the VibeVoice models, and they all generate speech with disgusting noise! It is not so hard to run a denoiser to make clean audio files and then train on them to make a better, higher-quality model, as some have done. Are you deaf, excuse me?
Could you share the generated speech? If you used the WebSocket realtime demo, one possible reason is that the device could not generate audio as fast as it was being played back, which results in noticeable noise.
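One rough way to check whether slow inference is the cause is to measure the real-time factor (RTF): the time spent generating audio divided by the duration of the audio produced. An RTF at or above 1.0 means generation is slower than playback, so a streaming player will run out of audio. The sketch below is only an illustration and is not part of the VibeVoice code: `synthesize` is a hypothetical callable standing in for whatever function returns the generated samples on your machine, and the 24 kHz sample rate is an assumption that should be replaced with the model's actual output rate.

```python
import time
from typing import Callable, Sequence

# Assumed output sample rate for illustration; replace with the model's real rate.
SAMPLE_RATE_HZ = 24_000


def real_time_factor(generation_seconds: float, num_samples: int,
                     sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Return generation time divided by the duration of the generated audio."""
    if num_samples <= 0 or sample_rate_hz <= 0:
        raise ValueError("num_samples and sample_rate_hz must be positive")
    audio_seconds = num_samples / sample_rate_hz
    return generation_seconds / audio_seconds


def keeps_up(rtf: float) -> bool:
    """True when audio is generated faster than it is played back."""
    return rtf < 1.0


def measure_rtf(synthesize: Callable[[str], Sequence[float]], text: str,
                sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Time one call to a (hypothetical) synthesize function and return its RTF."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples), sample_rate_hz)


def fake_synthesize(text: str) -> list:
    """Stand-in for a real TTS call: returns one second of silence."""
    return [0.0] * SAMPLE_RATE_HZ


rtf = measure_rtf(fake_synthesize, "Hello there.")
print(f"RTF = {rtf:.4f}, keeps up with playback: {keeps_up(rtf)}")
```

If the measured RTF on your device is close to or above 1.0, generating the whole clip to a file first and then playing it back should tell you whether the noise comes from the model itself or from playback falling behind.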