Text-to-Speech
Transformers
Safetensors
English
vibevoice_streaming
Realtime TTS
Streaming text input
Long-form speech generation
Instructions to use microsoft/VibeVoice-Realtime-0.5B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use microsoft/VibeVoice-Realtime-0.5B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-to-speech", model="microsoft/VibeVoice-Realtime-0.5B")

# Load model directly
from transformers import VibeVoiceStreamingForConditionalGenerationInference

model = VibeVoiceStreamingForConditionalGenerationInference.from_pretrained("microsoft/VibeVoice-Realtime-0.5B", dtype="auto")
- Notebooks
- Google Colab
- Kaggle
Tried to use this to generate Chinese, sounds very foreign...
#8
by id0o0bi - opened
Is there any plan to train this model to be multilingual?
THE FUTURE of TTS models is real-time streaming on consumer devices
Great Job!
The current model primarily supports English and may also handle other languages such as German, Spanish, Portuguese, Japanese, and Korean. However, we have not conducted systematic testing on non-English languages. We are working on training with Chinese data to provide a version that supports Chinese in the future.
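The support levels described in this reply can be sketched as a small lookup that warns before text in an untested language is sent to the model. This is only an illustration: `SUPPORT_TIERS`, `support_tier`, and `check_language` are hypothetical names, not part of VibeVoice or Transformers, and the tier labels are an assumed reading of the reply above.

```python
import warnings

# Hypothetical table built from the reply above; not part of VibeVoice.
# "primary"  = the language the model primarily supports
# "untested" = may work, but no systematic testing has been done
# "planned"  = training with data for this language is in progress
SUPPORT_TIERS = {
    "en": "primary",
    "de": "untested",
    "es": "untested",
    "pt": "untested",
    "ja": "untested",
    "ko": "untested",
    "zh": "planned",
}


def support_tier(lang_code: str) -> str:
    """Return the assumed support tier for an ISO 639-1 code, or 'unknown'."""
    return SUPPORT_TIERS.get(lang_code.lower(), "unknown")


def check_language(lang_code: str) -> bool:
    """Return True for the primary language; warn and return False otherwise."""
    tier = support_tier(lang_code)
    if tier != "primary":
        warnings.warn(
            f"Language '{lang_code}' has support tier '{tier}'; "
            "output quality has not been systematically tested."
        )
        return False
    return True
```

Calling `check_language("zh")` before synthesis would emit a warning and return `False`, which matches the experience reported in this thread.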
One possible cause is that the predefined voice fonts are native English. With a native Chinese voice embedding, the result might be much better.