How to use ResembleAI/chatterbox with Chatterbox:

```python
# pip install chatterbox-tts
import torchaudio as ta
from chatterbox.tts import ChatterboxTTS

model = ChatterboxTTS.from_pretrained(device="cuda")

text = "Ezreal and Jinx teamed up with Ahri, Yasuo, and Teemo to take down the enemy's Nexus in an epic late-game pentakill."
wav = model.generate(text)
ta.save("test-1.wav", wav, model.sr)

# If you want to synthesize with a different voice, specify the audio prompt
AUDIO_PROMPT_PATH = "YOUR_FILE.wav"
wav = model.generate(text, audio_prompt_path=AUDIO_PROMPT_PATH)
ta.save("test-2.wav", wav, model.sr)
```
Streaming?
If it's possible, how do you stream the audio? I can run it at 2x realtime on my PC.
Make a streaming engine. I've implemented one: take the base engine from RealtimeTTS on GitHub and extend it to support this model. 2x realtime is terrible for realtime use, though.
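For anyone wondering what the simplest form of this could look like, here is a rough sketch of sentence-level chunking: split the text, synthesize one sentence at a time, and hand each chunk to the player as soon as it is ready. This is not the RealtimeTTS engine interface. `split_sentences`, `stream_tts`, and `fake_synthesize` are hypothetical names made up for illustration, and the fake synthesizer just returns silence so the example runs without a GPU.

```python
import re
from typing import Callable, Iterator, List


def split_sentences(text: str) -> List[str]:
    """Split after '.', '!' or '?' followed by whitespace; drop empty pieces."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def stream_tts(
    text: str, synthesize: Callable[[str], List[float]]
) -> Iterator[List[float]]:
    """Yield one audio chunk per sentence, as soon as each one is synthesized.

    `synthesize` is any callable mapping a sentence to audio samples
    (an assumption for this sketch, not a library API).
    """
    for sentence in split_sentences(text):
        yield synthesize(sentence)


def fake_synthesize(sentence: str) -> List[float]:
    """Hypothetical stand-in for a TTS model: one silent sample per character."""
    return [0.0] * len(sentence)


if __name__ == "__main__":
    for chunk in stream_tts("Hello there. How are you? Fine!", fake_synthesize):
        # A real player would queue the chunk for playback here.
        print(len(chunk))
```

With the real model, `synthesize` could presumably wrap `model.generate(sentence)` from the usage snippet. Time to first audio is then roughly the time needed to generate the first sentence, and playback only stays gap-free if each following sentence is generated faster than the previous one plays.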
I'll give it a shot! Also, what do you mean "2x realtime is terrible for realtime though"? To be clear, I'm only using this for myself...
2x realtime means one second of audio takes 2 seconds to generate, so it's not good for realtime use in production scenarios.
Oh, actually I meant the other way around. So in my case I can generate a second of audio in 0.5 seconds.
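Since "2x realtime" got read both ways in this thread, a small sketch of the two common conventions may help. The function names are hypothetical, chosen for illustration: speed is seconds of audio per second of compute (higher is better), while the real-time factor (RTF) is its inverse (lower is better).

```python
def realtime_speed(audio_seconds: float, generation_seconds: float) -> float:
    """Seconds of audio produced per second of compute (higher is faster)."""
    if audio_seconds <= 0 or generation_seconds <= 0:
        raise ValueError("durations must be positive")
    return audio_seconds / generation_seconds


def realtime_factor(audio_seconds: float, generation_seconds: float) -> float:
    """RTF: seconds of compute per second of audio (lower is faster)."""
    if audio_seconds <= 0 or generation_seconds <= 0:
        raise ValueError("durations must be positive")
    return generation_seconds / audio_seconds


def can_keep_up(audio_seconds: float, generation_seconds: float) -> bool:
    """True if generation is at least as fast as playback."""
    return realtime_speed(audio_seconds, generation_seconds) >= 1.0


if __name__ == "__main__":
    # 1 s of audio in 0.5 s: faster than playback.
    print(realtime_speed(1.0, 0.5), realtime_factor(1.0, 0.5), can_keep_up(1.0, 0.5))
    # 1 s of audio in 2 s: slower than playback.
    print(realtime_speed(1.0, 2.0), realtime_factor(1.0, 2.0), can_keep_up(1.0, 2.0))
```

Under these definitions, generating a second of audio in 0.5 seconds is a speed of 2x (RTF 0.5), while taking 2 seconds per second of audio is a speed of 0.5x (RTF 2).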
BTW, I found that someone has already implemented it (GitHub repo), so I'll close this issue.