metadata
title: TorToise
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: other
preload_from_hub:
- ttsds/tortoise
TorToise Text-to-Speech
TorToise is a text-to-speech (TTS) model with zero-shot voice cloning.
Features
- Zero-shot voice cloning
- Supported language: English
- High-quality 24 kHz audio output
Usage
- Upload a reference audio clip (3-10 seconds recommended)
- Enter the transcript of the reference audio
- Enter the text you want to synthesize
- Select the language
- Click "Synthesize"
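The 3-10 second recommendation for the reference clip can be checked locally before uploading. The following is a minimal sketch, assuming the clip is an uncompressed PCM WAV file; the helper names `clip_duration_seconds` and `is_recommended_length` are hypothetical and are not part of this Space.

```python
import wave

# Recommended reference clip length from the usage steps, in seconds.
MIN_SECONDS = 3.0
MAX_SECONDS = 10.0


def clip_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_recommended_length(path):
    """True if the clip falls inside the recommended 3-10 second range."""
    return MIN_SECONDS <= clip_duration_seconds(path) <= MAX_SECONDS
```

Other formats such as MP3 or FLAC would need a different reader, since the standard-library `wave` module only handles WAV files.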
Model Information
- Architecture: Autoregressive, Diffusion, Language Modeling
- Sample Rate: 24000 Hz
- Parameters: 960M
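As a rough illustration of what these figures imply, the sketch below converts the sample rate into a sample count and the parameter count into an approximate weight size. The numeric precision of the weights is an assumption (it is not stated here), the estimate ignores activations and any other runtime memory, and the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 24_000         # sample rate listed above
PARAMETER_COUNT = 960_000_000   # 960M parameters listed above


def samples_for(seconds):
    """Number of audio samples in a clip of the given length at 24 kHz."""
    return round(seconds * SAMPLE_RATE_HZ)


def weight_gigabytes(bytes_per_parameter):
    """Approximate weight storage in GB (10**9 bytes); precision is an assumption."""
    return PARAMETER_COUNT * bytes_per_parameter / 1e9


print(samples_for(10))       # 240000 samples for a 10-second clip
print(weight_gigabytes(4))   # 3.84, assuming 32-bit floats
print(weight_gigabytes(2))   # 1.92, assuming 16-bit floats
```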
Citation
@misc{betker2023betterspeechsynthesisscaling,
  title={Better speech synthesis through scaling},
  author={James Betker},
  year={2023},
  eprint={2305.07243},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2305.07243},
}