---
# Generated at 2026-01-29T20:46:31Z from templates/space/README.md.j2
title: TorToise
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: other
preload_from_hub:
  - ttsds/tortoise
---

# TorToise Text-to-Speech

Voice cloning with the Tortoise TTS model.

## Features

- Zero-shot voice cloning
- Language support: English
- High-quality 24 kHz audio output

## Usage

1. Upload a reference audio clip (3-10 seconds recommended).
2. Enter the transcript of the reference audio.
3. Enter the text you want to synthesize.
4. Select the language (currently English only).
5. Click "Synthesize".

## Model Information

- **Architecture**: Autoregressive, Diffusion, Language Modeling
- **Sample Rate**: 24000 Hz
- **Parameters**: 960M

## Citation

```bibtex
@misc{betker2023betterspeechsynthesisscaling,
  title={Better speech synthesis through scaling},
  author={James Betker},
  year={2023},
  eprint={2305.07243},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2305.07243},
}
```

## Links

- [Model Weights](https://huggingface.co/ttsds/tortoise)
- [Code Repository](https://github.com/neonbjb/tortoise-tts.git)
- [Paper](https://arxiv.org/abs/2305.07243)