---
# Generated at 2026-01-29T20:46:31Z from templates/space/README.md.j2
title: TorToise
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: other
preload_from_hub:
  - ttsds/tortoise
---

# TorToise Text-to-Speech


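Tortoise TTS is a text-to-speech model with zero-shot voice cloning: it synthesizes new speech in the voice of a short reference clip.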


## Features

- Zero-shot voice cloning
- Supported language: English
- High-quality 24kHz audio output

## Usage

1. Upload a reference audio clip (3-10 seconds recommended)
2. Enter the transcript of the reference audio
3. Enter the text you want to synthesize
4. Select the language
5. Click "Synthesize"
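
Before uploading, it can help to confirm that the reference clip falls in the recommended 3-10 second range. The following is a minimal sketch using only Python's standard `wave` module; it assumes the clip is an uncompressed WAV file, and the helper names `clip_duration_seconds` and `is_recommended_length` are hypothetical, not part of this Space or of Tortoise TTS.

```python
import wave

# Recommended reference clip length from the usage steps above.
MIN_SECONDS = 3.0
MAX_SECONDS = 10.0


def clip_duration_seconds(path):
    """Return the duration of a WAV file in seconds (assumes PCM WAV input)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_recommended_length(path, low=MIN_SECONDS, high=MAX_SECONDS):
    """Return True if the clip duration lies within the recommended range."""
    return low <= clip_duration_seconds(path) <= high
```

For example, `is_recommended_length("reference.wav")` returns `False` for a one-second clip, which is a hint to record a longer sample. Clips in other formats (MP3, FLAC) would need to be converted to WAV first for this particular check.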

## Model Information

- **Architecture**: Autoregressive, Diffusion, Language Modeling
- **Sample Rate**: 24000 Hz
- **Parameters**: 960M

## Citation


```bibtex
@misc{betker2023betterspeechsynthesisscaling,
  title={Better speech synthesis through scaling},
  author={James Betker},
  year={2023},
  eprint={2305.07243},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2305.07243},
}
```


## Links

- [Model Weights](https://huggingface.co/ttsds/tortoise)
- [Code Repository](https://github.com/neonbjb/tortoise-tts.git)
- [Paper](https://arxiv.org/abs/2305.07243)