---
title: TorToise
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: other
preload_from_hub:
  - ttsds/tortoise
---

# TorToise Text-to-Speech

TorToise TTS is a text-to-speech model with zero-shot voice cloning.

## Features

- Zero-shot voice cloning
- Supported language: English
- High-quality 24 kHz audio output

## Usage

1. Upload a reference audio clip (3-10 seconds recommended)
2. Enter the transcript of the reference audio
3. Enter the text you want to synthesize
4. Select the language (English is the only supported language)
5. Click "Synthesize"

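
Before step 1, it can help to confirm that the reference clip falls inside the recommended 3-10 second window. The sketch below is a minimal check using Python's standard `wave` module. It assumes the clip is an uncompressed WAV file, and the names `clip_duration_seconds` and `is_recommended_length` are hypothetical helpers, not part of this Space or the TorToise codebase.

```python
import wave

# Recommended reference length from the usage steps (seconds).
MIN_SECONDS = 3.0
MAX_SECONDS = 10.0


def clip_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_recommended_length(path: str) -> bool:
    """Return True if the clip is within the recommended 3-10 second range."""
    return MIN_SECONDS <= clip_duration_seconds(path) <= MAX_SECONDS
```

For example, `is_recommended_length("reference.wav")` (a placeholder path) would return `False` for a 2-second clip, which suggests recording a longer sample before uploading.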

## Model Information

- **Architecture**: Autoregressive, Diffusion, Language Modeling
- **Sample Rate**: 24000 Hz
- **Parameters**: 960M

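
The sample rate and parameter count translate into rough resource estimates. The sketch below is back-of-the-envelope arithmetic only: it assumes 16-bit mono PCM output and counts weight storage alone, with no activations or runtime overhead. The numeric precision this Space actually uses is not stated here, so the bytes-per-parameter values are assumptions.

```python
SAMPLE_RATE_HZ = 24_000      # sample rate listed in the model information
PARAMETERS = 960_000_000     # 960M parameters listed in the model information


def num_samples(seconds: float) -> int:
    """Number of audio samples in a clip of the given length."""
    return round(seconds * SAMPLE_RATE_HZ)


def pcm16_mono_bytes(seconds: float) -> int:
    """Raw size of the clip, assuming 16-bit (2-byte) mono PCM."""
    return num_samples(seconds) * 2


def weight_gigabytes(bytes_per_param: int) -> float:
    """Approximate weight storage in GB (10^9 bytes), weights only."""
    return PARAMETERS * bytes_per_param / 1e9


print(num_samples(10))        # 240000
print(pcm16_mono_bytes(10))   # 480000
print(weight_gigabytes(4))    # 3.84 (assuming 32-bit floats)
print(weight_gigabytes(2))    # 1.92 (assuming 16-bit floats)
```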

## Citation

```bibtex
@misc{betker2023betterspeechsynthesisscaling,
  title={Better speech synthesis through scaling},
  author={James Betker},
  year={2023},
  eprint={2305.07243},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2305.07243},
}
```

## Links

- [Model Weights](https://huggingface.co/ttsds/tortoise)
- [Code Repository](https://github.com/neonbjb/tortoise-tts.git)
- [Paper](https://arxiv.org/abs/2305.07243)