---
# Generated at 2026-01-29T20:46:31Z from templates/space/README.md.j2
title: TorToise
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: other
preload_from_hub:
  - ttsds/tortoise
---

# TorToise Text-to-Speech


A text-to-speech Space for the Tortoise TTS voice cloning model.


## Features

- Zero-shot voice cloning
- Supported language: English
- High-quality 24kHz audio output

## Usage

1. Upload a reference audio clip (3-10 seconds recommended)
2. Enter the transcript of the reference audio
3. Enter the text you want to synthesize
4. Select the language (English is the only supported option)
5. Click "Synthesize"
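
Before uploading, it can help to check that the reference clip falls inside the recommended 3-10 second window from step 1. The following is a minimal sketch using only the Python standard library; it assumes the clip is an uncompressed PCM WAV file, and the helper names `clip_duration_seconds` and `is_recommended_length` are illustrative, not part of this Space or of Tortoise TTS.

```python
import wave

# Recommended reference-clip bounds from the usage steps above (seconds).
MIN_SECONDS = 3.0
MAX_SECONDS = 10.0


def clip_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds.

    Hypothetical helper: assumes an uncompressed WAV that the
    standard-library `wave` module can read.
    """
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_recommended_length(seconds: float) -> bool:
    """True if the duration lies inside the recommended 3-10 second window."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS
```

For example, `is_recommended_length(clip_duration_seconds("reference.wav"))` would flag a clip that is too short or too long, where `reference.wav` is a placeholder path.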

## Model Information

- **Architecture**: Autoregressive, Diffusion, Language Modeling
- **Sample Rate**: 24000 Hz
- **Parameters**: 960M
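
To make the sample rate concrete, the sketch below converts a duration into a sample count at 24000 Hz. The byte-size helper additionally assumes 16-bit mono PCM, which this README does not state; both function names are hypothetical and exist only for illustration.

```python
SAMPLE_RATE_HZ = 24_000  # sample rate listed under Model Information


def samples_for(seconds: float, sample_rate: int = SAMPLE_RATE_HZ) -> int:
    """Number of audio samples in `seconds` of audio at `sample_rate`."""
    return round(seconds * sample_rate)


def pcm16_mono_bytes(seconds: float) -> int:
    """Raw payload size in bytes, ASSUMING 16-bit (2-byte) mono PCM."""
    return samples_for(seconds) * 2
```

Under these assumptions, ten seconds of output is 240,000 samples, or 480,000 bytes of raw audio before any container header.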

## Citation


```bibtex
@misc{betker2023betterspeechsynthesisscaling,
  title={Better speech synthesis through scaling},
  author={James Betker},
  year={2023},
  eprint={2305.07243},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2305.07243},
}
```


## Links

- [Model Weights](https://huggingface.co/ttsds/tortoise)
- [Code Repository](https://github.com/neonbjb/tortoise-tts.git)
- [Paper](https://arxiv.org/abs/2305.07243)