niobures committed on
Commit
92fa19d
·
verified ·
1 Parent(s): 9e8adec

Chatterbox TTS (ko)

ko/.gitattributes ADDED
@@ -0,0 +1,36 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ example.mp3 filter=lfs diff=lfs merge=lfs -text
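Reviewer note: the patterns above decide which files in this commit are stored through Git LFS. The following is a minimal sketch, not Git's own matcher, for checking a path against a handful of them. It assumes that slash-free patterns can be approximated by matching the file's basename with `fnmatchcase`; the helper name `is_lfs_tracked` and the shortened pattern list are illustrative, and slash-containing patterns such as `saved_model/**/*` are deliberately left out.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A few patterns copied from the .gitattributes in this commit (the full file has 36).
LFS_PATTERNS = ["*.safetensors", "*.bin", "*.pt", "*.zip", "*tfevents*", "example.mp3"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: compare the basename with each slash-free pattern.

    This mimics how a pattern without a slash applies at any directory depth;
    it does not implement Git's full attribute-matching rules.
    """
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


for path in ["ko/t3_cfg.safetensors", "ko/tokenizer_en_ko.json", "ko/README.md"]:
    print(path, is_lfs_tracked(path))
```

Under this approximation the `.safetensors` checkpoint is LFS-tracked, while `tokenizer_en_ko.json` and `README.md` match none of the sampled patterns and would be stored as regular Git blobs.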
ko/README.md ADDED
@@ -0,0 +1,123 @@
+ ---
+ license: cc-by-4.0
+ datasets:
+ - amphion/Emilia-Dataset
+ language:
+ - ko
+ base_model:
+ - ResembleAI/chatterbox
+ pipeline_tag: text-to-speech
+ tags:
+ - audio
+ - speech
+ - tts
+ - fine-tuning
+ - chatterbox
+ - Emilia
+ - voice-cloning
+ - zero-shot
+ - korean
+ ---
+
+ # Chatterbox TTS Korean 🌸
+
+ **Chatterbox TTS Korean** is a fine-tuned text-to-speech model specialized for the Korean language. It was fine-tuned from Chatterbox on about 200 hours of Korean audio from the Emilia dataset for natural and expressive speech synthesis.
+
+ <div align="center"><img width="400px" src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/8a/Unification_flag_of_Korea.svg/2560px-Unification_flag_of_Korea.svg.png" /></div>
+
+ - 🔊 **Language**: Korean
+ - 🗣️ **Training dataset**: [Emilia Dataset (KO branch)](https://huggingface.co/datasets/amphion/Emilia-Dataset)
+ - ⏱️ **Data quantity**: 200 hours of audio
+
+ ## Usage Example
+
+ Here’s how to generate speech using Chatterbox-TTS Korean:
+
+ ```python
+ import soundfile as sf
+ import torch
+ from torch import nn
+ from chatterbox.models.tokenizers import EnTokenizer
+ from chatterbox.tts import ChatterboxTTS
+ from huggingface_hub import hf_hub_download
+ from safetensors.torch import load_file
+
+ # Configuration
+ MODEL_REPO = "Thomcles/Chatterbox-TTS-Korean"
+ T3_FILENAME = "t3_cfg.safetensors"
+ TOKENIZER_FILENAME = "tokenizer_en_ko.json"
+ OUTPUT_PATH = "output_cloned_voice.wav"
+ TEXT_TO_SYNTHESIZE = "로마는 하루아침에 이루어진 것이 아니다"
+ TEXT_VOCAB_SIZE = 4715 + 1  # size of the text embedding and output head in the fine-tuned T3
+
+ def get_device() -> str:
+     return "cuda" if torch.cuda.is_available() else "cpu"
+
+ def download_checkpoint(repo: str, filename: str) -> str:
+     # Returns the local cache path of the downloaded file.
+     return hf_hub_download(repo_id=repo, filename=filename)
+
+ def load_tts_model(repo: str, checkpoint_file: str, tokenizer_file: str, device: str) -> ChatterboxTTS:
+     # Start from the base Chatterbox weights.
+     model = ChatterboxTTS.from_pretrained(device=device)
+
+     # Swap in the English + Korean tokenizer shipped with this repo.
+     model.tokenizer = EnTokenizer(download_checkpoint(repo, tokenizer_file))
+
+     # Resize the text layers before loading the checkpoint; replacing them
+     # afterwards would discard the fine-tuned weights for these layers.
+     model.t3.text_emb = nn.Embedding(TEXT_VOCAB_SIZE, model.t3.dim)
+     model.t3.text_head = nn.Linear(model.t3.cfg.hidden_size, TEXT_VOCAB_SIZE, bias=False)
+
+     t3_state = load_file(download_checkpoint(repo, checkpoint_file), device="cpu")
+     model.t3.load_state_dict(t3_state)
+     model.t3.to(device).eval()  # the new layers were created on the CPU
+
+     return model
+
+ def synthesize_speech(model: ChatterboxTTS, text: str, audio_prompt_path: str | None, **kwargs) -> torch.Tensor:
+     with torch.inference_mode():
+         return model.generate(
+             text=text,
+             audio_prompt_path=audio_prompt_path,
+             **kwargs
+         )
+
+ def save_audio(waveform: torch.Tensor, path: str, sample_rate: int) -> None:
+     sf.write(path, waveform.squeeze().cpu().numpy(), sample_rate)
+
+ def main():
+     print("Loading model...")
+     device = get_device()
+     model = load_tts_model(MODEL_REPO, T3_FILENAME, TOKENIZER_FILENAME, device)
+
+     print(f"Generating speech on {device}...")
+     wav = synthesize_speech(
+         model,
+         TEXT_TO_SYNTHESIZE,
+         audio_prompt_path=None,  # pass the path of a reference recording to clone a voice
+         exaggeration=0.5,
+         temperature=0.6,
+         cfg_weight=0.3,
+     )
+
+     print(f"Saving output to: {OUTPUT_PATH}")
+     save_audio(wav, OUTPUT_PATH, model.sr)
+     print("Done.")
+
+ if __name__ == "__main__":
+     main()
+ ```
+
+ Here is the output:
+
+ <audio controls src="https://huggingface.co/Thomcles/Chatterbox-TTS-Korean/resolve/main/example.mp3">Your browser does not support audio.</audio>
+
+ ### Base model license
+
+ The base model is licensed under the MIT License.
+ Base model: [Chatterbox](https://huggingface.co/ResembleAI/chatterbox)
+ License: [MIT](https://choosealicense.com/licenses/mit/)
+
+ ### Training Data License
+
+ This model was fine-tuned using a dataset licensed under Creative Commons Attribution 4.0 (CC BY 4.0).
+ Dataset: [Emilia](https://huggingface.co/datasets/amphion/Emilia-Dataset)
+ License: [Creative Commons Attribution 4.0 International](https://choosealicense.com/licenses/cc-by-4.0/)
+
+
+ ### Contact me
+
+ Interested in fine-tuning a TTS model in a specific language or building a multilingual voice solution? Don’t hesitate to reach out.
ko/source.txt ADDED
@@ -0,0 +1 @@
+ https://huggingface.co/Thomcles/Chatterbox-TTS-Korean
ko/t3_cfg.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:20993487101e97960b81fda51e6dce88f7516ba7239dfebf93256db7ba8de68d
+ size 2162520064
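Reviewer note: the three lines above are a Git LFS pointer, not the checkpoint itself. The actual `t3_cfg.safetensors` is 2,162,520,064 bytes (about 2.16 GB) and is identified by its SHA-256 digest. The sketch below shows one way a downloaded copy could be checked against the pointer. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers written for this note, and the check assumes the `oid` uses `sha256`, as it does here.

```python
import hashlib
from pathlib import Path

# Pointer contents copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:20993487101e97960b81fda51e6dce88f7516ba7239dfebf93256db7ba8de68d
size 2162520064
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a pointer file into typed fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Compare a local file's size and SHA-256 digest with the pointer."""
    file = Path(path)
    if file.stat().st_size != pointer["size"]:
        return False  # cheap check first; avoids hashing a truncated download
    hasher = hashlib.sha256()
    with file.open("rb") as handle:
        # Hash in chunks so a multi-gigabyte checkpoint is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

With the path returned by `hf_hub_download`, `matches_pointer(path, pointer)` would indicate whether the cached file is the one this commit refers to.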
ko/tokenizer_en_ko.json ADDED
The diff for this file is too large to render. See raw diff