Commit 49faf94 by TurkishCodeMan · verified · 1 Parent(s): b756053

Upload README.md with huggingface_hub

  1. README.md +135 -0
README.md ADDED
@@ -0,0 +1,135 @@
---
language:
- en
license: apache-2.0
tags:
- text-to-speech
- tts
- xtts
- voice-cloning
- coqui
library_name: coqui-tts
pipeline_tag: text-to-speech
---

# XTTS v2 Fine-tuned Model (English)

This is a fine-tuned version of [Coqui XTTS v2](https://github.com/coqui-ai/TTS) for English text-to-speech synthesis.

## Model Description

- **Base Model:** XTTS v2
- **Language:** English
- **Training Data:** Custom English speech dataset (~14 minutes)
- **Training Epochs:** 10
- **Best Checkpoint:** Epoch 7 (lowest eval loss: 3.07)

## Training Details

| Parameter | Value |
|-----------|-------|
| Batch Size | 4 |
| Learning Rate | 5e-06 |
| Max Audio Length | 11 seconds |
| Total Training Samples | 168 |

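As a rough sanity check on the table above, the number of optimizer steps can be estimated from the sample count and batch size. This is only a sketch: it assumes one optimizer step per batch (no gradient accumulation) and that all 168 samples are seen in every epoch, neither of which is stated in this card. The variable names are illustrative.

```python
import math

# Values taken from the Training Details table and the Model Description.
total_samples = 168
batch_size = 4
epochs = 10
dataset_minutes = 14  # approximate, as stated in the Model Description

# Assumption: one optimizer step per batch (no gradient accumulation).
steps_per_epoch = math.ceil(total_samples / batch_size)
total_steps = steps_per_epoch * epochs

# Average clip length implied by ~14 minutes spread over 168 samples.
avg_clip_seconds = dataset_minutes * 60 / total_samples

print(steps_per_epoch, total_steps, round(avg_clip_seconds, 1))
```

Under these assumptions the run is short (a few hundred steps), and the implied average clip length sits well below the 11-second maximum.
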
### Loss Progression

| Epoch | Eval Loss |
|-------|-----------|
| 0 | 3.36 |
| 1 | 3.23 |
| 2 | 3.17 |
| 3 | 3.12 |
| 4 | 3.10 |
| 5 | 3.08 |
| 6 | 3.07 |
| 7 | **3.07** (best) |
| 8 | 3.11 |
| 9 | 3.10 |

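The best-checkpoint choice can be reproduced from the table with a few lines of Python. Note that at two decimal places epochs 6 and 7 tie; this card reports epoch 7 as best, presumably based on the unrounded loss values, which is an assumption here since they are not listed.

```python
# Eval losses copied from the table above (rounded to two decimals).
eval_loss = {
    0: 3.36, 1: 3.23, 2: 3.17, 3: 3.12, 4: 3.10,
    5: 3.08, 6: 3.07, 7: 3.07, 8: 3.11, 9: 3.10,
}

# Find the lowest loss and every epoch that reaches it.
lowest = min(eval_loss.values())
best_epochs = [epoch for epoch, loss in eval_loss.items() if loss == lowest]

print(lowest, best_epochs)
```

The loss rises again after epoch 7, which is consistent with picking an earlier checkpoint rather than the final one.
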
## Usage

### Installation

```bash
pip install TTS==0.22.0 torch==2.5.1 torchaudio==2.5.1 transformers==4.40.0
pip install huggingface_hub
```

### Quick Start

```python
import os

import torch
import torchaudio
from huggingface_hub import hf_hub_download
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Download model files
repo_id = "TurkishCodeMan/xtts-v2-english-finetuned"
model_path = hf_hub_download(repo_id=repo_id, filename="model.pth")
config_path = hf_hub_download(repo_id=repo_id, filename="config.json")
vocab_path = hf_hub_download(repo_id=repo_id, filename="vocab.json")

# Load model
config = XttsConfig()
config.load_json(config_path)

model = Xtts.init_from_config(config)
model.load_checkpoint(
    config,
    checkpoint_dir=os.path.dirname(model_path),
    checkpoint_path=model_path,
    vocab_path=vocab_path,
    use_deepspeed=False,
)

# Use the GPU when available; CPU inference works but is much slower
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Compute speaker conditioning from the sample reference audio in this repo
ref_audio = hf_hub_download(repo_id=repo_id, filename="samples/speaker_reference.wav")
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(audio_path=ref_audio)

# Generate speech
out = model.inference(
    text="Hello, this is a test of the fine-tuned XTTS model.",
    language="en",
    gpt_cond_latent=gpt_cond_latent,
    speaker_embedding=speaker_embedding,
)

# XTTS v2 outputs 24 kHz audio; add a channel dimension before saving
wav = torch.tensor(out["wav"]).unsqueeze(0)
torchaudio.save("output.wav", wav, 24000)
```

## Audio Samples

| Type | File |
|------|------|
| Speaker Reference | [speaker_reference.wav](samples/speaker_reference.wav) |
| Generated Output | [generated_output.wav](samples/generated_output.wav) |

## Requirements

⚠️ **Important:** Use these exact versions to avoid compatibility issues.

- Python 3.10+
- PyTorch 2.5.1
- torchaudio 2.5.1 (NOT 2.9.1+)
- transformers 4.40.0 (NOT 4.50+)
- TTS 0.22.0

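Before loading the model, it can help to confirm that the installed packages match the pins above. The following is a minimal sketch using the standard-library `importlib.metadata`; the helper name `check_pins` and the `PINNED` dictionary are hypothetical and not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the Requirements list above.
PINNED = {
    "TTS": "0.22.0",
    "torch": "2.5.1",
    "torchaudio": "2.5.1",
    "transformers": "4.40.0",
}


def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Ignore local build tags such as "+cu121" when comparing.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

If the script prints nothing, all four pinned packages are installed at the expected versions.
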
## Known Issues & Solutions

1. **StopIteration error in trainer:** Patch `trainer/generic_utils.py`, or apply a monkey-patch before importing TTS.
2. **Multi-GPU error:** Set `CUDA_VISIBLE_DEVICES=0` before any imports.
3. **torchcodec error:** Downgrade torchaudio to 2.5.1.

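For the multi-GPU workaround, the environment variable can be set from Python at the very top of the script instead of in the shell. This is a minimal sketch of that ordering; the commented imports are only examples of what should come afterwards.

```python
import os

# Set this before torch or TTS is imported so that only GPU 0 is visible
# to the CUDA runtime in this process.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Import torch and TTS only after the variable is set, for example:
# import torch
# from TTS.tts.models.xtts import Xtts
```

Setting the variable in the shell (`CUDA_VISIBLE_DEVICES=0 python script.py`) has the same effect and avoids any dependence on import order.
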
## License

Apache 2.0

## Acknowledgments

- [Coqui TTS](https://github.com/coqui-ai/TTS)
- [XTTS v2](https://huggingface.co/coqui/XTTS-v2)