XTTS v2 Fine-tuned Voice Model
This is a fine-tuned XTTS v2 model trained for voice cloning.
Model Details
- Base Model: XTTS v2 (Coqui TTS)
- Training: 400 epochs
- Loss Improvement: 57% reduction (2.863 → 1.231)
- Languages: English & French
- Sample Rate: 24 kHz
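The loss figure above can be reproduced from the two reported values. The snippet below is a minimal sketch of that arithmetic; the helper name `relative_reduction` is illustrative and not part of the model or of Coqui TTS.

```python
def relative_reduction(start: float, end: float) -> float:
    """Return the fractional decrease from start to end."""
    if start <= 0:
        raise ValueError("start must be positive")
    return (start - end) / start


# Loss values as reported in Model Details.
initial_loss = 2.863
final_loss = 1.231

reduction = relative_reduction(initial_loss, final_loss)
print(f"Loss reduction: {reduction:.0%}")
```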
Files
- best_model.pth - Fine-tuned GPT model weights
- config.json - Model configuration
- vocab.json - Tokenizer vocabulary
- dvae.pth - Discrete VAE for audio encoding
- mel_stats.pth - Mel-spectrogram normalization stats
- reference.wav - Voice reference sample
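Before loading the model, it may help to confirm that the download is complete. The following is a small sketch that checks a local folder for the files listed above; the function name `missing_files` and the folder layout (all files in one directory) are assumptions, not something the model card specifies.

```python
from pathlib import Path
from typing import List

# File names taken from the Files list in this model card.
EXPECTED_FILES = [
    "best_model.pth",
    "config.json",
    "vocab.json",
    "dvae.pth",
    "mel_stats.pth",
    "reference.wav",
]


def missing_files(model_dir: str) -> List[str]:
    """Return the expected files that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling `missing_files("path/to/model/folder")` returns an empty list when every file is in place.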
Usage
from TTS.api import TTS

# Load the model
tts = TTS(model_path="path/to/model/folder", config_path="path/to/config.json")

# Generate speech
tts.tts_to_file(
    text="Your text here",
    file_path="output.wav",
    speaker_wav="reference.wav",
    language="en",  # use "fr" for French
)
Requirements
TTS>=0.22.0
torch>=2.0.0
torchaudio
Training Details
- Dataset: Custom voice recordings
- Training Duration: ~50 hours total
- Batch Size: 2
- Gradient Accumulation: 126 steps
- Optimizer: AdamW
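With gradient accumulation, the optimizer updates once per group of micro-batches, so the effective batch size is the product of the two settings above. The sketch below works through that arithmetic. The `optimizer_steps` helper and the sample count of 1008 are hypothetical, and the helper assumes an incomplete final group is dropped, which may not match the actual trainer.

```python
# Values reported in Training Details.
batch_size = 2
grad_accum_steps = 126

# Samples contributing to each optimizer update.
effective_batch_size = batch_size * grad_accum_steps
print(f"Effective batch size: {effective_batch_size}")


def optimizer_steps(num_samples: int, batch_size: int, grad_accum_steps: int) -> int:
    """Optimizer updates in one pass over num_samples.

    Assumes partial micro-batches and partial accumulation groups are dropped.
    """
    micro_batches = num_samples // batch_size
    return micro_batches // grad_accum_steps


# Hypothetical dataset size, chosen only to illustrate the calculation.
print(optimizer_steps(1008, batch_size, grad_accum_steps))
```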
Limitations
- Optimized for the specific voice in the training data
- Best performance on texts similar to training distribution
- May have reduced quality on very long texts
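One common way to work around reduced quality on long inputs is to synthesize the text in shorter pieces. The sketch below splits text at sentence boundaries into chunks of bounded length. The function name `split_text` and the default `max_chars=250` are illustrative assumptions, not limits documented for this model, and the regex only handles simple `.`, `!`, and `?` endings.

```python
import re
from typing import List


def split_text(text: str, max_chars: int = 250) -> List[str]:
    """Group sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            # Still fits, or this is the first sentence of a new chunk.
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to `tts.tts_to_file` with its own `file_path`, and the resulting audio files joined afterwards.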
License
Apache 2.0