XTTS v2 - Dhivehi (Thaana)
Fine-tuned XTTS v2.0 for Dhivehi (Maldivian, Thaana script) text-to-speech with zero-shot voice cloning.
Model Details
- Base model: XTTS v2.0 (Coqui)
- Language: Dhivehi (dv) - Thaana script
- Architecture: GPT-2 + DVAE + HiFiGAN vocoder
- Audio: 24 kHz output
- Training step: 95366
Training Data
59,000 samples (75+ hours) from multiple Dhivehi speech sources:
- Serialtechlab/dhivehi-javaabu-speech-parquet - news/article narration
- Serialtechlab/dv-presidential-speech - presidential addresses
- Serialtechlab/dhivehi-tts-female-01 - female speaker
- alakxender/dv-audio-syn-lg - synthetic speech (subset)
Usage
Install
pip install coqui-tts
Inference
import torch, torchaudio
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts
# Download all files from this repo into a local directory
config = XttsConfig()
config.load_json("config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(
    config,
    checkpoint_path="model.pth",
    vocab_path="vocab.json",
    eval=True,
    strict=False,
)
model.cuda()
# Get speaker embedding from a reference WAV (5-15 sec of clean speech)
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(
    audio_path=["reference.wav"],
    gpt_cond_len=24,
    gpt_cond_chunk_len=4,
)
# Generate speech
out = model.inference(
    text="\u0784\u07a8\u0790\u07b0\u0789\u07a8\ufdf2 \u0783\u07a6\u0781\u07aa\u0789\u07a7\u0782\u07a8 \u0783\u07a6\u0781\u07a9\u0789\u07a8",
    language="dv",
    gpt_cond_latent=gpt_cond_latent,
    speaker_embedding=speaker_embedding,
    temperature=0.7,
)
wav = torch.tensor(out["wav"]).unsqueeze(0)
torchaudio.save("output.wav", wav, 24000)
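The reference clip is recommended to be 5-15 seconds of clean speech. A minimal sketch for checking a clip's length before cloning is shown here. It assumes the reference is an uncompressed PCM WAV file, since Python's standard `wave` module cannot read other encodings. The helper names `wav_duration_seconds` and `is_good_reference_length` are illustrative and not part of the TTS library.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        # Duration is the number of audio frames divided by the sample rate.
        return wf.getnframes() / wf.getframerate()


def is_good_reference_length(path, min_s=5.0, max_s=15.0):
    """Return True if the clip falls inside the recommended 5-15 s window."""
    return min_s <= wav_duration_seconds(path) <= max_s
```

Calling `is_good_reference_length("reference.wav")` before `get_conditioning_latents` catches clips that are too short or too long. It does not check whether the speech is clean.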
Files
- model.pth - Fine-tuned GPT checkpoint
- config.json - Model configuration
- vocab.json - Extended BPE vocabulary (base XTTS + Thaana characters)
- dvae.pth - Discrete VAE (from base XTTS v2.0)
- mel_stats.pth - Mel spectrogram normalization stats (from base XTTS v2.0)
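Since the inference code expects the repo files in a local directory, a small check can confirm the download is complete before the model is loaded. The sketch below is one way to do that. The helper names `missing_files` and `download_repo` are hypothetical. The download step assumes the separate `huggingface_hub` package is installed and uses its `snapshot_download` function, and the repo ID must be supplied by the caller.

```python
from pathlib import Path

# File names as listed in the Files section of this card.
EXPECTED_FILES = [
    "model.pth",
    "config.json",
    "vocab.json",
    "dvae.pth",
    "mel_stats.pth",
]


def missing_files(model_dir):
    """Return the expected files that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def download_repo(repo_id, local_dir):
    """Fetch every file of a Hub repo into local_dir (assumes huggingface_hub)."""
    # Imported lazily so the check above works without the extra dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

An empty list from `missing_files(".")` means all five files are in the current directory, which is where the inference example looks for them.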
Limitations
- Voice cloning quality depends on the reference audio (clean, 5-15 seconds recommended)
- Text longer than ~300 characters may be truncated
- Some rare Dhivehi words may be mispronounced
- Model is still being actively trained - newer checkpoints may be uploaded
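One way to work around the length limit is to split long input into shorter chunks and synthesize each chunk separately. The sketch below splits at sentence boundaries and falls back to word boundaries for over-long sentences. The function name `split_text` is illustrative. The default of 300 characters mirrors the approximate figure above, and the punctuation set (`.`, `!`, `?`, and the Arabic question mark) is an assumption about how the input text is punctuated.

```python
import re


def split_text(text, max_chars=300):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?\u061f])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the last space
        # that fits, or hard-cut if it contains no space at all.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars
            piece, sentence = sentence[:cut], sentence[cut:].lstrip()
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece)
        if not sentence:
            continue
        # Pack consecutive sentences together while they still fit.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to `model.inference` with the same conditioning latents and the resulting waveforms concatenated. Whether the joins sound natural depends on the text and the voice.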
License
This model inherits the Coqui Public Model License from the base XTTS v2.0 model.