XTTS Luganda Fine-tuned Model
This is a fine-tuned XTTS model for the Luganda language, trained using the Common Voice Luganda dataset.
Model Details
- Base Model: Coqui XTTS v2
- Language: Luganda (lg)
- Dataset: Common Voice Luganda
- Fine-tuning Date: May 2024
How to use
Load the model with the Coqui TTS library in the same way as any other XTTS model. Inference requires a reference audio clip of the target speaker.
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts
import torch

# Load config
config = XttsConfig()
config.load_json("config.json")

# Init model
model = Xtts.init_from_config(config)
model.load_checkpoint(
    config,
    checkpoint_path="best_model.pth",
    vocab_path="vocab.json",
    eval=True,
    use_deepspeed=False,
)

# Move model to GPU if available
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(DEVICE)

# Generate speaker latents from a reference audio
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(
    audio_path=["path/to/speaker_reference.wav"],
    gpt_cond_len=config.gpt_cond_len,
    max_ref_length=config.max_ref_len,
    sound_norm_refs=config.sound_norm_refs,
)

# Synthesize text
text = "Yasalawo kutandika kusuubula mwanyi."
output = model.inference(
    text=text,
    language='lg',
    gpt_cond_latent=gpt_cond_latent,
    speaker_embedding=speaker_embedding,
    temperature=0.75,
    top_p=0.85,
    top_k=50,
    repetition_penalty=5.0,
    enable_text_splitting=True,
)
# The synthesized audio is in output['wav']
# You can save it or play it.
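Saving the result can be sketched with the standard library alone. The helper below is hypothetical (`save_wav` is not part of the TTS library) and rests on two assumptions: that `output['wav']` is a one-dimensional sequence of float samples in the range [-1, 1], and that the model emits audio at 24 kHz, the usual XTTS v2 output rate. If your `config.json` specifies a different output sample rate, pass that value instead.

```python
import sys
import wave
from array import array


def save_wav(samples, path, sample_rate=24000):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file.

    Hypothetical helper: sample_rate defaults to 24000 on the assumption
    that the model uses the usual XTTS v2 output rate.
    """
    # Clip each sample to [-1, 1] and scale it to a signed 16-bit integer.
    pcm = array(
        "h",
        (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples),
    )
    # WAV data is little-endian; swap bytes on big-endian machines.
    if sys.byteorder == "big":
        pcm.byteswap()
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())


# Usage with the result of model.inference(...):
# save_wav(output["wav"], "output.wav")
```

Clipping before scaling avoids integer overflow if the model occasionally produces samples slightly outside [-1, 1].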