XTTS Luganda Fine-tuned Model

This is an XTTS v2 model fine-tuned for the Luganda language on the Common Voice Luganda dataset.

Model Details

  • Base Model: Coqui XTTS v2
  • Language: Luganda (lg)
  • Dataset: Common Voice Luganda
  • Fine-tuning Date: May 2024

How to use

This model loads and runs with the Coqui TTS library in the same way as other XTTS models. Inference requires a reference audio clip of the target speaker, from which the speaker conditioning is computed.

from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts
import torch

# Load config
config = XttsConfig()
config.load_json("config.json")

# Init model
model = Xtts.init_from_config(config)
model.load_checkpoint(
    config,
    checkpoint_path="best_model.pth",
    vocab_path="vocab.json",
    eval=True,
    use_deepspeed=False,
)

# Move model to GPU if available
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(DEVICE)

# Generate speaker latents from a reference audio
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(
    audio_path=["path/to/speaker_reference.wav"],
    gpt_cond_len=config.gpt_cond_len,
    max_ref_length=config.max_ref_len,
    sound_norm_refs=config.sound_norm_refs,
)

# Synthesize text
text = "Yasalawo kutandika kusuubula mwanyi."
output = model.inference(
    text=text,
    language='lg',
    gpt_cond_latent=gpt_cond_latent,
    speaker_embedding=speaker_embedding,
    temperature=0.75,
    top_p=0.85,
    top_k=50,
    repetition_penalty=5.0,
    enable_text_splitting=True,
)

# output['wav'] holds the synthesized waveform as an array of float samples.
# XTTS v2 generates audio at 24 kHz, so save or play it at that sample rate.
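
Saving the result to disk can be sketched as follows. This is a minimal example using only the Python standard library, not part of the TTS library: save_wav is a hypothetical helper, and it assumes output['wav'] is a one-dimensional sequence of float samples in the range [-1.0, 1.0] at the 24 kHz rate that XTTS v2 produces.

```python
import struct
import wave


def save_wav(samples, path, sample_rate=24000):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Hypothetical helper for illustration; the 24000 Hz default is an
    assumption based on the XTTS v2 output sample rate.
    Returns the number of samples written.
    """
    pcm = bytearray()
    for s in samples:
        # Clip to the valid range so out-of-range values do not overflow.
        clipped = max(-1.0, min(1.0, float(s)))
        # Scale to signed 16-bit and pack as little-endian.
        pcm += struct.pack("<h", int(round(clipped * 32767)))

    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)      # mono
        wf.setsampwidth(2)      # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(pcm))

    return len(pcm) // 2
```

With the inference result from the example above, the call would be save_wav(output["wav"], "luganda_sample.wav"), where the output file name is a placeholder. Looping over samples in pure Python is slow for long clips; a library such as soundfile or torchaudio is a reasonable alternative if it is already installed.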