XTTS Luganda Fine-tuned Model

This is a fine-tuned XTTS model for the Luganda language, trained using the Common Voice Luganda dataset.

Model Details

Base Model: Coqui XTTS v2
Language: Luganda (lg)
Dataset: Common Voice Luganda
Fine-tuning Date: May 2024

How to use

Load the model with the Coqui TTS library (the TTS Python package), as with other XTTS models. Inference requires a reference audio clip of the target speaker.

from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts
import torch

# Load config
config = XttsConfig()
config.load_json("config.json")

# Init model
model = Xtts.init_from_config(config)
model.load_checkpoint(
    config,
    checkpoint_path="best_model.pth",
    vocab_path="vocab.json",
    eval=True,
    use_deepspeed=False,
)

# Move model to GPU if available
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(DEVICE)

# Generate speaker latents from a reference audio
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(
    audio_path=["path/to/speaker_reference.wav"],
    gpt_cond_len=config.gpt_cond_len,
    max_ref_length=config.max_ref_len,
    sound_norm_refs=config.sound_norm_refs,
)

# Synthesize text
text = "Yasalawo kutandika kusuubula mwanyi."
output = model.inference(
    text=text,
    language='lg',
    gpt_cond_latent=gpt_cond_latent,
    speaker_embedding=speaker_embedding,
    temperature=0.75,
    top_p=0.85,
    top_k=50,
    repetition_penalty=5.0,
    enable_text_splitting=True,
)

# The synthesized audio is in output['wav']
# You can save it or play it.
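
The snippet above stops at the point where the audio is available in output['wav']. As a minimal sketch of the saving step, the helper below writes a mono waveform to a 16-bit PCM WAV file using only the Python standard library. The name save_wav and the file name in the usage comment are hypothetical. The 24000 Hz default assumes the usual XTTS v2 output sample rate, so check the audio settings in your own config before relying on it.

```python
import array
import sys
import wave


def save_wav(samples, path, sample_rate=24000):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Hypothetical helper, not part of the TTS library. The 24000 Hz default
    is an assumption based on the usual XTTS v2 output sample rate.
    Returns the number of samples written.
    """
    pcm = array.array("h")  # signed 16-bit integers
    for sample in samples:
        # Clip to the valid range so out-of-range values do not wrap around.
        clipped = max(-1.0, min(1.0, float(sample)))
        pcm.append(int(round(clipped * 32767)))

    # WAV data is little-endian; swap bytes on big-endian machines.
    if sys.byteorder == "big":
        pcm.byteswap()

    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())
    return len(pcm)


# Usage with the inference result from the snippet above
# (the output file name is a placeholder):
# save_wav(output["wav"], "luganda_sample.wav")
```

Converting sample by sample keeps the sketch dependency-free. For long utterances, a vectorized conversion with NumPy or a dedicated audio library would be faster.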
