metadata
widget:
- src: https://huggingface.co/spaces/abidlabs/xtts-v2/raw/main/app.py
  example_title: Text-to-Speech
  inputs:
  - interface: text
    label: Text Input
    value: >-
      Ekikokyo kino nakyo kyetaagisa omuntu okutunula n'alengera ekintu
      ekyakula nga kyefaanaanyirizaako ennyumba.
  - interface: audio
    label: Speaker Reference
    value: https://huggingface.co/coqui/XTTS-v2/resolve/main/female.wav
  - interface: slider
    label: Temperature
    value: 0.75
    minimum: 0
    maximum: 1
    step: 0.05
  - interface: slider
    label: Top-P
    value: 0.85
    minimum: 0
    maximum: 1
    step: 0.05
  - interface: slider
    label: Top-K
    value: 50
    minimum: 1
    maximum: 100
    step: 1
  - interface: slider
    label: Repetition Penalty
    value: 5
    minimum: 1
    maximum: 10
    step: 0.1
XTTS Luganda Fine-tuned Model
This is a fine-tuned XTTS model for the Luganda language, trained using the Common Voice Luganda dataset.
Model Details
- Base Model: Coqui XTTS v2
- Language: Luganda (lg)
- Dataset: Common Voice Luganda
- Fine-tuning Date: May 2024
How to use
This model can be loaded and used with the Coqui TTS library in the same way as other XTTS models. Inference requires a speaker reference audio file.
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts
import torch
# Load config
config = XttsConfig()
config.load_json("config.json")
# Init model
model = Xtts.init_from_config(config)
model.load_checkpoint(
    config,
    checkpoint_path="best_model.pth",
    vocab_path="vocab.json",
    eval=True,
    use_deepspeed=False,
)
# Move model to GPU if available
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(DEVICE)
# Generate speaker latents from a reference audio
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(
    audio_path=["path/to/speaker_reference.wav"],
    gpt_cond_len=config.gpt_cond_len,
    max_ref_length=config.max_ref_len,
    sound_norm_refs=config.sound_norm_refs,
)
# Synthesize text
text = "Yasalawo kutandika kusuubula mwanyi."
output = model.inference(
    text=text,
    language='lg',
    gpt_cond_latent=gpt_cond_latent,
    speaker_embedding=speaker_embedding,
    temperature=0.75,
    top_p=0.85,
    top_k=50,
    repetition_penalty=5.0,
    enable_text_splitting=True,
)
# The synthesized audio is in output['wav']
# You can save it or play it.
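Saving the audio
One possible way to write the synthesized audio to disk is sketched below using only the Python standard library. The helper name save_wav is hypothetical and not part of the TTS library. The default sample rate of 24000 Hz is an assumption based on the usual XTTS v2 output rate, so verify it against your own config before relying on it. The sketch also assumes the waveform is a one-dimensional sequence of floats in the range [-1.0, 1.0].

```python
import array
import sys
import wave


def save_wav(samples, path, sample_rate=24000):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    sample_rate=24000 is an assumption (the usual XTTS v2 output rate);
    pass a different value if your config says otherwise.
    Returns the number of samples written.
    """
    pcm = array.array("h")  # signed 16-bit integers
    for sample in samples:
        # Clip to the valid range so out-of-range values do not overflow.
        clipped = max(-1.0, min(1.0, float(sample)))
        pcm.append(int(round(clipped * 32767)))

    # WAV data is little-endian; swap bytes on big-endian machines.
    if sys.byteorder == "big":
        pcm.byteswap()

    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)  # mono
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16 bits
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())
    return len(pcm)
```

With the inference result from the snippet above, a call such as save_wav(output["wav"], "output.wav") should produce a playable file. Converting sample by sample in pure Python is slow for long clips; a NumPy or torchaudio based writer would be faster if those packages are available.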