---
widget:
- src: https://huggingface.co/spaces/abidlabs/xtts-v2/raw/main/app.py
  example_title: Text-to-Speech
  inputs:
  - interface: text
    label: Text Input
    value: "Ekikokyo kino nakyo kyetaagisa omuntu okutunula n'alengera ekintu ekyakula nga kyefaanaanyirizaako ennyumba."
  - interface: audio
    label: Speaker Reference
    value: https://huggingface.co/coqui/XTTS-v2/resolve/main/female.wav
  - interface: slider
    label: Temperature
    value: 0.75
    minimum: 0
    maximum: 1
    step: 0.05
  - interface: slider
    label: Top-P
    value: 0.85
    minimum: 0
    maximum: 1
    step: 0.05
  - interface: slider
    label: Top-K
    value: 50
    minimum: 1
    maximum: 100
    step: 1
  - interface: slider
    label: Repetition Penalty
    value: 5
    minimum: 1
    maximum: 10
    step: 0.1
---
# XTTS Luganda Fine-tuned Model

This is an XTTS model fine-tuned for the Luganda language on the Common Voice Luganda dataset.

## Model Details

- Base Model: Coqui XTTS v2
- Language: Luganda (lg)
- Dataset: Common Voice Luganda
- Fine-tuning Date: May 2024

## How to use

This model can be loaded with the `TTS` library in the same way as other XTTS models. Inference requires a reference audio clip of the target speaker.
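
The loading code in this section expects `config.json`, `best_model.pth`, and `vocab.json` as local files. One possible way to fetch them is sketched here, assuming the `huggingface_hub` package is installed and that the files sit at the top level of the `reuben256/xtts-cv` repository under exactly those names (an assumption taken from the paths used in the example, not verified against the repository). `fetch_model_files` is a hypothetical helper, not part of any library.

```python
# Assumed repository id and file names; adjust them if the repository layout differs.
REPO_ID = "reuben256/xtts-cv"
MODEL_FILES = ("config.json", "best_model.pth", "vocab.json")


def fetch_model_files(repo_id=REPO_ID, filenames=MODEL_FILES):
    """Download each file into the local Hugging Face cache.

    Returns a dict mapping each file name to its local path.
    """
    # Imported lazily so the module can be loaded without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return {
        name: hf_hub_download(repo_id=repo_id, filename=name)
        for name in filenames
    }
```

The returned paths can then be passed to `config.load_json(...)` and to the `checkpoint_path` and `vocab_path` arguments in place of the bare file names.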

```python
import torch
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Load the model configuration
config = XttsConfig()
config.load_json("config.json")

# Initialize the model and load the fine-tuned weights
model = Xtts.init_from_config(config)
model.load_checkpoint(
    config,
    checkpoint_path="best_model.pth",
    vocab_path="vocab.json",
    eval=True,
    use_deepspeed=False,
)

# Move the model to the GPU if one is available
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
model.to(DEVICE)

# Compute speaker conditioning latents from a reference audio clip
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(
    audio_path=["path/to/speaker_reference.wav"],
    gpt_cond_len=config.gpt_cond_len,
    max_ref_length=config.max_ref_len,
    sound_norm_refs=config.sound_norm_refs,
)

# Synthesize text
text = "Yasalawo kutandika kusuubula mwanyi."
output = model.inference(
    text=text,
    language="lg",
    gpt_cond_latent=gpt_cond_latent,
    speaker_embedding=speaker_embedding,
    temperature=0.75,
    top_p=0.85,
    top_k=50,
    repetition_penalty=5.0,
    enable_text_splitting=True,
)

# The synthesized waveform is in output["wav"].
# You can save it to a file or play it back.
```
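
The example leaves the synthesized audio in `output["wav"]` without writing it anywhere. A minimal sketch for saving it with only the Python standard library follows, assuming the waveform is a mono sequence of floats in the range [-1.0, 1.0] and a sample rate of 24 kHz (the default output rate of XTTS v2; check `config.audio.output_sample_rate` if your configuration differs). `save_wav` is a hypothetical helper, not part of the `TTS` library.

```python
import array
import sys
import wave


def save_wav(samples, path, sample_rate=24000):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    pcm = array.array("h")  # signed 16-bit integers
    for sample in samples:
        # Clip out-of-range values so they do not overflow the 16-bit range.
        clipped = max(-1.0, min(1.0, float(sample)))
        pcm.append(int(round(clipped * 32767)))
    # WAV data is little-endian; swap bytes on big-endian machines.
    if sys.byteorder == "big":
        pcm.byteswap()
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())
```

With the variables from the example above, a call would look like `save_wav(output["wav"], "output.wav")`.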