# StyleTTS2-Twi (Demo Version)
This repository contains a demonstration checkpoint of StyleTTS2 fine-tuned for the Asante-Twi language. This version is intended to showcase rapid adaptation to the Twi language and has been optimized for deployment by stripping optimizer states and converting network weights to float16.
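The deployment optimization described above (dropping optimizer state and casting weights to float16) can be sketched in PyTorch. The `slim_checkpoint` helper and the synthetic checkpoint below are illustrative assumptions, not the actual export script used for this repository:

```python
import torch

def slim_checkpoint(ckpt: dict) -> dict:
    """Keep only the network weights, dropping optimizer state, and cast
    floating-point tensors to float16 to roughly halve the file size."""
    return {
        "net": {
            module: {
                k: v.half() if torch.is_floating_point(v) else v
                for k, v in state.items()
            }
            for module, state in ckpt["net"].items()
        }
    }

# Tiny synthetic checkpoint standing in for a real StyleTTS2 one.
full = {
    "net": {"decoder": {"w": torch.randn(4, 4), "step": torch.tensor(7)}},
    "optimizer": {"state": "..."},
}
slim = slim_checkpoint(full)
# slim["net"]["decoder"]["w"] is now float16; the optimizer entry is gone.
```

Integer tensors (step counters, embedding indices) are left untouched, since casting them to float16 would corrupt them.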
## Quick Demo
🚀 Try it out on Hugging Face Spaces: GhanaTTS Demo
## Model Details
- Architecture: StyleTTS2
- Language: Asante-Twi
- Status: Demo/Proof-of-Concept
- Dataset: ghananlpcommunity/asante-twi-bible-speech-text
- Training Strategy: This model was fine-tuned on a targeted subset of the data (short duration samples) to demonstrate rapid adaptation.
- Training Duration: 7 Epochs.
- Phonemizer: Custom Twi-G2P (using `twi-g2p`)
- Training Notebook: Fine-tuning StyleTTS2 for Asante-Twi
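The short-duration subset strategy above amounts to a simple filter over the corpus. A minimal sketch (the record layout and the 4-second threshold are illustrative assumptions, not the exact training code):

```python
# Minimal sketch of selecting short samples for rapid fine-tuning.
# The "duration" field and the threshold are assumptions for illustration.
def select_short_samples(records, max_seconds=4.0):
    """Keep only clips at or under max_seconds."""
    return [r for r in records if r["duration"] <= max_seconds]

corpus = [
    {"text": "Ɛte sɛn?", "duration": 1.8},
    {"text": "Akwaaba!", "duration": 9.2},
    {"text": "Me din de Ama.", "duration": 2.5},
]

short = select_short_samples(corpus)
print(len(short))  # → 2
```

Short clips keep alignment easy and epochs fast, which is what makes a 7-epoch demonstration run feasible.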
## Installation
To use this model locally, you need to clone the original StyleTTS2 repository and install the necessary dependencies, including the Twi-specific phonemizer.
```bash
# 1. Clone the StyleTTS2 repository
git clone https://github.com/yl4579/StyleTTS2.git
cd StyleTTS2

# 2. Install core requirements
pip install munch torch torchaudio pydub pyyaml librosa nltk matplotlib \
    accelerate transformers phonemizer einops einops-exts tqdm \
    git+https://github.com/resemble-ai/monotonic_align.git

# 3. Install Twi-specific tools & datasets
pip install datasets git+https://github.com/Ghana-NLP/twi-g2p.git

# 4. Install system dependencies (for the espeak-ng fallback)
sudo apt-get install -y espeak-ng
```
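A quick, dependency-free way to confirm that the core Python packages resolved after installation (the list here is just a sample of the requirements above):

```python
import importlib.util

# Spot-check a few of the packages installed above; find_spec returns
# None (rather than raising) when a top-level package is missing.
packages = ["torch", "librosa", "phonemizer", "datasets"]
status = {m: importlib.util.find_spec(m) is not None for m in packages}
missing = [m for m, ok in status.items() if not ok]
print("missing:", missing or "none")
```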
## Inference Code
Save this script as `generate_twi.py` inside the cloned StyleTTS2 folder. Ensure the checkpoint (`epoch_2nd_00024.pth`) and `config_ft.yml` are in the same directory.
```python
import torch
import yaml

from twi_g2p.g2p import G2P
from models import build_model
from utils import *

# Select device
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# 1. Load config
config = yaml.safe_load(open("config_ft.yml"))

# 2. Build model
# Note: ensure the 'models.py' from the StyleTTS2 repo is present
model = build_model(recursive_munch(config['model_params']), None)
params = torch.load("epoch_2nd_00024.pth", map_location=device)

# Load the weights into the model dictionary
for key in model:
    if key in params['net']:
        model[key].load_state_dict(params['net'][key])
_ = [model[key].eval().to(device) for key in model]

# 3. Set up the phonemizer
g2p = G2P()

def synthesize(text, reference_wav_path):
    # Convert Twi text to phonemes
    phones = g2p.convert(text)
    print(f"Synthesizing: {text}")
    print(f"Phonemes: {phones}")

    # Note: you will need the 'inference' helper function from the
    # original StyleTTS2 notebook/script to generate the audio.
    # wav = inference(text, ref_s, alpha=0.3, beta=0.7, diffusion_steps=5)
    return phones
```
## Credits
Developed by Mich-Seth Owusu for the Ghana NLP Community. Special thanks to the authors of StyleTTS2 for the base architecture.
- Organization: Ghana NLP